簡易檢索 / 詳目顯示

研究生: 蘇怡仁
Su, Yi-Jen
論文名稱: 利用MPM分群技術支援系統管理者加強混合式建議服務
Using MPM Clustering to Support System Administrators to Enhance Hybrid-Based Recommendation Service
指導教授: 蔡尚榮
Tsai, Shang-Rong
焦惠津
Jiau, Hewijin Christine
學位類別: 博士
Doctor
系所名稱: 電機資訊學院 - 電機工程學系
Department of Electrical Engineering
論文出版年: 2007
畢業學年度: 95
語文別: 英文
論文頁數: 93
中文關鍵詞: 建議服務分群
外文關鍵詞: Clustering, Recommendation Service
相關次數: 點閱:54下載:4
分享至:
查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報
  • 分群已經被廣泛的應用在樣式辨認、資料分析、影像處理、生物學及市場研究上。大多數的傳統分群演算法採用以距離為基礎的相似度量測法來計算資料彼此之間的相異度,但是此一量測方式並不適用於非數值屬性的資料,例如名詞性、布林、有序及類別資料等。在分群中,若應用一個不適合的相似度量測法可能導致一些隱藏在資料屬性間的重要資訊在分群的過程中被遺漏了,進而產生低品質分群的現象。
    這篇論文主要提出以MPM分群的演算法來提升建議引擎產生建議的品質,經由MPM演算法有效地將使用者在網站上的瀏覽路徑分群,以達到每一個群內的資料具有較高的相似度及群與群之間的資料有較低相似度的目標。MPM分群演算法主要是使用兩個技術來將瀏覽路徑資料做有效的分群,一為設計具有擷取循序資料特性的相似度量測,以計算使用者瀏覽路徑間之相似度;其次為利用矩陣排列和矩陣切割的方法來達到最佳化分群的目的。同時在論文中,也提出Heuristic_MPM分群演算法來改善MPM演算法之分群效率的問題。最後為了驗證MPM分群演算法是否可用來解決其他領域的分群需求,我們嘗試將它應用在Intrusion Detection系統的設計上,利用學習attack的行為特性來偵測其attack行為的企圖,亦達到不錯的辨識率的要求。
    建議服務藉由學習使用者以往的共同行為模式來預估其目前行為的目的,進而提供適當的建議給使用者來使用。網站管理者應用建議服務來達到提供自動化的資訊過濾服務給使用者,除了可以提高使用者再度蒞臨瀏覽網站的意願外,另一方面也可以有效地提升網站的競爭力。針對Web環境具有動態的特性,本研究也提出一個利用Moving Average Rule來建構一個可以察覺瀏覽趨勢的改變而機動地調整建議內容的Web建議引擎,系統管理者可以依照其網站的特性挑選不同建議產生方法的組合,以提供更恰當的建議服務給使用者。這一個動態調整建議的機制已實作在Linux作業系統的Apache Web伺服器上。

    Clustering has been widely adopted in numerous applications, including pattern recognition, data analysis, biology, image processing, and market research. When performing data mining, traditional clustering algorithms which use distance-based measurements to calculate the difference between data are unsuitable for non-numeric attributes such as nominal, Boolean, and categorical data. Applying an unsuitable similarity measurement in clustering may cause some valuable information embedded in the data attributes to be lost, and hence low quality clusters will be created.
    The thesis proposes a novel hierarchical clustering algorithm, referred to as MPM, to enhance the recommendation engine by high quality clusters. The goal is to retain the data features of interest while effectively grouping users' browsing patterns into clusters with high intra-class similarity and low inter-class similarity. MPM achieves these goals through two principal methods: (1) the adoption of a novel similarity measurement that has the ability to capture the "characterized properties" of information, and (2) the application of matrix permutation and matrix participation partitioning to the results of the similarity measurement (constructed in the form of a similarity matrix) for assigning data to appropriate clusters. This study also proposes a heuristic-based algorithm, the Heuristic_MPM, to reduce the processing times required for matrix permutation and matrix partitioning, which together constitute the bulk of the total MPM execution time. To extend the application domain of the MPM algorithm, it has been used to recognize the signature of intrusion activities. In simulation results, the novel signature-based intrusion detection system can identify known intrusion attacks affectively.
    The recommendation service uses recognized users' behavior patterns to predict users' goal. Based on the Moving Average Rule, this thesis also proposes a novel Web recommendation engine framework to bridge response to new navigation trends and dynamically adapt recommendations for users with appropriate suggestions through hyperlinks. The framework provides website administrators with various methods to generate recommendations. Furthermore, it responds to new Web trends, including Web pages that have been updated but have not yet been integrated into regular browsing patterns. Ultimately, this system empowers websites with dynamic recommendation set to effectively tailor users' needs. The dynamically adaptive recommendation mechanism has been implemented on Apache Web server in Linux environment. The simulation results demonstrate that the scheme outperformed static mechanism.

    Chinese Abstract i English Abstract iii Acknowledgment v Table of Contents vi List of Tables viii List of Figures ix 1. Introduction 1 1.1 Motivation 1 1.1.1 Clustering 1 1.1.2 Dynamic Browsing Trend 2 1.2 Thesis Contributions 4 1.2.1 MPM A Hierarchical Clustering Algorithm 4 1.2.2 Dynamic Web Recommendation Engine 5 1.3 Thesis Organization 6 2. Related Work 7 2.1 Clustering 7 2.1.1 Categorization of Clustering Methods 7 2.1.2 Sequential Data Clustering 8 2.1.3 Links-based Clustering Algorithm 9 2.2 Intrusion Detection System 10 2.3 Recommendation System 11 2.4 Web Recommendation System 12 2 5 Web Usage Mining 14 2 6 Association Rules 16 2.7 Pattern Similarity Measurement 18 2.8 Feature Matrices Model 19 2.9 Moving Average Rule 21 3. MPM Clustering 23 3.1 Similarity Measurement 24 3.2 Matrix Permutation 27 3.3 Similarity Matrix Partition 31 3.4 Avoiding the Ripple Effect 36 4. Experimental Results of MPM 38 4.1 Synthetic Data 38 4.2 Experimental Results of MPM Algorithm 40 4.3 Heuristic MPM Algorithm 41 4.4 Time and Space Complexity 48 5. MPM in Intrusion Detection 50 5.1 Data Mining and Intrusion Detection 51 5.1.1 Using Clustering in Intrusion Detection 52 5.2 MPMbased Intrusion Detection System 53 5.3 Experiment Results of MPM-based IDS 56 6. Dynamic Web Recommendation Engine 62 6.1 Site Topology Matrix 64 6.2 Recommendation Generator 66 6.3 Recommendation Coordinator 73 7. Experiment Results of the DynamicWeb Recommendation Engine 76 8. Conclusions and Future Work 81 8.1 Summary 81 8.2 Discussions and Future Work 83 Appendix 85 References 87 Vita 93

    [1] M. S. Chen, J. Han, and P. S. Yu, Data Mining: An Overviewfrom Database Perspective,”IEEE Transactions on Knowledge and Data Engineering, vol. 8, no. 6, pp. 866–883, Dec. 1996.
    [2] A. Joshi and R. Krishnapuram, Robust Fuzzy Clustering Methods to Support Web Mining,”Proceedings of the ACM SIGMOD Workshop on Data Mining and Knowledge Discovery, pp. 15–1–15–8, June 1998.
    [3] Y. J. Su, H. C. Jiau, Y. M. Lin, and S. R. Tsai, “MPM: A Matrix based Partition Method for Non-numeric Data Clustering,” Proceedings of the 2003 International Conference on Informatics, Cybernetics, and Systems, Dec. 2003.
    [4] P. Berkhin, “Survey of Clustering Data Mining Techniques,” tech. rep., Accrue Software Inc., 2002.
    [5] O. Nasraoui and C. Rojas, “From Static to Dynamic Web Usage Mining: Towards Scalable Profiling and Personalization with Evolutionary Computation,” Proceedings of Workshop on Information Technology, Mar. 2003.
    [6] Y. J. Su, H. C. Jiau, and S. R. Tsai, “Design a Dynamically Adaptive Recommendation Engine on WWW,” Proceedings of National Computer Symposium (NCS 2001), pp. I129–I136, Dec. 2001.
    [7] Y. J. Su, H. C. Jiau, and S. R. Tsai, “Adaptive Web Recommendation for New Navigation Trends,” Proceedings of the International Computer Symposium (ICS 2002), pp. 1359–1366, Dec. 2002.
    [8] Y. J. Su, H. C. Jiau, and S. R. Tsai, “Using the Moving Average Rule in a Dynamic Web Recommendation System,” International Journal of Intelligent Systems, vol. 22, no. 6, pp. 1–19, June 2007.
    [9] H. C. Jiau, Y. J. Su, Y. M. Lin, and S. R. Tsai, “MPM: a hierarchical clustering algorithm using matrix partitioning method for non-numeric data,” Journal of Intelligent information Systems, vol. 26, no. 2, pp. 185–207, Mar. 2006.
    [10] S. Guha, R. Rasrogi, and K. Shim, “ROCK:A Robust Clustering Algorithm for Categorical Attributes,”Proceedings of the IEEE International Conference on Data Engineering, pp. 512–521, Mar. 1999.
    [11] J. Han and M. Kamber, Data Mining: Concepts and Techniques. Morgan Kaufmann, 2001.
    [12] L. Talavera and J. Bejar, “Generality Based Conceptual Clustering with Probabilistic Concepts,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, no. 2, pp. 196–206, Feb. 2001.
    [13] L. Kaufman and P. J. Rousseeuw, Finding Groups in Data: An Introduction to Cluster Analysis. John Wiley and Sons, 1990.
    [14] J. B. MacQueen, “Some Methods for Classification and Analysis of Multivariate Observations,” Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, pp. 281–297, June 1965.
    [15] T. Zhang, R. Ramakrishnan, and M. Livny, “BIRCH: An efficient data clustering method for very large databases,” Proceedings of the 1996 ACM SIGMOD International Conference on Management of Data, pp. 281–297, June 1996.
    [16] S. Guha, R. Rastogi, and K. Shim, “CURE: An efficient clustering algorithm for large databases,”Proceedings of the 1998 ACM SIGMOD International Conference on Management of Data, pp. 73–84, June 1998.
    [17] G. Karypis, E. H. Han, and V.Kumar, “CHAMELEON: A hierarchical clustering algorithm using dynamic modeling,” IEEE Computer, vol. 32, no. 8, pp. 68–75, Aug. 1999.
    [18] C. Shahabi, F. BanaeiKashani, J. Faruque, and A. Faisal, “Feature Matrices: A Model for Efficient and Anonymous Web Usage Mining,” Proceedings of the Second International Conference on Electronic Commerce and Web Technologies, pp. 280–294, Sep. 2001.
    [19] H. Debar, M. Dacier, and A.Wespi, “Towards a taxonomy of intrusion detection systems,” Computer Network, vol. 31, no. 8, pp. 805–822, Apr. 1999.
    [20] S. Kumar and E. Spafford, “A Pattern Matching Model for Misuse Intrusion Detection,” Computers and Security, vol. 14, no. 1, pp. 28–28(1), 1995.
    [21] U. Lindqvist and P. A. Porras, “Detecting Computer and Network Misuse Through the Production Based Expert System Toolset (PBEST),” Proceedings of the 1999 IEEE Symposium on Research in Security and Privacy, pp. 146–161, May 1999.
    [22] E. Eskin, A. Arnold, M. Prerau, L. Portnoy, and S. Stolfo, A Geometric Framework for Unsupervised Anomaly Detection: Detecting Intrusions in Unlabeled Data. Kluwer, 2002.
    [23] L. Ertoz, A. Lazarevic, E. Eilertson, A. Lazarevic, P. Tan, P. Dokas, V. Kumar, and J. Srivastava, “Protecting Against Cyber Threats in Networked Information Systems,” Proceedings of the SPIE Annual Symposium on Aero Sense, Battle space Digitization and Network Centric Systems III, Apr. 2003.
    [24] R. Sekar, M. Bendre, D. Dhurjati, and P. Bollineni, “A Fast Automaton Based Method for Detecting Anomalous Program Behaviors,” Proceedings of the 2001 IEEE Symposium on Security and Privacy, pp. 144–155, May 2001.
    [25] W. Lee and D. Xiang, “Information theoretic measures for anomaly detection,” Proceedings of the 2001 IEEE Symposium on Security and Privacy, pp. 130–143, May 2001.
    [26] K. Julisch, “Clustering Intrusion Detection Alarms to Support Root Cause Analysis,” ACM Transactions on Information and System Security, vol. 6, no. 4, pp. 443–471, Nov. 2003.
    [27] W. Lee and S. Stolfo, “A Framework for Constructing Features and Models for Intrusion Detection Systems,” ACM Transactions on Information and System Security, vol. 3, no. 4, pp. 224–261, Nov. 2000.
    [28] G. Adomavicius and A. Tuzhilin, “Toward the next generation of recommender systems: a survey of the state of the art and possible extensions,” IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 6, pp. 734–749, June 2005.
    [29] W. Hill, L. Stead,M. Rosenstein, and G. Furnas, “Recommending and Evaluating Choices in aVirtual Community of Use,” Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 194–201, May 1995.
    [30] P. Resnick, N. Iakovou, M. Sushak, P. Bergstrom, and J. Riedl, “GroupLens: An Open Architecture for Collaborative Filtering of Netnews,” Proceedings of the CSCW 94 Conference on Computer Supported Cooperative Work, pp. 175–186, Oct. 1994.
    [31] U. Shardanand and P. Maes, “Social Information Filtering: Algorithms for Automating ’Word of Mouth’,” Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 210–217, May 1995.
    [32] M. Balabanovic and Y. Shoham, “Fab: Content-Based, Collaborative Recommendation,” Communications of ACM, vol. 40, no. 3, pp. 66–72, Mar. 1997.
    [33] P. Melville, R. Mooney, and R. Nagarajan, “Content Boosted Collaborative Filtering for Improved Recommendations,” Proceedings of the Eighteenth National Conference on Artificial Intelligence, pp. 187–192, July 2002.
    [34] C. Basu, H. Hirsh, and W. Cohen, “Recommendation as Classification: Using Social and Content-Based
    Information in Recommendation,” Proceedings of the Fifteenth National/Tenth Conference on Artificial Intelligence/Innovative Applications of Artificial Intelligence (AAAI 1998), pp. 714–720, July 1998.
    [35] R. J. Mooney, P. N. Bennett, and L. Roy, “Book Recommending Using Text Categorization with Extracted Information,” Proceedings of the AAAI Workshop on Recommender Systems, pp. 70–74, July 1998.
    [36] M. Pazzani and D. Billsus, “Learning and Revising User Profiles: The Identification of Interesting Web Sites,” Machine Learning, vol. 27, no. 3, pp. 313–331, June 1997.
    [37] J. L. Herlocker, J. A. Konstan, L. G. Terveen, and J. T. Riedl, “Evaluating collaborative filtering recommender systems,” ACM Transactions on Information Systems, vol. 22, no. 1, pp. 5–53, Jan. 2004.
    [38] M. D. Mulvenna, S. S. Anand, and A. G. Buchner, “Personalization on the Net using Web Mining,” Communications of the ACM, vol. 43, no. 8, pp. 122–125, Aug. 2000.
    [39] B. Mobasher, R. Cooley, and J. Srivastava, “Automatic Personalization Based on Web Usage Mining,” Communications of the ACM, vol. 43, no. 8, pp. 142–151, Aug. 2000.
    [40] J. Srivastava, R. Cooley, M. Deshpande, and P. N. Tan, “Web usage mining: Discovery and applications of usage patterns from web data,” SIGKDD Explorations, vol. 1, no. 2, pp. 1–12, Jan. 2000.
    [41] R. Cooley, B. Mobasher, and J. Srivastava, “Web Mining: Information and Pattern Discovery on the World Wide Web,” Proceedings of the Ninth International Conference on Tools with Artificial Intelligence, pp. 558–567, Nov. 1997.
    [42] R. Cooley, B. Mobasher, and J. Srivastava, “Data preparation for mining World Wide Web browsing patterns,” Journal of Knowledge and Information Systems, vol. 1, no. 1, pp. 5–32, Feb. 1999.
    [43] Y. Zhou and B. Mobasher, “Web User Segmentation Based on a Mixture of Factor Analyzers,”Proceedings of the International Conference on E-Commerce and Web Technologies (ECWeb’06), pp. 11–20, Sep. 2006.
    [44] F. Masseglia, P. Poncelet, and M. Teisseire, “Using Data Mining Techniques on Web Access Logs to Dynamically Improve Hypertext Structure,” ACM SIGWEB Newsletter, vol. 8, no. 3, pp. 13–19, Oct. 1999.
    [45] B. Mobasher, R. Cooley, and J. Srivastava, “Creating Adaptive Web Sites Through Usage-Based Clustering of URLs,” Proceedings of the 1999 Workshop on Knowledge and Data Engineering Exchange, pp. 19–25, Nov. 1999.
    [46] M. Perkowitz and O. Etzioni, “Adaptive Web Sites: Conceptual Cluster Mining,” Proceedings of the Sixteenth International Joint Conference on Artificial Intelligence, pp. 264–269, July 1999.
    [47] B. Mobasher, H. Dai, T. Luo, Y. Sung, and J. Zhu, “Discovery of Aggregate Usage Profiles for Web Personalization,” Data Mining and Knowledge Discovery, vol. 6, no. 1, pp. 61–82, Jan. 2002.
    [48] O. R. Zaiane, M. Xin, and J. Han, “Discovering Web access patterns and trends by applying OLAP and data mining technology on Web logs,” Proceedings of the IEEE International Forum on Research and Technology Advances in Digital Libraries, pp. 19–29, Apr. 1998.
    [49] M. Spiliopoulou, “Web usage mining for site evaluation: Making a site better fit its users,” Communications of the ACM, vol. 43, no. 8, pp. 127–134, Aug. 2000.
    [50] M. Spiliopoulou and L. C. Faulstich, “WUM: A Web Utilization Miner,” Proceedings of the EDBT Workshop of the WebDB98, pp. 109–115, Mar. 1998.
    [51] M. Spiliopoulou, L. C. Faulstich, and K. Wilkler, “A Data Miner analyzing Navigational Behavior of Web Users,” Proceedings of the Workshop on Machine Learning in User Modeling of the ACAI’99 International Conference, pp. 588–589, July 1999.
    [52] O. market web reporter, 1996. URL http://www.openmarket.com.
    [53] Webtrends, 1995. URL http://www.webtrends.com.
    [54] net.Genesis, 1996. URL http://www.netgen.com.
    [55] B. Mobasher and N. Jain and E. Han and J. Srivastava, “Web mining: Pattern discovery from world wide web transactions,” Tech. Rep. TR96050, University of Minnesota, Sep. 1996.
    [56] R. Cooley, B. Mobasher, and J. Srivastava, “Grouping web page references into transactions for mining World Wide Web browsing patterns,” Proceedings of the 1997 IEEE Knowledge and Data Engineering Exchange Workshop, p. 2, Nov. 1997.
    [57] P. Pirolli and S. Card, “Information foraging in information access environments,” Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 51–58, May 1995.
    [58] P. Pirolli, J. Pitkow, and R. Rao, “Silk from a sow’s ear: Extracting usable structures from the web,” Proceedings of the 1996 Conference on Human Factors in Computing Systems, pp. 118–125, Apr. 1996.
    [59] I. Cadez, D. Heckerman, C. Meek, P. Smyth, and S. White, “Visualization of Navigation Patterns on a Web Site Using Model-Based Clustering,” Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 280–284, Aug. 2000.
    [60] R. Krishnapuram, A. Joshi, O. Nasraoui, and L. Yi, “Low Complexity Fuzzy Relational Clustering Algorithms for Web Mining,” IEEE Transactions on Fuzzy Systems, vol. 9, no. 4, pp. 596–607, Aug. 2001.
    [61] O. Nasraoui, H. Frigui, R. Krishnapuram, and A. Joshi, “Extracting Web User Profiles Using Relational Competitive Fuzzy Clustering,” International Journal of Artificial Intelligence Tools, vol. 9, no. 4, pp. 509–526, Dec. 2000.
    [62] R. Agrawal and R. Srikant, “Fast Algorithms for Mining Association Rules,” Proceedings of the 20th International Conference on Very Large Data Bases (VLDB), pp. 487–499, Sep. 1994.
    [63] J. S. Park, M. S. Chen, and P. S. Yu, “Using a Hash-Based Method with Transaction Trimming for Mining Association Rules,” IEEE Transactions on Knowledge and Data Engineering, vol. 9, no. 5, pp. 209–221, Nov. 1998.
    [64] R. Gencay and T. Stengos, “Moving Average Rule, Volume and Predictability of Security Returns with Feedforward Network,” Journal of Forecasting, vol. 17, no. 5/6, pp. 401–414, Sep. 1998.
    [65] F. E. James, “Monthly Moving Averages An Effective Investment Toll?,” Journal of Financial and Quantitative Analysis, vol. 3, no. 3, pp. 315–326, Sep. 1968.
    [66] A. Gionis, T. Kujala, and H. Mannila, “Fragments of order,” Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 129–136, June 2003.
    [67] H. Mannila, H. Toivonen, and A. I. Verkamo, “Discovery of frequent episodes in event sequences,” Data Mining and Knowledge Discovery, vol. 1, no. 3, pp. 259–289, Nov. 1997.
    [68] R. Agrawal and R. Srikant, “Mining Sequential Patterns,” Proceedings of the Eleventh International Conference on Data Engineering, pp. 3–14, Mar. 1995.
    [69] W. Lee, S. Stolfo, , and K. Mok, “Mining in a Dataflow Environment: Experience in Network Intrusion Detection,” Proceedings of the Fifth ACM SIGKDD International
    Conference on Knowledge Discovery and Data Mining, pp. 114–124, Aug. 1999.
    [70] D. Barbara, J. Couto, S. Jajodia, and N. Wu, “ADAM: A Testbed for Exploring the Use of Data Mining in Intrusion Detection,” ACM SIGMOD Record, vol. 30, no. 4, pp. 15–24, Dec. 2001.
    [71] L. Portnoy, E. Eskin, and S. J. Stolfo, “Intrusion detection with unlabeled data using clustering,” Proceedings of the ACM CSS Workshop on Data Mining Applied to Security, Nov. 2001.
    [72] P. K. Chan, M. V. Mahoney, and M. H. Arshad, Managing Cyber Threats: Issues, Approaches, and Challenges (Massive Computing). Springer, 2005.
    [73] X. Li and N. Ye, “Mining normal and intrusive activity patterns for computer intrusion detection,” Proceedings of the Second Symposium on Intelligence and Security Informatics, pp. 226–238, June 2004.
    [74] X. Li and N.Ye, “Grid and dummy cluster based learning of normal and intrusive clusters for computer intrusion detection,” Quality and Reliability Engineering International, vol. 18, no. 3, pp. 344–377, May 2002.
    [75] The third international knowledge discovery and data mining tools competition dataset, 1999. URL http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html.
    [76] B. Pfahringer, “Winning the KDD99 Classification Cup: Bagged Boosting,” ACM SIGKDD Explorations Newsletter, vol. 1, no. 2, pp. 65–66, Jan. 2000.
    [77] M. Sabhnani and G. Serpen, “Application of Machine Learning Algorithms to KDD Intrusion Detection Dataset within Misuse Detection Context,” Proceedings of the International Conference on Machine Learning, Models, Technologies and Applications, pp. 209–215, June 2003.
    [78] G. Pant, P. Srinivasan, and F. Menczer, “Crawling the Web,” Web Dynamics: Adapting to Change in Content, Size, Topology and Use, pp. 153–178, 2004.
    [79] M. Kobayashi and K. Takeda, “Information retrieval on the web,” ACM Computing Surveys, vol. 32, no. 2, pp. 144–173, June 2000.

    下載圖示 校內:2008-08-21公開
    校外:2008-08-21公開
    QR CODE