| 研究生: |
陳欣怡 Chen, Hsin-Yi |
|---|---|
| 論文名稱: |
準大快速循序樣式樹之維護 Maintenance Algorithm for Pre-large FUSP Trees |
| 指導教授: |
洪宗貝
Hong, T. P. 李昇暾 Li, Sheng-Tun |
| 學位類別: |
碩士 Master |
| 系所名稱: |
管理學院 - 經營管理碩士學位學程(AMBA) Advanced Master of Business Administration (AMBA) |
| 論文出版年: | 2009 |
| 畢業學年度: | 97 |
| 語文別: | 英文 |
| 論文頁數: | 63 |
| 中文關鍵詞: | 快速更新頻繁特徵樹 、循序序列 、準大快速更新循序樣式樹 、資料探勘 、準大法則 |
| 外文關鍵詞: | sequential patterns, pre-large FUSP tree, data mining, FUSP tree, pre-large concept |
| 相關次數: | 點閱:124 下載:4 |
| 分享至: |
| 查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報 |
近幾年來,隨著資訊科技的發達和廣泛的應用,如何從這些龐大且複雜的資料當中萃取有用的資訊與知識,儼然已成為一個重要的研究領域。在這些大量的資料庫中,如何找出有用的消費者使用習性以供決策者做判斷更是重要的議題。在過去,研究者通常假設資料庫是靜態的以簡化資料萃取的問題,也因此大部份的傳統資料萃取演算法都屬於批次處理,而無法利用之前已萃取出來的資訊,來幫助從不斷更新的資料庫中作挖掘。在實際生活中,資料庫是不斷的被加入或是刪除的。因此,設計一份有效且有用的資料探勘方法亦趨重要。在本論文中,我們把快速更新序列樹的演算法做一個擴充,提出了準大快速更新循序樣式樹的資料結構和其維護的演算法。透過已經先建立的準大快速更新循序樣式樹,當有資料做新增或是刪除時,只需要對準大快速更新循序樣式樹去做一個維護的工作。透過我們所提出的維護演算法,我們可以減少重掃整個資料庫以及重新建樹的時間,使它能有效的處理交易資料的新增或是刪除。實驗結果也顯示,在處理交易資料的新增或是刪除時,我們提出的準大快速更新循序樣式樹之維護演算法可以比批次的快速更新序列數樹或是快速更新序列數樹的維護演算法更快的去更新資料庫。
Mining useful information and helpful knowledge from large databases has evolved into an important research area in recent years. Among the classes of knowledge derived, finding sequential patterns in temporal transaction databases is very important since it can help to model customer behavior. In the past, researchers usually assumed databases were static to simplify data-mining problems. In real-world applications, new customer sequences may be added or deleted to databases frequently. Designing an efficient and effective mining algorithm that can maintain sequential patterns as a database grows is thus important. In this thesis, we attempt to extend the FUSP-tree construction algorithm for efficiently handling the insertion and deletion of sequential patterns based on the concepts of pre-large sequences in order to decrease time consuming in re-scan the whole database and re-construct the pre-large FUSP tree. Experimental results also show that the proposed pre-large FUSP-tree maintenance algorithm for the insertion or deletion of transactions ran faster than batch FUSP-tree algorithm and the FUSP-tree maintenance algorithm, and uploaded the database much quickly.
[1] R. Agrawal, T. Imielinksi and A. Swami, “Mining association rules between sets of items in large database,” The ACM SIGMOD Conference, pp. 207-216, 1993.
[2] R. Agrawal, T. Imielinksi and A. Swami, “Database mining: a performance perspective,” IEEE Transactions on Knowledge and Data Engineering, Vol. 5, No. 6, pp. 914-925, 1993.
[3] R. Agrawal and R. Srikant, “Fast algorithm for mining association rules,” The International Conference on Very Large Data Bases, pp. 487-499, 1994.
[4] R. Agrawal and R. Srikant, “Mining sequential patterns,” The Eleventh IEEE International Conference on Data Engineering, pp. 3-14, 1995.
[5] R. Agrawal, R. Srikant and Q. Vu, “Mining association rules with item constraints,” The Third International Conference on Knowledge Discovery in Databases and Data Mining, pp. 67-73, 1997.
[6] R. Srikant and R. Agrawal, Mining Sequential Patterns: Generalizations and Performance Improvements, Proc. 5th Int’l Extending Database Technology, pp.3-17, 1996
[7] M. S. Chen, J. Han and P. S. Yu, “Data mining: An overview from a database perspective,” IEEE Transactions on Knowledge and Data Engineering, Vol. 8, No. 6, pp. 866-883, 1996.
[8] D. W. Cheung, J. Han, V.T. Ng, and C. Y. Wong, “Maintenance of discovered association rules in large databases: An incremental updating approach,” The Twelfth IEEE International Conference on Data Engineering, pp. 106-114, 1996.
[9] D. W. Cheung, S. D. Lee, and B. Kao, “A general incremental technique for maintaining discovered association rules,” In Proceedings of Database Systems for Advanced Applications, pp. 185-194, Melbourne, Australia, 1997.
[10] H. Cheng, X. Yan, and J. Han, “IncSpan: incremental mining of sequential patterns in large database”, The ACM SIGKDD international conference on Knowledge discovery and data mining, pp.527-532, 2004.
[11] C. I. Ezeife, “Mining Incremental association rules with generalized FP-tree,” Proceedings of the 15th Conference of the Canadian Society for Computational Studies of Intelligence on Advances in Artificial Intelligence, pp. 147-160, 2002.
[12] T. Fukuda, Y. Morimoto, S. Morishita and T. Tokuyama, "Mining optimized association rules for numeric attributes," The ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, pp. 182-191, 1996.
[13] J. Han and Y. Fu, “Discovery of multiple-level association rules from large database,” The Twenty-first International Conference on Very Large Data Bases, pp. 420-431, 1995.
[14] J. Han, J. Pei, and Y. Yin, “Mining frequent patterns without candidate generation,” The 2000 ACM SIGMOD International Conference on Management of Data, 2000.
[15] T. P. Hong, C. Y. Wang, and Y. H. Tao, “A new incremental data mining algorithm using pre-large itemsets,” Intelligent Data Analysis, Vol. 5, No. 2, pp. 111-129, 2001.
[16] T. P. Hong, C. Y. Wang, and S. S. Tseng, “Incremental data mining for sequential patterns using pre-large sequences,” The fifth world multi-conference on Systemic, Cybernetics and Informatics, 2001.
[17] T. P. Hong and C. Y. Wang, “Maintenance of association rules using pre-large itemsets,” Intelligent Databases: Technologies and Applications, pp. 44-60, 2006.
[18] M. V. Joshi, G. Karypis, and V. Kumar. A universal formulation of sequential patterns, Technical Report, Department of Computer Science, University of Minnesota, 1999.
[19] C. W. Lin, T. P. Hong, W. H. Lu and W. Y. Lin, “An Incremental FUSP-Tree Maintenance Algorithm,” International Conference on Intelligent System Design and Applications, pp. 445 - 449, 2008.
[20] C. W. Lin, T. P. Hong and W. H. Lu, “An Efficient FUSP-Tree Update Algorithm for Deleted Customer Sequences,” The Fourth International Conference on Innovative Computing, Information and Control, 2009 (accepted to appear).
[21] M. Y. Lin and S.Y. Lee, “Incremental update on sequential patterns in large databases,” The Tenth IEEE International Conference on Tools with Artificial Intelligence, pp. 24-31, 1998.
[22] M. Y. Lin, S. Y. Lee, and S. S. Wang, DELISP: Efficient Discovery of Generalized Sequential Patterns by Delimited Pattern-Growth Technology, Proc. of the 6th Pacific-Asia Conf. on Knowledge Discovery and Data Mining, pp.198-209, 2002.
[23] H. Mannila, H. Toivonen, and A.I. Verkamo, “Efficient algorithm for discovering association rules,” The AAAI Workshop on Knowledge Discovery in Databases, pp. 181-192, 1994.
[24] H. Mannila, H. Toivonen, and A. Inkeri Verkamo, Discovery of Frequent Episodes in Event Sequences, Data Mining and Knowledge Discovery, vol. 1(3), pp. 259-289, 1997
[25] F. Masseglia, P. Poncelet, and M. Teisseire, Pre-Processing Time Constraints for Efficiently Mining Generalized Sequential Patterns, Proc. of the 11th Int’l Symposium on Temporal Representation and Reasoning (TIME'04), pp. 87-95, 2004.
[26] J. S. Park, M. S. Chen, P. S. Yu, “Using a hash-based method with transaction trimming for mining association rules,” IEEE Transactions on Knowledge and Data Engineering, Vol. 9, No. 5, pp. 812-825, 1997.
[27] Y. Qiu, Y. J. Lan and Q. S. Xie, “An improved algorithm of mining from FP-tree,” Proceedings of the Third International Conference on Machine Learning and Cybernetics, Shanghai, August, pp 26-29, 2004.
[28] N. L. Sarda and N. V. Srinivas, “An adaptive algorithm for incremental mining of association rules,” The Ninth International Workshop on Database and Expert Systems, pp. 240-245, 1998.
[29] R. Srikant and R. Agrawal, “Mining generalized association rules,” The Twenty-first International Conference on Very Large Data Bases, pp. 407-419, 1995.
[30] R. Srikant and R. Agrawal, “Mining quantitative association rules in large relational tables,” The 1996 ACM SIGMOD International Conference on Management of Data, pp. 1-12, 1996.
[31] C.Y. Wang, T.P. Hong, S.S. Tseng, “Maintenance of sequential patterns for record deletion,” IEEE International Conference on Data Mining, pp. 536-541, 2001.
[32] O. R. Zaiane and E. H. Mohammed, “COFI-tree mining: A new approach to pattern growth with reduced candidacy generation,” IEEE International Conference on Data Mining, 2003.