| 研究生: |
林浚瑋 Lin, Chun-Wei |
|---|---|
| 論文名稱: |
擴充頻繁樣式樹於遞增式、效益與模糊項目集探勘之研究 Extension of Frequent Pattern Trees for Incremental, Utility and Fuzzy Itemsets Mining |
| 指導教授: |
盧文祥
Lu, Wen-Hsiang |
| 學位類別: |
博士 Doctor |
| 系所名稱: |
電機資訊學院 - 資訊工程學系 Department of Computer Science and Information Engineering |
| 論文出版年: | 2009 |
| 畢業學年度: | 98 |
| 語文別: | 英文 |
| 論文頁數: | 155 |
| 中文關鍵詞: | 資料探勘 、模糊資料探勘 、效益探勘 、樹狀結構 、維護演算法 |
| 外文關鍵詞: | data mining, fuzzy data mining, utility mining, tree structure, maintenance algorithm |
| 相關次數: | 點閱:184 下載:1 |
| 分享至: |
| 查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報 |
近年來,資料探勘研究已經成為重要的研究主題,而關聯規則探勘在資料探勘裡是最熱門的研究主題之一。在過去,傳統演算法以批次的方式去處理所有的交易記錄來探勘關聯規則。在一般的商業交易系統裡,交易記錄在動態的資料庫裡常會不斷地被新增、刪除或是修改。因此,設計一個良好的演算法可以有效地在動態資料庫裡去維護已經探勘出來的規則變得相當重要。在論文的第一部份,針對在動態資料中交易記錄的新增、刪除或是修改的情況下,三種準大快速更新頻繁特徵樹(Pre-FUFP-tree)演算法因此被提出並用來有效地維護更新樹的結構。基於準大項目集的內容(pre-large concepts)裡的兩個門檻值,直到一定數量的交易記錄被處理前,我們可以避免掉重建樹狀結構的需求。而頻繁樣式成長相似(FP-growth-like)演算法也被用來從更新後的快速更新頻繁樣式樹(FUFP tree)裡去探勘出所需要的資訊。
在關聯規則探勘裡,項目(item)在交易資料裡經常只被當成購買或是沒有購買的兩種二元變數(binary variables)。效益探勘(utility mining) 因此被提出用來顯示出其他的隱含因素,像是價錢或是利潤(效益)。論文的第二部份,基於向下封閉性的特性(downward closure property),新的高效益樣式樹(HUP-tree)演算法被提出來有效地探勘高效益項目集。為了後續的探勘過程,相關的資訊因此被保留在高效益樣式樹(HUP tree)裡。為了探勘高效益項目集,因此我們也提出一個高效益樣式成長(HUP-growth)演算法。
過去的關聯規則探勘方法僅關注處理二元的資料庫。近年來,為了處理數量資料庫的問題,許多基於階層式的模糊資料探勘演算法也因此被提出。在論文的第三部份,我們嘗試擴充頻繁樣式樹的演算法以用來處理在數量資料庫裡模糊區域的全域值。模糊頻繁特徵樹(fuzzy FP-tree)演算法、壓縮模糊頻繁特徵樹(CFFP-tree)演算法與較高邊界模糊頻繁特徵樹(UBFFP-tree)演算法因此被提出以有效地探勘出模糊頻繁項目集。為了減少處理的時間,最大基數(maximum cardinality)被用來使得模糊區域的數目等於原本的項目數目。基於所設計的樹狀結構,三種探勘演算法因此被分別地提出用來探勘出模糊頻繁項目集。
在本論文裡,實驗結果分別地顯示出所提出的演算法於處理關聯規則探勘、高效益探勘與模糊資料探勘這三方面的效能。
Data mining, also referred to as knowledge discovery, has recently emerged as an important research topic, and association rules mining is considered as one of the most referenced sub-topics in data mining. In the past, traditional algorithms process all records in a batch way for mining association rules. In real-world applications, records are constantly being inserted, deleted or modified in dynamic databases. Designing an algorithm that can efficiently maintain association rules in dynamic databases is critically important. In the first part of this dissertation, three Pre-FUFP maintenance algorithms are thus proposed to efficiently maintain and update the FUFP-tree structures regardless of whether records are inserted, deleted or modified in dynamic databases. Based on two support thresholds of pre-large concepts, it helps avoid the need to re-build the tree structure until after a number of records have been processed. The FP-growth-like algorithm is then implemented to mine the desired information for the updated FUFP trees.
In the association rules mining, it treats items as binary variables in databases, which considers whether an item is bought in a record or not. Utility mining was thus proposed to reflect any other implicit factors, such as prices or profits. In the second part of this dissertation, a novel HUP-tree algorithm is proposed to efficiently mine the high utility itemsets based on the downward closure property. A HUP tree is first designed to keep the related information for later mining process. A HUP-growth mining algorithm is then presented to efficiently mine high utility itemsets from it.
In the past, most association rules mining focused on processing binary variables in databases. In recent years, many fuzzy data mining algorithms have been proposed for managing quantitative data, and most of them are processed in the level-wise approaches. In the third part of this dissertation, we attempt to extend the FP-tree algorithm for handling quantitative data from the global values of fuzzy regions. Thus, the fuzzy FP-tree algorithm, the compressed fuzzy frequent pattern tree (CFFP-tree) algorithm, and the upper-bound fuzzy frequent pattern tree (UBFFP-tree) algorithm are then proposed to efficiently mine the fuzzy frequent itemsets. The maximum cardinality is used to make the number of fuzzy regions processed equivalent to the number of the original items for reducing the processing time. Three mining algorithm are then proposed to mine the fuzzy frequent itemsets based on the designed tree structures, respectively.
Experimental results showed that the performance of the proposed algorithms in three parts of this dissertation for handling association rules mining, high utility mining and fuzzy data mining, respectively.
[1] R. C. Agarwal, C. C. Aggarwal, and V. V. V. Prasad, "A tree projection algorithm for generation of frequent item sets," Journal of Parallel and Distributed Computing, vol. 61, pp. 350-371, 2001.
[2] R. Agrawal and R. Srikant, "Fast algorithms for mining association rules in large databases," The 20th International Conference on Very Large Data Bases, pp. 487-499, 1994.
[3] R. Agrawal and R. Srikant, "Mining sequential patterns," The International Conference on Data Engineering, pp. 3-14, 1995.
[4] R. Agrawal, T. Imielinski, and A. Swami, "Mining association rules between sets of items in large databases," International Conference on Management of Data, pp. 207-216, 1993.
[5] R. Agrawal, T. Imielinski, and A. Swami, "Database mining: A performance perspective," IEEE Transactions on Knowledge and Data Engineering, vol. 5, pp. 914-925, 1993.
[6] F. Berzal, J. C. Cubero, N. Marín, and J. M. Serrano, "Tbar: An efficient method for association rule mining in relational databases," Data and Knowledge Engineering, vol. 37, pp. 47-64, 2001.
[7] S. Brin, R. Motwani, J. D. Ullman, and S. Tsur, "Dynamic itemset counting and implication rules for market basket data," SIGMOD Record, vol. 26, pp. 255-264, 1997.
[8] K. C. C. Chan and W. H. Au, "Mining fuzzy association rules," The 6th International Conference on Information and Knowledge Management, pp. 209-215, 1997.
[9] R. Chan, Q. Yang, and Y. D. Shen, "Mining high utility itemsets," The 3rd IEEE International Conference on Data Mining, pp. 19-26, 2003.
[10] C. C. Chang and C. Y. Lin, "Perfect hashing schemes for mining association rules," The Computer Journal, vol. 48, pp. 168-179, 2005.
[11] M. S. Chen, J. Han, and Y. P. S., "Data mining: An overview from a database perspective," IEEE Transactions on Knowledge and Data Engineering, vol. 8, pp. 866-883, 1996.
[12] D. W. Cheung, H. Jiawei, V. T. Ng, and C. Y. Wong, "Maintenance of discovered association rules in large databases: An incremental updating technique," The 12th International Conference on Data Engineering, pp. 106-114, 1996.
[13] D. W. L. Cheung, S. D. Lee, and B. Kao, "A general incremental technique for maintaining discovered association rules," The 5th International Conference on Database Systems for Advanced Applications, pp. 185-194, 1997.
[14] M. Delgado, N. Marin, D. Sanchez, and M. A. Vila, "Fuzzy association rules: General model and applications," IEEE Transactions on Fuzzy Systems, vol. 11, pp. 214-225, 2003.
[15] D. Dubois, E. H¨ullermeier, and H. Prade, "A systematic approach to the assessment of fuzzy association rules," Data Mining and Knowledge Discovery, vol. 13, pp. 167-192, 2006.
[16] C. I. Ezeife and Y. Su, "Mining incremental association rules with generalized fp-tree," The 15th Conference of the Canadian Society for Computational Studies of Intelligence on Advances in Artificial Intelligence, pp. 147-160, 2002.
[17] T. Fukuda, Y. Morimoto, S. Morishita, and T. Tokuyama, "Mining optimized association rules for numeric attributes," The15th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, pp. 182-191, 1996.
[18] B. Goethals, Frequent itemset mining dataset repository. Available: http://fimi.cs.helsinki.fi/data/.
[19] G. Grahne and J. Zhu, "Fast algorithms for frequent itemset mining using fp-trees," IEEE Transactions on Knowledge and Data Engineering, vol. 17, pp. 1347-1362, 2005.
[20] J. Han, J. Pei, Y. Yin, and R. Mao, "Mining frequent patterns without candidate generation: A frequent-pattern tree approach," Data Mining and Knowledge Discovery, vol. 8, pp. 53-87, 2004.
[21] T. P. Hong and C. Y. Wang, "Maintenance of association rules using pre-large itemsets," Intelligent Databases: Technologies and Applications, pp. 44-60, 2006.
[22] T. P. Hong and C. Y. Wang, "An efficient and effective association-rule maintenance algorithm for record modification," Expert Systems with Applications, vol. 37, pp. 618-626, 2010.
[23] T. P. Hong, C. S. Kuo, and S. C. Chi, "A data mining algorithm for transaction data with quantitative values," The 7th National Conference on Fuzzy Theory and Its Applications, pp. 874-878, 1999.
[24] T. P. Hong, C. Y. Wang, and Y. H. Tao, "A new incremental data mining algorithm using pre-large itemsets," Intelligent Data Analysis, vol. 5, pp. 111-129, 2001.
[25] T. P. Hong, C. S. Kuo, and S. L. Wang, "A fuzzy aprioritid mining algorithm with reduced computational time," Applied Soft Computing, vol. 5, pp. 1-10, 2004.
[26] T. P. Hong, C. W. Lin, and Y. L. Wu, "An efficient fufp-tree maintenance algorithm for record modification," International Journal of Innovative Computing, Information and Control, vol. 4, pp. 2875-2887, 2008.
[27] T. P. Hong, C. W. Lin, and Y. L. Wu, "Incrementally fast updated frequent pattern trees," Expert Systems with Applications, vol. 34, pp. 2424-2435, 2008.
[28] T. P. Hong, C. W. Lin, and Y. L. Wu, "Maintenance of fast updated frequent pattern trees for record deletion," Computational Statistics & Data Analysis, vol. 53, pp. 2485-2499, 2009.
[29] K. Hu, Y. Lu, L. Zhou, and C. Shi, "Integrating classification and association rule mining: A concept lattice framework," The 7th International Workshop on New Directions in Rough Sets, Data Mining, and Granular-Soft Computing, pp. 443-447, 1999.
[30] A. Kandel, Fuzzy expert systems, 1992.
[31] M. Kaya and R. Alhajj, "Mining multi-cross-level fuzzy weighted association rules," IEEE International Conference Intelligent Systems, pp. 225-230, 2004.
[32] B. Kosko, Fuzzy engineering: Prentice Hall, Englewood Cliffs, NJ, 1997.
[33] C. M. Kuok, A. Fu, and M. H. Wong, "Mining fuzzy association rules in databases," SIGMOD Record, vol. 27, pp. 41-46, 1998.
[34] B. Lent, A. Swami, and J. Widom, "Clustering association rules," The13th International Conference on Data Engineering, pp. 220-231, 1997.
[35] Y. C. Li and C. C. Chang, "A new fp-tree algorithm for mining frequent itemsets," Lecture Notes in Computer Science, vol. 3309, pp. 266-277, 2004.
[36] Y. C. Li, J. S. Yeh, and C. C. Chang, "Efficient algorithms for mining share-frequent itemsets," The 11th World Congress of International Fuzzy Systems Association, pp. 539-543, 2005.
[37] Y. C. Li, J. S. Yeh, and C. C. Chang, "Fast algorithm for mining share-frequent itemsets," The 7th Asia Pacific Web Conference, pp. 417-428, 2005.
[38] Y. C. Li, J. S. Yeh, and C. C. Chang, "Direct candidates generation: A novel algorithm for discovering complete share-frequent itemsets," Lecture Notes in Computer Science, vol. 3614, pp. 551-560, 2005.
[39] F. Liu, Z. Lu, and S. Lu, "Mining association rules using clustering," Intelligent Data Analysis, vol. 5, pp. 309-326, 2001.
[40] J. Liu, Y. Pan, K. Wang, and J. Han, "Mining frequent item sets by opportunistic projection," The 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 229-238, 2002.
[41] Y. Liu, W. k. Liao, and A. Choudhary, "A fast high utility itemsets mining algorithm," The 1st International Workshop on Utility-Based Data Mining, pp. 90-99, 2005.
[42] Y. Liu, W. k. Liao, and A. Choudhary, "A two-phase algorithm for fast discovery of high utility itemsets," Lecture Notes in Computer Science, vol. 3518, pp. 689-695, 2005.
[43] J. Luo and S. M. Bridges, "Mining fuzzy association rules and fuzzy frequency episodes for intrusion detection," International Journal of Intelligent Systems, vol. 15, pp. 687-703, 2000.
[44] Microsoft, Example database foodmart of microsoft analysis services. Available: http://msdn.microsoft.com/en-us/library/aa217032(SQL.80).aspx.
[45] S. Papadimitriou and S. Mavroudi, "The fuzzy frequent pattern tree," The 9th WSEAS International Conference on Computers, pp. 1-7, 2005.
[46] J. S. Park, M. S. Chen, and P. S. Yu, "An effective hash-based algorithm for mining association rules," SIGMOD Record, vol. 24, pp. 175-186, 1995.
[47] J. S. Park, M. S. Chen, and P. S. Yu, "Using a hash-based method with transaction trimming for mining association rules," IEEE Transactions on Knowledge and Data Engineering, vol. 9, pp. 813-825, 1997.
[48] N. L. Sarda and N. V. Srinivas, "An adaptive algorithm for incremental mining of association rules," The 9th International Workshop on Database and Expert Systems Applications, p. 240, 1998.
[49] W. Shitong, K. F. L. Chung, and S. Hongbin, "Fuzzy taxonomy, quantitative database and mining generalized association rules," Intelligent Data Analysis, vol. 9, pp. 207-217, 2005.
[50] R. Srikant and R. Agrawal, "Mining sequential patterns: Generalizations and performance improvements," The 5th International Conference on Extending Database Technology: Advances in Database Technology, pp. 3-17, 1996.
[51] R. Srikant and R. Agrawal, "Mining quantitative association rules in large relational tables," SIGMOD Record, vol. 25, pp. 1-12, 1996.
[52] Y. Sucahyo and R. Gopalan, "Building a more accurate classifier based on strong frequent patterns," Lecture Notes in Computer Science, vol. 3339, pp. 1036-1042, 2005.
[53] C. Y. Wang, T. P. Hong, and S. S. Tseng, "Maintenance of discovered sequential patterns for record deletion," Intelligent Data Analysis, vol. 6, pp. 399-410, 2002.
[54] H. Yao and H. J. Hamilton, "Mining itemset utilities from transaction databases," Data and Knowledge Engineering, vol. 59, pp. 603-626, 2006.
[55] H. Yao, H. J. Hamilton, and C. J. Butz, "A foundational approach to mining itemset utilities from databases," The 4th SIAM International Conference on Data Mining, pp. 211-225, 2004.
[56] Q. Yong, L. Yong Jie, and X. Qing Song, "An improved algorithm of mining from fp-tree," International Conference on Machine Learning and Cybernetics, pp. 1665-1670, 2004.
[57] J. S. Yue, E. Tsang, D. Yeung, and D. Shi, "Mining fuzzy association rules with weighted items," IEEE International Conference on Systems, Man, and Cybernetics, pp. 1906-1911, 2000.
[58] L. A. Zadeh, "Fuzzy sets," Information and Control, pp. 338-353, 1965.
[59] M. J. Zaki, S. Parthasarathy, M. Ogihara, and W. Li, "New algorithms for fast discovery of association rules," Internatioanl Conference on Knowledge Discovery and Data Mining, pp. 283-286, 1997.
[60] Z. Zheng, R. Kohavi, and L. Mason, "Real world performance of association rule algorithms," The International Conference on Knowledge Discovery and Data Mining, pp. 401-406, 2001.