| 研究生: |
李佳樺 Li, Chia-Hua |
|---|---|
| 論文名稱: |
高效率之高效益量化項目集探勘演算法 Efficient Algorithms for Mining High Utility Quantitative Itemsets |
| 指導教授: |
謝孫源
Hsieh, Sun-Yuan |
| 學位類別: |
碩士 Master |
| 系所名稱: |
電機資訊學院 - 資訊工程學系 Department of Computer Science and Information Engineering |
| 論文出版年: | 2015 |
| 畢業學年度: | 103 |
| 語文別: | 英文 |
| 論文頁數: | 66 |
| 中文關鍵詞: | 效益探勘 、量化項目集探勘 、高效益項目集探勘 、高效益量化項目集探勘 |
| 外文關鍵詞: | utility pattern mining, quantitative itemset mining, high utility itemset mining, high utility quantitative itemset mining |
| 相關次數: | 點閱:111 下載:3 |
| 分享至: |
| 查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報 |
高效益量化項目集探勘(High Utility Quantitative Itemset Mining)為一門新興的研究議題,其考慮項目的效益屬性(例如:數量、重要度等),從資料庫中找出高效益且帶有數量資訊的項目集。在商業應用中,該技術可從顧客的交易紀錄中,找出「購買M到N個A商品的顧客也會同時購買P到Q個B商品」這種高獲利的消費行為。雖然此技術能廣泛地應用於許多領域中,然而,現今方法儲存結構牽涉龐大計算量,且往往在探勘過程中產生過多候選項目集,因而導致探勘工作效率不佳、使用者不易分析結果等問題。有鑑於此,本論文提出一個名為HUQI-Miner (High Utility Quantitative Itemsets Miner)的演算法,能夠更有效率地從資料庫中挖掘高效益量化項目集(High Utility Quantitative Itemset,簡稱HUQI)。為了提供使用者精簡的探勘結果並提升探勘工作之效能,本論文率先提出兩種新型的HUQI,分別稱為精簡型及彙整型HUQI。透過所提出的推論方法,精簡型HUQI可推導出符合某些條件的HUQI,而彙整型HUQI則提供使用者效益值較高的HUQI之彙整。本論文利用真實及模擬資料驗證所提方法之執行效率。實驗結果顯示所提出之新創方法所挖掘的型樣不僅精簡又有意義,且其執行速度及記憶體使用量大幅優於目前最佳的演算法。
High utility quantitative itemsets (abbreviated as HUQIs) mining is a novel research topic in data mining field, which considers the concept of utility like quantity and profit to find high utility itemsets carrying information about quantity. In market analysis, it could reveal to decision-makers that shopping behavior could bring high profit to the company. For example, the customers who purchase M to N units of a product A are likely to purchase P to Q units of a product B. However, existing algorithms involve high computational cost and traditional HUQI mining may produce too many itemsets during the mining process, causing performance problems and making results hard to be utilized by users. In view of this, this thesis proposes a novel algorithm named HUQI-Miner (High Utility Quantitative Itemsets Miner) for efficiently mining HUQIs in databases. Moreover, to efficiently present users concise mining results, this thesis proposes two new types of HUQIs, named concise HUQIs and summarized HUQIs, respectively. Concise HUQIs can be used to infer the HUQIs satisfying certain constrains, while summarized HUQIs provide users a summarization about certain HUQIs. Experimental results on both real and synthetic datasets show that the numbers of concise and summarized HUQIs can be several orders of magnitudes smaller than that of HUQIs. Moreover, HUQI-Miner outperforms the state-of-the-art algorithms in terms of both execution time and memory usage.
[1] R. C. Agarwal, C. C. Aggarwal and V. V. V. Prasad, “A Tree Projection Algorithm for Generation of Frequent Itemsets,” Journal of Parallel and Distributed Computing, 61(3):350-371, 2001.
[2] R. Agrawal and R. Srikant, “Fast Algorithms for Mining Association Rules,” in Proc. of the Int’l Conf. on Very Large Data Bases, pp. 487-499, 1994.
[3] R. Agrawal, T. Imielinski and A. Swami, “Mining Association Rules Between Sets of Items in Large Databases,” in Proc. of the ACM SIGMOD Int’l Conf. on Management of Data, pp. 207-216, 1993.
[4] C. F. Ahmed, S. K. Tanbeer, B. Jeong and H. Choi, “Interactive Mining of High Utility Patterns over Data Streams,” Expert Systems with Applications, 39(15):11979-11991, 2012.
[5] C. F. Ahmed, S. K. Tanbeer and B. Jeong, “A Novel Approach for Mining High-Utility Sequential Patterns in Sequence Databases,” ETRI Journal, 32(5):676-686, 2010.
[6] C. F. Ahmed, S. K. Tanbeer, B. Jeong and Y. Lee, “Efficient Tree Structures for High Utility Pattern Mining in Incremental Databases,” IEEE Transactions on Knowledge and Data Engineering, 21(12):1708-1721, 2009.
[7] Y. Aumann and Y. Lindell, “A Statistical Theory for Quantitative Association Rules,” Journal of Intelligent Information Systems, 20(3):255-283, 2003.
[8] R. Chan, Q. Yang and Y. Shen, “Mining High Utility Itemsets,” in Proc. of the IEEE Int’l Conf. on Data Mining, pp. 19, 2003.
[9] S. Chen and T. C. Huang, “An Efficient Model for Mining Precise Quantitative Association Rules with Multiple Minimum Supports,” Int’l Journal of Innovative Computing, Information and Control, 9(1):207-222, 2013.
[10] C. Creighton and S. Hanash, “Mining Gene Expression Databases for Association Rules,” Bioinformatics, 19(1):79-86, 2003.
[11] A. Erwin, R. P. Gopalan and N. R. Achuthan, “Efficient Mining of High Utility Itemsets from Large Datasets,” in Proc. of Pacific-Asia Conf. on Knowledge Discovery and Data Mining, pp. 554-561, 2008.
[12] A. Erwin, R. P. Gopalan and N. R. Achuthan, “A Bottom-Up Projection Based Algorithm for Mining High Utility Itemsets,” in Proc. of the Int’l Conf. on Artificial Intelligence and Data Mining, pp. 3-10, 2007.
[13] A. Erwin, R. P. Gopalan and N. R. Achuthan, “CTU-Mine: An Efficient High Utility Itemset Mining Algorithm Using the Pattern Growth Approach,” in Proc. of the Int’l Conf. on Instruction & Technology, pp. 71-76, 2007.
[14] P. Fournier-Viger, C. W. Wu, S. Zida and Vincent S. Tseng, “FHM: Faster High-Utility Itemset Mining Using Estimated Utility Co-occurrence Pruning,” in Proc. of the Int’l Symposium on Methodologies for Intelligent Systems, pp. 83-92, 2014.
[15] P. Fournier-Viger, C. W. Wu and Vincent S. Tseng, “Novel Concise Representations of High Utility Itemsets Using Generator Patterns,” Advanced Data Mining and Applications, Lecture Notes in Computer Science, 8933:30-43, 2014.
[16] E. Georgii, L. Richter, U. Rückert and S. Kramer, “Analyzing Microarray Data Using Quantitative Association Rules,” Bioinformatics, 21(2):123-129, 2005.
[17] B. Goethals, “Survey on Frequent Pattern Mining,” Manuscript, 2003.
[18] R. Gupta, G. Fang, B. Field, M. Steinbach and V. Kumar, “Quantitative Evaluation of Approximate Frequent Pattern Mining Algorithms,” in Proc. of the ACM SIGKDD Int’l Conf. on Knowledge Discovery and Data Mining, pp. 301-309, 2008.
[19] A. Gyenesei, “A Fuzzy Approach for Mining Quantitative Association Rules,” Turku Centre for Computer Science, TUCS Technical Reports, no. 336, 2000.
[20] J. Han, J. Pei and Y. Yin, “Mining Frequent Patterns without Candidate Generation,” in Proc. of the ACM SIGMOD Int’l Conf. on Management of Data, pp. 1-12, 2000.
[21] T. Hong, G. Lan and H. Huang, “Mining High Quantitative Utility Sequential Patterns,” in Proc. of the Conference on Information Technology and Applications in Outlying Islands, pp. 26-31, 2013.
[22] P. Y. Hsu, Y. L. Chen and C. C. Ling, “Algorithms for Mining Association Rules in Bag Database,” Information Sciences, 166(1-4):31-47, 2004.
[23] J. Hu and A. Mojsilovic, “High-utility Pattern Mining: A Method for Discovery of High-utility Item Sets,” Pattern Recognition, 40(11):3317-3324, 2007.
[24] Y. Ke, J. Cheng and W. Ng, “Mining Quantitative Correlated Patterns Using an Information-Theoretic Approach,” in Proc. of the ACM SIGKDD Int’l Conf. on Knowledge Discovery and Data Mining, pp. 227-236, 2006.
[25] J. Koh and I. Chiu, “An Efficient Approach for Mining Top-k High Utility Specialized Query Expansions on Social Tagging Systems,” in Proc. of the Int’l Conf. on Database Systems for Advanced Applications, pp. 361-376, 2014.
[26] S. Krishnamoorthy, “Pruning Strategies for Mining High Utility Itemsets,” Expert Systems with Applications, 42:2371-2381, 2015.
[27] G. Lan, T. Hong and V. S. Tseng, “An Efficient Projection-based Indexing Approach for Mining High Utility Itemsets,” Knowledge and Information System, 38(1):85-107, 2014.
[28] G. Lan, T. Hong and Y. Chao, “Multi-criteria Utility Mining Using Minimum Constraints,” in Proc. of the Int’l Conf. on Industrial, Engineering and Other Applica-tions of Applied Intelligent Systems, pp. 42-47, 2014.
[29] G. Lan, T. Hong, H. Huang and S. Pan, “Mining High Fuzzy Utility Sequential Patterns,” in Proc. of the Int’l Conf. on Fuzzy Theory and Its Application, pp. 420-424, 2013.
[30] G. Lan, T. Hong, V. S. Tseng and S. Wang, “An Improved Approach for Sequential Utility Pattern Mining,” in Proc. of the IEEE Int’l Conf. on Granular Computing, pp. 226-230, 2012.
[31] B. Le, H. Nguyen, T. A. Cao and B. Vo, “A Novel Algorithm for Mining High Utility Itemsets,” in Proc. of the Asian Conf. on Intelligent Information and Data-base Systems , pp. 13-17, 2009.
[32] W. Lee, S. J. Stolfo and K. W. Mok, “Mining Audit Data to Build Intrusion Detection Models,” in Proc. of the ACM SIGKDD Int’l Conf. on Knowledge Discovery and Data Mining, pp. 66-72, 1998.
[33] B. Lent, A. N. Swami and J. Widom, “Clustering Association Rules,” in Proc. of the Int’l Conf. on Data Engineering, pp. 220-231, 1997.
[34] Y. Li, J. Yeh and C. Chang, “Isolated Items Discarding Strategy for Discovering High Utility Itemsets,” Data & Knowledge Engineering, 34(1):198-217, 2008.
[35] C. Lin, T. Hong, G. Lan, J. Wong and W. Lin, “Incrementally Mining High Utility Patterns based on Pre-large Concept,” Applied Intelligence, 40(2):343-357, 2014.
[36] C. Lin, T. Hong, J. Wong and G. Lan, “Privacy Preserving High Utility Mining Based on Genetic Algorithms,” in Proc. of the IEEE Int’l Conf. on Granular Computing, pp. 191-195, 2013.
[37] C. Lin, T. Hong and W. Lu, “An Effective Tree Structure for Mining High Utility Itemsets,” Expert Systems with Applications, 38(6):7419-7424, 2011.
[38] X. Lin, Q. Zhu, F. Li, Z. Geng and S. Shi, “A Share Strategy for Utility Frequent Patterns Mining,” in Proc. of the Int’l Conf. on Fuzzy Systems and Knowledge Discovery, pp. 1428-1432, 2010.
[39] J. Liu, K. Wang and B. Fung, “Direct Discovery of High Utility Itemsets without Candidate Generation,” in Proc.of the IEEE Int’l Conf. on Data Mining, pp. 984-989, 2012.
[40] M. Liu and J. Qu, “Mining High Utility Itemsets without Candidate Generation,” in Proc. of the ACM SIGKDD Int’l Conf. on Knowledge Management, pp. 55-64, 2012.
[41] Y. Liu, J. Li, W. Liao, A. N. Choudhary and Y. Shi, “High Utility Itemsets Mining,” Int’l Journal of Information Technology and Decision Making, 9(6):905-934, 2010.
[42] Y. Liu, W. Liao and A. Choudhary, “A Fast High Utility Itemsets Mining Algorithm,” in Proc. of the Int’l Workshop on Utility-Based Data Mining, pp. 90-99, 2005.
[43] S. K. Madria, S. S. Bhowmick, W. K. Ng and E. Lim, “Research Issues in Web Data Mining,” in Proc. of the Int’l Conf. on Data Warehousing and Knowledge Discovery, pp. 303-312, 1999.
[44] J. Pei, J. Han, H. Lu, S. Nishio, S. Tang and D. Yang, “H-mine: Hyper-Structure Mining of Frequent Patterns Large Databases,” in Proc. of Int’l Conf. Data Mining, pp. 441-448, 2001.
[45] U. Ruckert, L. Richter and S. Kramer, “Quantitative Association Rules Based on Half-Spaces: An optimization approach,” in Proc. of the IEEE Int’l Conf. on Data Mining, pp. 507-510, 2004.
[46] A. Salleb-aouissi, C. Vrain, C. Nortet, X. Kong and D. Cassard, “QuantMiner for Mining Quantitative Association Rules,” Journal of Machine Learning Research, 14(1):3153-3157, 2013.
[47] A. Salleb-aouissi, C. Vrain and C. Nortet, “QuantMiner : A Genetic Algorithm for Mining Quantitative Association Rules,” in Int’l Joint Conf. on Artificial Intelligence, pp. 1035-1040, 2007.
[48] A. Savasere, E. Omiecinski and S. Navathe, “An Efficient Algorithm for Mining Association Rules in Large Databases,” in Proc. Int’l Conf. Very Large Data Bases, pp. 432-444, 1995.
[49] B. Shie, H. Hsiao and V. S. Tseng, “Efficient Algorithms for Discovering High Utility User Behavior Patterns in Mobile Commerce Environments,” Knowledge and Information Systems, 37(2):363-387, 2013.
[50] B. Shie, P. S. Yu and V. S. Tseng, “Efficient Algorithms for Mining Maximal High Utility Itemsets from Data Streams with Different Models,” Expert Sys-tems with Applications, 39(17):12947-12960, 2012.
[51] B. Shie, V. S. Tseng and P. S. Yu, “Online Mining of Temporal Maximal Utility Itemsets from Data Streams,” in Proc. of the Annual ACM Symposium on Applied Computing, pp. 1622-1626, 2010.
[52] W. Song, Y. Liu and J. Li, “Mining High Utility Itemsets by Dynamically Pruning the Tree Structure,” Applied Intelligence, 40(1):29-43, 2014.
[53] W. Song, Y. Liu and J. Li, “Vertical Mining for High Utility Itemsets,” in Proc. of the IEEE Int’l Conf. on Granular Computing, pp. 429-434, 2012.
[54] R. Srikant and R. Agrawal, “Mining Quantitative Association Rules in Large Relational Tables,” in Proc. of the ACM SIGMOD Int’l Conf. on Management of Data, pp. 1-12, 1996.
[55] P. S. M. Tsai and C. Chen, “Mining Quantitative Association Rules in a Large Database of Sales Transactions,” Journal of Information Science and Engineering, pp. 667-681, 2001.
[56] V. S. Tseng, B. Shie, C. Wu and P. S. Yu, “Efficient Algorithms for Mining High Utility Itemsets from Transactional Databases,” IEEE Transactions on Knowledge and Data Engineering, 25(8):1772-1786, 2013.
[57] V. S. Tseng, C. Wu, B. Shie and P. S. Yu, “UP-Growth: An Efficient Algorithm for High Utility Itemset Mining,” in Proc. of the ACM SIGKDD Int’l Conf. on Knowledge Discovery and Data Mining, pp. 253-262, 2010.
[58] B. Vo, H. Nguyen, T. B. Ho and B. Le, “Parallel Method for Mining High utility Itemsets from Vertically Partitioned Distributed Databases,” in Proc. of the Int’l Conf. on Knowledge-based and Intelligent Information and Engineering Systems, pp. 251-260, 2009.
[59] C. Wang, S. Cheng and Y. Huang, “A Fuzzy Approach for Mining High Utility Quantitative Itemsets,” in Proc. of the IEEE Int’l Conf. on Fuzzy System, pp. 1909-1913, 2009.
[60] J. Wang, Y. Liu, L. Zhou, Y. Shi and X. Zhu, “Pushing Frequency Constraint to Utility Mining Model,” in Proc. of the Int’l Conf. on Computational Science, pp. 685-692, 2007.
[61] W. Wang, J. Yang and Philip S. Yu, “Efficient Mining of Weighted Association Rules (WAR),” in Proc. of the ACM SIGKDD Int’l Conf. on Knowledge Discovery and Data Mining, pp. 270-274, 2000.
[62] C. Wu,Y. Lin, P. S. Yu and V. S. Tseng, “Mining High Utility Episodes in Complex Event Sequences,” in Proc. of the ACM SIGKDD Int’l Conf. on Knowledge Discovery and Data Mining, pp. 536-544, 2013.
[63] C. Wu, B. E. Shie, V. S. Tseng and P. S. Yu, “Mining Top-k High Utility Itemsets,” in Proc. of the ACM SIGKDD Int’l Conf. on Knowledge Discovery and Data Mining, pp. 78-86, 2012.
[64] C. Wu, P. Fournier-Viger, P. S. Yu and V. S. Tseng, “Efficient Mining of a Concise and Lossless Representation of High Utility Itemsets,” in Proc. of the IEEE Int’l Conf. on Data Mining, pp. 824-833, 2011.
[65] F. Wu, Y. Hu and K. Pai, “Linked List Based High Utility Itemsets Mining Using Pattern Growth Method,” in Proc. of the Int’l Conf. on Data Mining, pp. 25-34, 2009
[66] H. Yao and H. J. Hamilton, “Mining Itemset Utilities from Transaction Databases,” Data & Knowledge Engineering, 59(3):603-626, 2006.
[67] Y. Y. Yao and Zhong, “An Analysis of Quantitative Measures Associated with Rules,” in Proc. of the Pacific-Asia Conf. on Knowledge Discovery and Data Mining, pp. 479-488, 1999.
[68] J. Yeh and G. Liao, “Evaluation of Item Discretization Methods for Quantitative Association Rule Mining,” Journal of Quality, 18(6):1-29, 2011.
[69] J. Yeh, C. Chang and Y. Wang, “Efficient Algorithms for Incremental Utility Mining,” in Proc. of the Int’l Conf. on Ubiquitous Information Management and Communication, pp. 212-217, 2008.
[70] J. Yeh, Y. Li and C. Chang, “Two-Phase Algorithms for a Novel Utility-Frequent Mining Model,” in Proc. of the Pacific-Asia Conf. on Knowledge Discovery and Data Mining Workshops, pp. 433-444, 2007.
[71] S. Yen and Y. Lee, “Mining High Utility Quantitative Association Rules,” in Proc. of the Int’l Conf. on Data Warehousing and Knowledge Discovery, pp. 283-292, 2007.
[72] J. Yin, Z. Zheng, L. Cao, Y. Song and W. Wei, “Efficiently Mining Top-K High Utility Sequential Patterns,” in Proc. of the IEEE Int’l Conf. on Data Mining, pp. 1259-1264, 2013.
[73] J. Yin, Z. Zheng and L. Cao, “USpan: An Efficient Algorithm for Mining High Utility Sequential Patterns,” in Proc. of the ACM SIGKDD Int’l Conf. on Knowledge Discovery and Data Mining, pp. 660-668, 2012.
[74] M. J. Zaki, “Scalable Algorithms for Association Mining,” IEEE Transactions on Knowledge and Data Engineering, 12(3):372-390, 2000.
[75] Frequent Itemset Mining Implementtations Repository. Available at (http://fimi.cs.helsinki.fi/)
[76] IBM Quest Data Mining Project, Quest Synthetic Data Generation Code. Available at (http://www.almaden.ibm.com/cs/quest/syndata.html)
[77] Microsoft Corporation. Example Database FoodMart2000 of Microsoft SQL Server Analysis Services. Available at (https://technet.microsoft.com/en-us/library/aa217032(v=sql.80).aspx)
[78] NU-MineBench Version 2.0 Dataset and Technical Report. Available at (http://cucis.ece.northwestern.edu/projects/DMS/MineBench.html)