研究生: |
許家維 Hsu, Chia-Wei |
---|---|
論文名稱: |
於資料串流環境以時間遞減模式獲取最新頻繁樣式之研究 Using TDM to Find Recently Frequent Patterns over Online Data Stream |
指導教授: |
徐立群
Shu, Lih-Chyun |
學位類別: |
碩士 Master |
系所名稱: |
管理學院 - 會計學系 Department of Accountancy |
論文出版年: | 2009 |
畢業學年度: | 97 |
語文別: | 中文 |
論文頁數: | 87 |
中文關鍵詞: | 頻繁樣式探勘 、資料串流 、時間遞減模式 、平滑式檢索框架 |
外文關鍵詞: | time decaying model, sliding window, frequent pattern mining, data stream |
相關次數: | 點閱:129 下載:2 |
分享至: |
查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報 |
資料串流具有連續性及時間敏感性的資料型態,資料所包含的資訊會隨著時間經過而逐漸產生不同重要性的變化,當使用者對資料串流期待從中探勘到頻繁樣式,可以提供做為重要決策的依據,但資料串流的特性,會讓使用者對於應用系統回覆頻繁樣式時更加要求達到及時性,所以在許多資料串流應用方面,使用者往往對於近期的資料串流所包含的資訊有著較為濃厚的興趣。本研究因應資料串流處理特性及困難之處,提出了以平滑檢索框架進行資料串流探勘的方法,並且利用遞減因子將新進資料串流與歷史資料兩者包含的模式資訊做為區分,此方法稱為「時間遞減模式」 (TDM)。本文將事先給予平滑檢索框架大小、最小支持度和最大許可誤差的參數設定,要求能符合一次且完全對新進資料串流完整檢視的條件下,計算出合適的遞減因子,能幫助降低已過時交易模式資訊的支持度,即讓已過時、無代表性的交易模式喪失對新進資料串流的影響力,最後獲取資料串流上最新的頻繁樣式;同時,定期的修整作業,可有效刪除資料中存在的過時且不具頻繁性的交易模式,降低維護資料串流時電腦資源的耗用,也提高了TDM執行效率及面對不同型態資料串流的可延伸性;最後的實驗及分析比較結果,證實TDM比起estDec演算法還具有較高的執行效率及較優秀的可延伸性。
Data stream is a continuous and time-sensitive data type, thus, the importance of information contained in such data type gradually changes over time. While users expect to find frequent patterns from data stream for decision-making, but due to the characteristics of data stream applications, they also require these patterns to respond in a much more efficient manner. Therefore, in many data stream applications, people tend to focus on information contained in the most recent streams. In response to the characteristics and difficulties of data stream processing, this paper propose a method (Time decaying model, TDM) for mining frequent patterns in a sliding window of data streams. To differentiate the patterns of recently generated data streams from those historical data streams, a time decay factor is applied. Parameter settings for the size of sliding window and minimum & maximum support error are given in advance to calculate an appropriate decayed factor. With pre-defined condition of only allowing one complete scan of the streams, the decayed factor is used to reduce influences from out-of-date transactions on newly generated data streams in order to obtain the most recent frequent patterns. Meanwhile, periodical pruning operations will delete obsolete and infrequent items as data stream flows, this procedure not only saves the consumption of computer resources, but also improve scalability and efficiency of data stream mining. Lastly, our result shows that the proposed method is more efficient and scalable than estDec algorithms.
1. R. Agrawal, T. Imielinski, and A. Swami. Mining association rules between sets of items in large databases. In Proc. of the ACM SIGMOD Conference on Management of Data, 207-216, Washington, D.C., 1993.
2. R.C. Agarwal, C.C. Aggarwal, and V.V.V. Prasad. A Tree Projection Algorithm for Generation of Frequent Item Sets. Journal of Parallel and Distributed Computing, 61(3), 350-371, 2001.
3. S. Brin, R. Motwani, J. Ullman, and S. Tsur. Dynamic Itemset Counting and Implication Rules for Market Basket Data. In Proc. of the 1997 ACM-SIGMOD Conf. on Management of Data, 255-264, 1997.
4. B. Babcock, S. Babu, M. Datar, R. Motwani, and J. Widom. Models and issues in data stream systems. In Proc. 2002 ACM Symp. Principles of Database Systems (POD’s 02), 1-16, Madison, WI, 2002.
5. M. Charikar, L. O'Callaghan, and R. Panigrahy. Better streaming algorithms for clustering problems. In Proc. of 35th ACM Symposium on Theory of Computing, 30-39, San Diego, CA, 2003.
6. H. Chang and W.S. Lee. Finding recent frequent itemsets adaptively over online data streams. In Proc. of the 2003 Int. Conf. Knowledge Discovery and Data Mining (SIGKDD’03), 487-492, 2003.
7. Y. Chi, H. Wang, P.S. Yu, and R.R. Muntz. Moment: Maintaining Closed Frequent Itemsets over a Stream Sliding Window. IEEE Int. Conf. on Data Mining, 59-66, 2004.
8. J. Cheng, Y. Ke, and W. Ng. A survey on algorithms for mining frequent itemsets over data streams. Knowledge and Information Systems, 16(1), 1-27, 2008.
9. W.J. Frawley, G. Piatetsky-Shapiro, and C.J. Matheus. Knowledge discovery in databases: An overview. AI Magazine, 13(3):57-70, 1992.
10. F.H. Grupe and M.M. Owrang. DATA BASE MINING discovering new knowledge and competitive advantage. Information Systems Management, 12(4):26-31, 1995.
11. S. Guha, N. Mishra, R. Motwani, and L. O'Callaghan. Clustering data streams. In Proc. IEEE Symposium on Foundations of Computer Science (FOCS’00), 359-366, Redondo Beach, CA, 2000.
12. M.N. Garofalakis, J. Gehrke, and R. Rastogi. Querying and mining data streams: You only get one look (tutorial). In Proc. ACM SIGMOD Intl. Conf. on Management of Data, 635-635, Madison, Wisconsin, 2002.
13. B. Goethals, and M.J. Zaki. Advances in frequent itemset mining implementations: report on FIMI'03. ACM SIGKDD Explorations Newsletter, 6(1):109-117, 2004.
14. M.M. Gaber, A. Zaslavsky, and S. Krishnaswamy. Mining data streams: a review. ACM SIGMOD Record, 34(2):18-26, 2005.
15. C. Giannella, J. Han, J. Pei, X. Yan, and P.S. Yu. Mining Frequent Patterns in Data Streams at Multiple Time Granularities. DATA MINING: Concepts and Techniques,191-210, 2006.
16. J. Han, J. Pei, Y. Yin, and R. Mao. Mining frequent patterns without candidate generation: A frequent-pattern tree approach. Data Mining and Knowledge Discovery, 8(1), 53-87, 2004.
17. J. Han and M. Kamber. DATA MINING: Concepts and Techniques. Morgan Kaufmann, San Francisco, CA, 2006.
18. N. Jiang, and L. Gruenwald. Research issues in data stream association rule mining. ACM SIGMOD Record, 35(1):14-19, 2006.
19. H. Li, S. Lee, and M. Shan. An efficient algorithm for mining frequent itemsets over the entire history of data streams. Int. Workshop on Knowledge Discovery in Data Streams, 2004.
20. G. Manku and R. Motwani. Approximate frequency counts over data streams. In Proc. 2002 Int. Conf. Very Large Data Bases (VLDB’02), 346-357, Hong Kong, China, 2002.
21.K. Patroumpas and T. Sellis. Window Specification over Data Streams. In International Conference on Semantics of a Networked World (ICSNW), 2006.
22. J.X. Yu, Z. Chong, H. Lu, Z. Zhang, and A. Zhou. A false negative approach to mining frequent itemsets from high speed transactional data streams. Information Sciences, 176(14), 1986, 2006.
23. Y. Zhu, and D. Shasha. StatStream: Statistical Monitoring of Thousands of Data Streams in Real Time. In Proc. Int. Conf. on Very Large Data Bases, 358-369 , 2002
24. 劉泓郁:《建構一個以模糊關聯法則為基礎之產品開發系統》。桃園:私立元智大學‧工業工程與管理學系碩士論文,2003。
25. 梁定澎:《決策支援系統與企業智慧》。智勝文化,台北,2002。