簡易檢索 / 詳目顯示

研究生: 許家維
Hsu, Chia-Wei
論文名稱: 於資料串流環境以時間遞減模式獲取最新頻繁樣式之研究
Using TDM to Find Recently Frequent Patterns over Online Data Stream
指導教授: 徐立群
Shu, Lih-Chyun
學位類別: 碩士
Master
系所名稱: 管理學院 - 會計學系
Department of Accountancy
論文出版年: 2009
畢業學年度: 97
語文別: 中文
論文頁數: 87
中文關鍵詞: 頻繁樣式探勘資料串流時間遞減模式平滑式檢索框架
外文關鍵詞: time decaying model, sliding window, frequent pattern mining, data stream
相關次數: 點閱:129下載:2
分享至:
查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報
  • 資料串流具有連續性及時間敏感性的資料型態,資料所包含的資訊會隨著時間經過而逐漸產生不同重要性的變化,當使用者對資料串流期待從中探勘到頻繁樣式,可以提供做為重要決策的依據,但資料串流的特性,會讓使用者對於應用系統回覆頻繁樣式時更加要求達到及時性,所以在許多資料串流應用方面,使用者往往對於近期的資料串流所包含的資訊有著較為濃厚的興趣。本研究因應資料串流處理特性及困難之處,提出了以平滑檢索框架進行資料串流探勘的方法,並且利用遞減因子將新進資料串流與歷史資料兩者包含的模式資訊做為區分,此方法稱為「時間遞減模式」 (TDM)。本文將事先給予平滑檢索框架大小、最小支持度和最大許可誤差的參數設定,要求能符合一次且完全對新進資料串流完整檢視的條件下,計算出合適的遞減因子,能幫助降低已過時交易模式資訊的支持度,即讓已過時、無代表性的交易模式喪失對新進資料串流的影響力,最後獲取資料串流上最新的頻繁樣式;同時,定期的修整作業,可有效刪除資料中存在的過時且不具頻繁性的交易模式,降低維護資料串流時電腦資源的耗用,也提高了TDM執行效率及面對不同型態資料串流的可延伸性;最後的實驗及分析比較結果,證實TDM比起estDec演算法還具有較高的執行效率及較優秀的可延伸性。

    Data stream is a continuous and time-sensitive data type, thus, the importance of information contained in such data type gradually changes over time. While users expect to find frequent patterns from data stream for decision-making, but due to the characteristics of data stream applications, they also require these patterns to respond in a much more efficient manner. Therefore, in many data stream applications, people tend to focus on information contained in the most recent streams. In response to the characteristics and difficulties of data stream processing, this paper propose a method (Time decaying model, TDM) for mining frequent patterns in a sliding window of data streams. To differentiate the patterns of recently generated data streams from those historical data streams, a time decay factor is applied. Parameter settings for the size of sliding window and minimum & maximum support error are given in advance to calculate an appropriate decayed factor. With pre-defined condition of only allowing one complete scan of the streams, the decayed factor is used to reduce influences from out-of-date transactions on newly generated data streams in order to obtain the most recent frequent patterns. Meanwhile, periodical pruning operations will delete obsolete and infrequent items as data stream flows, this procedure not only saves the consumption of computer resources, but also improve scalability and efficiency of data stream mining. Lastly, our result shows that the proposed method is more efficient and scalable than estDec algorithms.

    第一章 緒論..........1 第一節 研究背景...1 第二節 研究目的 ...4 第三節 論文架構 ...9 第二章 文獻探討......10 第一節 關聯性法則介紹.....10 一、關聯性法則條件說明及設定......12 二、辭彙索引式資料樹(Lexicographic tree).......13 三、頻繁樣式資料樹(Frequent pattern tree)......18 四、頻繁樣式資料樹與辭彙索引式資料樹之比較.......24 第二節 資料串流探勘演算法之概述......29 一、資料串流檢視框架......29 二、近似資料串流探勘演算法......32 第三章 演算法設計及執行......39 第一節 時間遞減模式......40 一、平滑式框架之支持度計算......40 二、遞減因子之介紹......42 第二節 近期頻繁樣式結構樹 ......49 一、RFP-tree之建置與更新......50 二、RFP-tree之修整作業......59 三、使用RFP-tree進行近期頻繁樣式探勘......67 第四章 效能分析與實驗結果......71 第一節 實驗環境介紹......71 第二節 時間遞減模式演算法之實作......72 一、不同的資料檢索框架尺寸對TDM執行之影響.......73 二、不同的最小支持度對TDM執行之影響.......75 三、修整RFP-tree及TIL之頻繁性對TDM執行之影響......77 四、不同的資料串流對TDM執行之影響......78 五、時間遞減模式演算法與estDec演算法之比較......80 第五章 結論與未來展望......83 參考文獻......85

    1. R. Agrawal, T. Imielinski, and A. Swami. Mining association rules between sets of items in large databases. In Proc. of the ACM SIGMOD Conference on Management of Data, 207-216, Washington, D.C., 1993.
    2. R.C. Agarwal, C.C. Aggarwal, and V.V.V. Prasad. A Tree Projection Algorithm for Generation of Frequent Item Sets. Journal of Parallel and Distributed Computing, 61(3), 350-371, 2001.
    3. S. Brin, R. Motwani, J. Ullman, and S. Tsur. Dynamic Itemset Counting and Implication Rules for Market Basket Data. In Proc. of the 1997 ACM-SIGMOD Conf. on Management of Data, 255-264, 1997.
    4. B. Babcock, S. Babu, M. Datar, R. Motwani, and J. Widom. Models and issues in data stream systems. In Proc. 2002 ACM Symp. Principles of Database Systems (POD’s 02), 1-16, Madison, WI, 2002.
    5. M. Charikar, L. O'Callaghan, and R. Panigrahy. Better streaming algorithms for clustering problems. In Proc. of 35th ACM Symposium on Theory of Computing, 30-39, San Diego, CA, 2003.
    6. H. Chang and W.S. Lee. Finding recent frequent itemsets adaptively over online data streams. In Proc. of the 2003 Int. Conf. Knowledge Discovery and Data Mining (SIGKDD’03), 487-492, 2003.
    7. Y. Chi, H. Wang, P.S. Yu, and R.R. Muntz. Moment: Maintaining Closed Frequent Itemsets over a Stream Sliding Window. IEEE Int. Conf. on Data Mining, 59-66, 2004.
    8. J. Cheng, Y. Ke, and W. Ng. A survey on algorithms for mining frequent itemsets over data streams. Knowledge and Information Systems, 16(1), 1-27, 2008.
    9. W.J. Frawley, G. Piatetsky-Shapiro, and C.J. Matheus. Knowledge discovery in databases: An overview. AI Magazine, 13(3):57-70, 1992.
    10. F.H. Grupe and M.M. Owrang. DATA BASE MINING discovering new knowledge and competitive advantage. Information Systems Management, 12(4):26-31, 1995.
    11. S. Guha, N. Mishra, R. Motwani, and L. O'Callaghan. Clustering data streams. In Proc. IEEE Symposium on Foundations of Computer Science (FOCS’00), 359-366, Redondo Beach, CA, 2000.
    12. M.N. Garofalakis, J. Gehrke, and R. Rastogi. Querying and mining data streams: You only get one look (tutorial). In Proc. ACM SIGMOD Intl. Conf. on Management of Data, 635-635, Madison, Wisconsin, 2002.
    13. B. Goethals, and M.J. Zaki. Advances in frequent itemset mining implementations: report on FIMI'03. ACM SIGKDD Explorations Newsletter, 6(1):109-117, 2004.
    14. M.M. Gaber, A. Zaslavsky, and S. Krishnaswamy. Mining data streams: a review. ACM SIGMOD Record, 34(2):18-26, 2005.
    15. C. Giannella, J. Han, J. Pei, X. Yan, and P.S. Yu. Mining Frequent Patterns in Data Streams at Multiple Time Granularities. DATA MINING: Concepts and Techniques,191-210, 2006.
    16. J. Han, J. Pei, Y. Yin, and R. Mao. Mining frequent patterns without candidate generation: A frequent-pattern tree approach. Data Mining and Knowledge Discovery, 8(1), 53-87, 2004.
    17. J. Han and M. Kamber. DATA MINING: Concepts and Techniques. Morgan Kaufmann, San Francisco, CA, 2006.
    18. N. Jiang, and L. Gruenwald. Research issues in data stream association rule mining. ACM SIGMOD Record, 35(1):14-19, 2006.
    19. H. Li, S. Lee, and M. Shan. An efficient algorithm for mining frequent itemsets over the entire history of data streams. Int. Workshop on Knowledge Discovery in Data Streams, 2004.
    20. G. Manku and R. Motwani. Approximate frequency counts over data streams. In Proc. 2002 Int. Conf. Very Large Data Bases (VLDB’02), 346-357, Hong Kong, China, 2002.
    21.K. Patroumpas and T. Sellis. Window Specification over Data Streams. In International Conference on Semantics of a Networked World (ICSNW), 2006.
    22. J.X. Yu, Z. Chong, H. Lu, Z. Zhang, and A. Zhou. A false negative approach to mining frequent itemsets from high speed transactional data streams. Information Sciences, 176(14), 1986, 2006.
    23. Y. Zhu, and D. Shasha. StatStream: Statistical Monitoring of Thousands of Data Streams in Real Time. In Proc. Int. Conf. on Very Large Data Bases, 358-369 , 2002
    24. 劉泓郁:《建構一個以模糊關聯法則為基礎之產品開發系統》。桃園:私立元智大學‧工業工程與管理學系碩士論文,2003。
    25. 梁定澎:《決策支援系統與企業智慧》。智勝文化,台北,2002。

    下載圖示 校內:2014-07-08公開
    校外:2014-07-08公開
    QR CODE