簡易檢索 / 詳目顯示

研究生: 戴群達
Dai, Chun-Ta
論文名稱: 探勘多期資料下樣式變化之研究
Mining changes of patterns from multi-period datasets
指導教授: 王逸琳
Wang, I-Lin
學位類別: 碩士
Master
系所名稱: 管理學院 - 資訊管理研究所
Institute of Information Management
論文出版年: 2007
畢業學年度: 95
語文別: 中文
論文頁數: 66
中文關鍵詞: 樣式發現變化探勘候選樣式森林時間性資料探勘演算法資料結構
外文關鍵詞: pattern discovery, algorithm, candidate-pattern forest, temporal data mining, change mining, data structure
相關次數: 點閱:63下載:2
分享至:
查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報
  • 資料探勘可以找出隱藏在資料下的知識,其中包括樣式發現;一般而言,樣式被認為是一類很重要的知識,因為它代表在大量資料中某些有意義的事件,同時也可以用來產生關聯法則。因此,瞭解樣式的變化是一個重要的研究議題。變化探勘即是為了瞭解資料的變化情況,期望更進一步地協助探勘者制訂決策;然而過去關於樣式變化的研究,皆是採用探勘頻繁樣式的處理程序,也就是在第一階段先刪除不夠頻繁的樣式,只保留符合使用者所設定門檻值下的樣式。如此做法不但會遺失部份資訊,亦不甚合理,因為變化探勘的目標是尋找有「變化」的樣式,而不只是變化的「頻繁」樣式;另外,大部份的研究皆只是比對兩期的資料,但若能觀察愈多期的資料,則所得到的結論也將會愈可靠;若能處理多期的資料,也就可以進行時間性資料探勘,譬如分析每日、每月或每季資料的變化情況等以提供管理者有用的資料變化資訊,方便其掌握市場的變動趨勢或開拓新的客源。由於相關議題在文獻中並無太多著墨,因此我們將以探勘多期時間下樣式的變化為主要的研究議題,期能提供不同於探勘頻繁樣式之外的另一個研究方向。

    本研究提出一個新的演算法來探勘所有樣式;並直接採用以樣式的成長幅度作為篩選的標準,以篩選出所有符合變化趨勢的樣式。為了改善探勘效率,本研究發展一個候選樣式森林的特殊資料結構及演算法以探勘多期資料的變化;並為該特殊的資料結構設計一套彈性的的機制以增加發現樣式變化的可能性,以及可以避免太多不具參考價值的樣式。為了測試本研究所提出之演算法效率,我們另外設計並實作了一套以探勘頻繁樣式為基礎之演算法來探勘多期時間下樣式的變化。進行一連串的測試結果之後,我們發現本研究所提出之資料結構與演算法的確可以非常地有效率探勘多期時間下樣式的變化,而且探勘愈多期的資料集合愈能突顯本研究所提出的演算法之優越性。此外,我們亦針對各期資料集合間的相似度,以及樣式的變化率門檻值對不同的變化探勘演算法所造成的影響加以探討。

    Pattern discovery is a common task in data mining. Given the transaction datasets of multi periods, we are concerned with a temporal data mining problem that detects any pattern of interested changes that have been consistent from some period to the last period. Discovering such changes from the transaction database of multi periods will help the managers to detect the tendency of customer needs so that potential customers may be identified. To the best of our knowledge, previous studies in change mining only focus on datasets of two datasets, although the tendency of changes are more meaningful for datasets of multi periods in real-world applications. Conventional data mining techniques that seek frequent patterns could be modified for mining changes from datasets of multi periods, but such
    approaches would require many pairwise comparisons between datasets of consecutive periods and thus not so efficient.

    In this thesis, we propose an algorithm called MCP for mining changes from multi-period datasets. MCP is based on a novel data structure modified from the popular frequent-pattern tree(FP-tree), and seeks the target patterns in a very efficient way. In particular, starting from the last two periods, our algorithm first constructs a candidate-pattern forest (CP-forest) to store those patterns of qualified changes, and then iteratively updates the CP-forest using the dataset of each period. The CP-forest is
    carefully designed such that useless information will not be stored and qualified patterns can be easily identified by tree traversals.

    Computational experiments have been conducted to compare MCP and another algorithm called modiFP which is modified from the popular FP-growth algorithm for mining the changes of patterns from multi-period datasets. Several parameters have be used to evaluate the performance of MCP and modiFP, and the results show that MCP is much more efficient than modiFP, especially when the number of periods increases or when the datasets of consecutive periods share more similarities.

    摘要i Abstract iii 誌謝 v 表目錄 viii 圖目錄 x 第一章緒論 1 11 研究動機 2 12 研究目的 3 13 論文架構 3 第二章文獻探討 5 21 探勘頻繁樣式與關聯法則 5 211 探勘頻繁樣式與關聯法則相關符號表示 5 212 探勘頻繁樣式與關聯法則演算法 6 22 時間性資料探勘 8 23 變化探勘 9 231 Change detection 9 232 Emerging pattern 11 233 Exception rules 13 234 Same attribute and same cut 3 24 小結 16 第三章探勘樣式的變化 18 31 問題定義 18 32 相關議題 21 33 MCP演算法 23 331 建立原始搜尋樹架構 24 332 挑選候選樣式及建立樣式森林 28 333 探勘多期變化樣式 32 34 時間容忍 34 35 範例說明 36 36 小結 44 第四章資料測試與分析 45 41 資料產生器說明 45 42 modiFP 演算法 46 43 測試結果 47 431 驗證期數的影響 47 432 驗證相似度的影響 49 433 驗證支持度的影響 51 434 驗證變化率的影響 53 435 驗證大資料檔 56 44 小結 59 第五章結論與建議 60 51 結論 60 52 後續研究建議 61 參考文獻 63

    Agrawal, R. and Srikant, R. Fast algorithm for mining association rules in large databases. In Proceedings of 20th International Conference on Very Large Data Bases,
    487-499, 1994.
    Agrawal, R. and Srikant, R. Mining sequential patterns. Proceedings of the Eleventh International Conference on Data Engineering, 3-14, 1995.
    Ale, J. K. and Rossi, G. H. An approach to discovering temporal association rules. In Proceedings of the 2000 ACM Symposium on Applied Computing(SAC'00), 294-300, 2000.
    Antunes, C. and Oliveira, A. Temporal data mining: an overview. Workshop on Temporal Data Mining(KDD2001), 2001.
    Bailey, J., Manoukian, T. and Ramamohanarao, K. Fast algorithms for mining emerging patterns. Lecture Notes in Computer Science, 2431, 39-50, 2002.
    Bay, S. D. and Pazzani, M. J. Detecting group diRerences: Mining contrast sets. Data Mining and Knowledge Discovery, 5, 213-246, 2001.
    Bayardo, R. J. E±ciently mining long patterns from databases. In Proceedings of the ACM-SIGMOD international conference on management of data, 85-93, 1998.
    Berry, M. J. and LinoR, G. Data mining techniques: For marketing, sales, and customer support. John Wiley and Sons, 1997.
    Brin, S., Motwani, R., Ullman, J. D. and Tsur, S. Dynamic itemset counting and implication rules for market basket data. SIGMOD Record, 26(2), 255-276, 1997.
    Ceglar, A. and Roddick, J. F. Association mining. ACM Computing Surveys, 38(2), 1-42, 2006.
    Dong, G. and Li, J. Mining border descriptions of emerging patterns from dataset paris. Knowledge and Information Systems, 8, 178-202, 2004.
    Dong, G., Zhang, X., Wong, L. and Li, J. Caep : Classi¯cation by aggregating emerging patterns. Tokyo, Japan: In:Proceedings of teh 2nd international conference on
    discovery science, 1999.
    Grahne, G. and Zhu, J. Fast algorithms for frequent itemset mining using fp-trees. IEEE Transactions on Knowledge and Data Engineering, 17(10), 1347-1362,
    2005.
    Han, J., Pei, Y., Yin, Y. and Mao, R. Mining frequent patterns without candidate generation: a frequent-pattern tree approach. Data Mining and Knowledge Dis-
    covery, 8(2004), 2004.
    Kantardzic, M. Data mining:concepts, models, methods, and algorithms. WILEY-INTERSCIENCE, 2003.
    Keogh, E., Chakrabarti, K., Pazzani, M. and S., M. Dimensionality reduction for fast similarity search in large time series databases. Knowledge and Information
    Systems, 3(3), 263-286, 2001.
    Kim, J. K., Song, H. S., Kim, T. S. and Kim, H. K. Detecting the change of customer behavior based on decision tree analysis. Expert Systems, 22(4), 193-205, 2005.
    Li, T., Zhu, S., Wang, X. S. and Jajodia, S. Looking into the seeds of time: Discovering temporal patterns in large transation sets. Information Sciences, 176, 1003-1031,
    2006.
    Li, Y., Ning, P., Wang, X. S. and Jajodia, S. Discovering calendar-based temporal association rules. Data and knowledge Engineering, 44(2), 193-218, 2002.
    Liu, B., Hsu, W., Han, H. S. and Xia, Y. Mining changes for real-life applications. Second International Conference on Data Warehousing and Knowledge Discovery,
    337{346, 2000.
    Liu, B., Hsu, W., Mun, L. and Lee, H. Y. Finding interesting patterns using user expectations. IEEE Transactions on Knowledge and Data Engineering, 11, 817-
    832, 1999.
    Ozden, S., Ramaswamy, S. and A., S. Cyclic association rules. In Proceedings of the 14th International Conference on Data Engineering(ICDE'98), 412-421, 1998.
    Padmanabhan, B. and Tuzhilin, A. Unexpectedness as a measure on interstingness in knowledge discovery. Decision Support Systems, 27, 303-318, 1999.
    Park, J. S., Chen, M. S. and Yu, P. S. Using a hash-based method with transaction trimming and database scan reduction for mining association rules. IEEE
    Transactions on Knowledge and Data Engineering, 9(5), 813-825, 1997.
    Pei, J., Han, J., Chen, Q., Dayal, U. and Hsu, M. C. Pre¯xspan: mining sequential patterns e±ciently by prefix-projected pattern growth. Proceedings of the 17th
    International Conference on Data Enigneering, 215-224, 2001.
    Quinlan, R. C4.5: program for machine learning. Morgan Kaufmann, 1992.
    Roddick, J. F. and Spiliopoulou, M. A survey of temporal kn
    adigms and methods. IEEE Transactions on Knowledge
    14(4), 750-767, 2002.
    Savasere, A., Omiecinski, E. and Navathe, S. An e±cient algorithm for mining association rules in large databases. Zurich, Switzerland: In proceedings of the 21st
    International Conference on Very Large Data Bases, 1995.
    Song, H. S., Kim, J. K. and Kim, S. H. Mining the change of customer behavior in an internet shopping mall. Expert Systems with Application, 21(3), 157-168, 2001.
    Suzuki, E. and Zytkow, J. M. Uni¯ed algorithm for undirected discovery of execption rules. International Journal of Intelligent Systems, 20, 673-691, 2005.
    Tung, A., Lu, H., Han, J. and Feng, L. E±cient mining of intertransaction association rules. IEEE Transactions on Knowldege and Data Engineering, 15(1), 43-56, 2003.
    Webb, G. I., Pazzani, M. and Billsus, D. Machine learning for user modeling. User modeling and User-Adapted Interaction, 11, 19-29, 2001.

    下載圖示 校內:2009-08-14公開
    校外:2009-08-14公開
    QR CODE