簡易檢索 / 詳目顯示

研究生: 楊恭娟
Yang, Kung-Jiuan
論文名稱: 高效性部份週期樣式探勘演算法之研究
A Study on Efficient Algorithms for Partial Periodic Pattern Mining
指導教授: 陳裕民
Chen, Yuh-Min
共同指導教授: 洪宗貝
Hong, Tzung-Pei
學位類別: 博士
Doctor
系所名稱: 電機資訊學院 - 製造資訊與系統研究所
Institute of Manufacturing Information and Systems
論文出版年: 2013
畢業學年度: 102
語文別: 英文
論文頁數: 110
中文關鍵詞: 資料探勘部分週期樣式探勘事件序列投影技術最大子樣式法上限邊界
外文關鍵詞: Data mining, partial periodic pattern mining, event sequence, projection technique, upper-bound
相關次數: 點閱:103下載:3
分享至:
查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報
  • 近年來,資料探勘技術已被廣泛應用在各種資料的分析應用上,其中週期樣式探勘之相關應用,亦已成為受歡迎的研究議題之一。事實上,在實際生活的例子中,並不容易找到發生時間位置是固定且”完整”的週期樣式,因而衍生出”部份”週期樣式探勘之研究。然而,相較於完整週期樣式探勘,因部分週期樣式探勘允許在一些時間位置中忽略其事件的發生與否,故將會產生更大量的候選樣式集,進而造成須耗費更大量的時間成本來處理這些候選樣式集。因此如何設計出一高效性之演算法之議題便顯得相當重要。除此之外,傳統的部份週期樣式探勘僅以頻率出現的準則來挖掘出有價值之樣式,此方式也許會讓一些具有重要卻出現頻率較低的事件無法通過所設定門檻值因而被忽略。因此,本論文除了發展高效性之部分週期樣式探勘演算法之外,亦進一步提出以事件具有不同權重值與不同門檻值標準之相關議題,以求能從事件序列中挖掘出不同於僅以頻率出現為主要考量之部份週期樣式資訊。
    關於部份週期樣式探勘的議題,本論文先提出一個以投影技術為基礎之PPA (Projection-based Pattern Mining Approach)演算法,能有效地應用於事件序列之部分週期樣式挖掘議題裡。為了有效提升執行效能,本論文提出另一個具有刪除及過濾機制之PRA (Pruning Redundancy Approach) 演算法,以有效降低不必要候選樣式之數量。在實驗評估裡,實驗結果顯示在長週期及低門檻值情況下所提的方法相較於傳統MSA(Max-Subpattern Mining Algorithm)演算法能有70%以上的效能改善率。
    另一方面,由於在傳統部份週期樣式探勘議題裡,假設所有事件之重要性皆為相同,且僅以一最小支持度門檻值作為有效樣式之判定標準。因此本論文分別提出事件具有不同權重值(Weighted Partial Periodic Pattern Mining)或多重門檻值(Partial Periodic Pattern Mining with Multiple Minimum Constraints)考量之研究議題。其中,由於WPPM不具向下封閉性,因此提出一個具二階段式之PWA (Projection-based Weighted Mining Approach)演算法,其第一階段是以每個週期的最大權重值作為週期上限邊界值之高估模型,進而找出所有可能是高權重頻繁部份週期樣式之集合;接著在第二階段裡,執行一次資料掃描以獲得各樣式之實際平均權重值,以找出所需之樣式。另一個具有多重最小門檻限制之議題則是提出一個PAMMS (Projection-based Mining Approach with Multiple Minimum Supports)的演算法,PAMMS會以事件集合中的最小門檻值作為第一階段找出所有可能為高頻繁部份週期樣式之集合,在第二階段則是從這集合裡,獲得每個滿足自身實際支持度門檻之部分週期樣式集合。最後的實驗評估結果顯示在數個資料庫測試上,PWA及PAMMS可兼顧原有的探勘效率並能有效地找出重要的部分週期樣式。

    Data mining techniques have been widely used in a variety of data-analysis applications in recent years. To find useful rules or patterns in a single long-term time series data, the periodic pattern mining has become a very popular research topic. In real-life examples, "partial” periodic pattern mining is more flexible than “full” periodic patterns. The main reason is that the events of some time positions in a pattern can be uncertain. However, since partial periodic pattern mining can ignore the events of some time positions in a period, it has to generate a large number of candidate patterns in mining. Then, how to develop efficient partial periodic pattern mining algorithms for saving time cost is a critical issue. Besides, since most of studies related to partial periodic pattern mining only consider the supports of items in period segments, many useful patterns with low-frequency but high-significance in event sequence data may not be found. Hence, in this dissertation, we propose not only two efficient projection-based mining algorithms but also the two new issues, respectively named weighted partial periodic pattern mining (WPPP) and partial periodic pattern mining with multiple minimum constraints (PPPMM).
    As to the traditional partial periodic pattern mining, the two algorithms, PPA (Projection-based Pattern Mining Approach) and PRA (Pruning Redundancy Approach), were proposed to enhance the execution efficiency in finding partial periodic patterns from a single event sequence with single or multiple events in a time point. Different from the PPA algorithm without any strategies, the PRA algorithm adopts two effective strategies, pruning and filtering, to reduce a large number of candidates in mining. The experimental results on several synthetic and real datasets showed the proposed approaches get up to 70% performance improvement when compared to the traditional MSA (Max-Subpattern Hit Set) algorithm.
    For the issue of WPPP, since the downward-closure property cannot be kept in this problem, an effective upper-bound model, which the maximum weight of all events in a period segment as the upper-bound of any sub-pattern in that segment, is developed to achieve this goal. Based on the model, a two-phase mining approach PWA (Projection-based Weighted Mining Approach) is also presented to complete the WPPP mining tasks. For another issue PPPMM, an efficient two-phase mining approach PAMMS (Projection-based Mining Approach with Multiple Minimum Supports) is proposed to handle this problem. Especially, since the downward-closure property is not kept in the problem of PPPMM, the minimum constraint value of all events in mining is used to avoid information losing. Finally, the experimental results show that the performance of both PWA and PAMMS in terms of pruning effectiveness and execution efficiency on synthetic and real datasets.

    中文摘要 I ABSTRACT III 誌謝 V Content VI List of Tables VIII List of Figures X Chapter 1 Introduction 1 1.1 Motivation 6 1.2 Overview of the Dissertation 8 1.3 Organization of the Dissertation 10 Chapter 2 Background and Related Work 11 2.1 Data Mining 11 2.2 Sequential Pattern Mining 12 2.3 Periodic Pattern Mining 14 2.3.1 Max-Subpattern Hit Set Algorithm 15 2.3.2 PFP-growth algorithm 17 2.4 The Projection-based Algorithm 18 2.5 Weight-based Sequential Pattern Mining 20 Chapter 3 Partial Periodic Pattern Mining for Event Sequences 22 3.1 Introduction 22 3.2 Problem Statement and Definitions 23 3.3 The Projection-based PPA Mining Algorithm 27 3.4 The Pruning Redundancy Algorithm PRA 36 3.4.1 The Pruning Strategy for Unpromising Candidates 36 3.4.2 An Effective Filtering Strategy Based on Relationship of Tuple Events 37 3.4.3 The Proposed Mining Algorithm, PRA 38 3.4.4 An Example of PRA 42 3.5 Experimental Evaluation 48 3.5.1 Experimental Datasets 49 3.5.2 Evaluation of PPA and PRA Algorithms 50 3.5.3 Evaluation of the Number of Candidate Patterns Generated 50 3.5.4 Efficiency Evaluation 52 3.5.5 Evaluation on a Real Oil Dataset 53 3.5.6 Pattern Explanation 55 Chapter 4 Partial Periodic Pattern Mining with Event Weights 57 4.1 Introduction 57 4.2 Problem Statement and Definitions 59 4.3 The Proposed PWA Algorithm 63 4.4 An Example of PWA 64 4.5 Experimental Evaluation 71 4.5.1 Experimental Datasets 72 4.5.2 Efficiency Evaluation 73 4.5.3 Evaluation on a Real Oil Dataset 74 4.5.4 Weighted Partial Periodic Patterns of an Oil Dataset 76 Chapter 5 Partial Periodic Pattern with Multiple Minimum Support Thresholds 79 5.1 Introduction 79 5.2 Problem Statement and Definitions 82 5.3 The Proposed Mining Algorithm - PAMMS 84 5.4 An Example of PAMMS 88 5.5 Experimental Evaluation 94 5.5.1 Experimental Datasets 95 5.5.2 Experiment with Synthetic Dataset 96 5.5.3 Evaluation on PAMMS vs. PFPA 97 5.5.4 Experiment with a Real-life Dataset 98 Chapter 6 Conclusion and Future Work 100 6.1 Conclusions 100 6.2 Future Work 103 Bibliography 105

    [1] R. Agrawal, C. Faloutsos, and A. Swami, “Efficient Similarity Search in Sequence Databases,” In Proceedings of the 4th International Conference Foundations of Data Organization and Algorithms, pp. 69–84, 1993.
    [2] R. Agrawal and R. Srikant, “Fast Algorithm for Mining Association Rules,” In Proceedings of the International Conference on Very Large Data Bases, pp. 487–499, 1994.
    [3] R. Agrawal and R. Srikant, “Mining Sequential Pattern,” In Proceedings of the International Conference on Data Engineering, pp. 3–14, 1995.
    [4] C. F. Ahmed, S. K. Tanbeer, B. S. Jeong, Y. K. Lee, and H.J. Choi, “Single-Pass Incremental and Interactive Mining for Weighted Frequent Patterns,” Expert Systems with Applications, Vol. 39, pp. 7976–7994, 2012.
    [5] W. G. Aref, M. G. Elfeky, and A. K. Elmagarmid, “Incremental, Online and Merge Mining of Partial Periodic Patterns in Time Series Databases,” IEEE Transactions on Knowledge and Data Engineering, Vol. 16, No. 3, pp. 332–342, 2004.
    [6] C. Berberidis, W. G. Aref, M. Atallah, I. Vlahavas, and A. K. Elmagarmid, “Multiple and Partial Periodicity Mining in Time Series Databases,” In Proceedings of the 15th European Conference on Artificial Intelligence, pp. 370–374, 2002.
    [7] C. Berberidis, I. Vlahavas, W. G. Aref, M. Atallah, and A. K. Elmagarmid, “On the Discovery of Weak Periodicities in Large Time Series,” Lecture Notes in Computer Science, Vol. 2431, pp. 169–186, 2002.
    [8] C. H. Cai, A. W. Fu, C. H. Cheng, and W. W. Kwong, “Mining Association Rules with Weighted Items,” In Proceedings of the International Database Engineering and Applications Symposium, IDEAS 98, pp. 68–77, 1998.
    [9] H. Cao, D. W. Cheung, and N. Mamoulis, “Discovering Partial Periodic Patterns in Discrete Data Sequences,” In Proceedings of the 8th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining, pp. 653-658, 2004.
    [10] R. S. Chen, Y.C. Hu, “A Novel Method for Discovering Fuzzy Sequential Patterns Using The Simple Fuzzy Partition Method,” Journal of American Society for Information Science and Technology. Vol. 54 (7), pp. 660–670, 2003.
    [11] S. S. Chen, T. C. Huang, Z. M. Lin, “New and Efficient Knowledge Discovery of Partial Periodic Patterns with Multiple Minimum Supports,” The Journal of Systems and Software, Vol. 84, pp. 1638– 1651, 2011.
    [12] H. Cheng, X. Yan and J. Han, “Incspan: Incremental Mining of Sequential Patterns in Large Databases,” In Proceedings of the SIGKDD’04, pp. 527–532, 2004.
    [13] D. A. Chiang, Y.F. Wang, S.L. Lee, C.J. Lin, “Goal-oriented Sequential Pattern for Network Banking Churn Analysis”, Expert Systems with Applications. Vol. 25, No. 3, pp. 293–302, 2003.
    [14] M. G. Elfeky, W. G. Aref, and A. K. Elmagarmid, “Using Convolution to Mine Obscure Periodic Patterns in One Pass,” In Proceedings of the 9th International Conference on Extending Database Technology, pp. 605–620, 2004.
    [15] C. Faloutsos, M. Ranganathan, and Y. Manolopoulos, “Fast Subsequence Matching in Time-Series Databases,” In Proceedings of the ACM SIGMOD International Conference Management of Data, pp. 419–429, 1994.
    [16] M. N. Garofalakis, R. Rastogi, K. Shim, “Spirit: Sequential Pattern Mining with Regular Expression Constraints,” In Proceedings of the 25th International Conference on Very Large Data Bases (VLDB ‘99), pp. 223–234, 1999.
    [17] J. Han, W. Gong, and Y. Yin, “Mining Segment-Wise Periodic Patterns in Time-Related Databases,” In Proceedings of the 4th International Conference Knowledge Discovery and Data Mining, 1998.
    [18] J. Han, G. Dong, and Y. Yin, “Efficient Mining of Partial Periodic Patterns in Time Series Databases,” In Proceedings of the 15th International Conference Data Engineering, pp. 106–115, 1999.
    [19] J. Han, J. Pei and Y. Yin, “Mining Frequent Patterns without Candidate Generation,” In Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data. pp. 1–12, 2000.
    [20] J. Han, and M. Kamber, “Data Mining: Concepts and Techniques (The Morgan Kaufmann Series in Data Management Systems)”, second ed., Morgan Kaufmann, 2006.
    [21] IBM Quest Data Mining Projection, “Quest Synthetic Data Generation Code,” available at (http://www.almaden.ibm.com/cs/quest/syndata.htm), 1996.
    [22] E. Keogh, S. Lonardi, and B.Y. chi’ Chiu, “Finding Surprising Patterns in a Time Series Database in Linear Time and Space”, In Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 550–556, 2002.
    [23] C. Kim, J. Lim, R. T. Ng, and K. Shim, “SQUIRE: Sequential Pattern Mining with Quantities,” The Journal of Systems and Software, Vol. 80, No. 10, pp. 1726–1745, 2007.
    [24] H. P. Kriegel, K.M. Borgwardt, P. Kroger, A. Pryakhin, M. Schubert, A. Zimek, “Future Trends in Data Mining,” Data Mining Knowledge Discovery, Vol. 15, No. 1, pp. 87–97, 2007.
    [25] G. C. Lan, T. P. Hong, and V. S. Tseng, “A Projection-Based Approach for Discovering High Average-Utility Itemsets,” Journal of Information Science and Engineering, Vol. 28, No. 1, pp. 193-209, 2012.
    [26] Y. Li, P. Ning, S. Jajodia, “Discovering Calendar-based Temporal Association Rules,” In Proceedings of the 8th International Symposium on Temporal Representation and Reasoning, pp. 111–118, 2001.
    [27] H. F. Li, C. C. Ho, and S. Y. Lee, “Incremental Updates of Closed Frequent Itemsets over Continuous Data Streams,” Expert Systems with Applications, Vol. 36, No. 2, pp. 2451–2458, 2009.
    [28] M. Y. Lin, S. Y. Lee, “Fast Discovery of Sequential Patterns by Memory Indexing”, In Proceedings of the 4th International Conference on Data Warehousing and Knowledge Discovery, pp. 150–160, 2002.
    [29] Y. Liu, J. Feng, and J. Yin, “The Incremental Mining of Constrained Cube Gradients,” International Journal of Information Technology & Decision Making, Vol. 6, No. 2, pp. 253-278, 2007.
    [30] K. M. V. Madan Kumar, P. V. S. Srinivas and C. Raghavendra Rao, Sequential Pattern Mining With Multiple Minimum Supports by MS-SPADE, International Journal of Computer Science Issues, Vol. 9(5), No. 1, 2012.
    [31] F. Masseglia, P. Poncelet, and M. Teisseire, “Efficient Mining of Sequential Patterns with Time Constraints: Reducing the Combinations,” Expert Systems with Applications, Vol. 36, No. 2, pp. 2677–2690, 2009.
    [32] B. Özden, S. Ramaswamy, and A. Silberschatz. “Cyclic Association Rules,” In Proceedings of the 14th International Conference Data Engineering, pp. 412–421, 1998.
    [33] J.S. Park, M.S. Chen, and P.S. Yu, “An Effective Hash-based Algorithm for Mining Association Rules,” In Proceedings of the ACM SIGMOD, pp. 175–186, 1996.
    [34] J. Pei, J. Han, B. Mortazavi-Asl, J. Wang, H. Pinto, Q. Chen, U. Dayal, and M. Hsu, “Mining Sequential Patterns by Pattern-Growth: The PrefixSpan Approach,” IEEE Transactions on Knowledge and Data Engineering, Vol. 16, No. 11, pp. 1424–1440, 2004
    [35] H. Pinto, J. Han, J. Pei, and K. Wang, “Multi-dimensional Sequence Pattern Mining”, In Proceedings of the International Conference on Information and Knowledge Management, 2001.
    [36] S. Qiao, T. Li, J. Peng, and J. Qiu, “Parallel Sequential Pattern Mining of Massive Trajectory Data,” International Journal of Computational Intelligence Systems, Vol. 3, No. 3, pp. 343–356, 2010.
    [37] A. Savasere, E. Omiecinski, S. Navathe, “An Efficient Algorithm for Mining Association Rules in Large Databases,” In Proceedings of the International Conference on Very Large Data Bases, pp. 432–444, 1995.
    [38] R. Srikant, R. Agrawal, “Mining Quantitative Association Rules,” In Proceedings of the ACM SIGMOD, pp. 1–12, 1996.
    [39] R. Srikant and R. Agrawal, “Mining Sequential Patterns: Generalizations and Performance Improvements,” In Proceedings of the 5th International Conference Extending Database Technology, pp. 3–17, 1996.
    [40] F. Tao, “Weighted Association Rule Mining Using Weighted Support and Significant Framework,” In Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 661–666, 2003.
    [41] I. H. Toroslu and M. Kantarcioglu, “Mining Cyclic Patterns,” In Proceedings of the International Conference on Data Warehousing and Knowledge Discovery, pp. 83–92, 2001.
    [42] P. Tzvetkov, X. Yan, and J. Han, “TSP: Mining Top-K Closed Sequential Patterns,” In Proceedings of the Third IEEE International Conference on Data Mining (ICDM 2003), pp. 347–354, 2003.
    [43] S. Vijayalakshmi, V. Mohan, and S. Suresh Raja, “Mining Constraint-based Multidimensional Frequent Sequential Pattern in Web Logs,” European Journal of Scientific Research, Vol. 36 No. 3, pp 480–490, 2009.
    [44] M. Vlachos, C. Meek, Z. Vagena, and D. Gunopulos, “Identifying Similarities, Periodicities and Bursts for Online Search Queries,” In Proceedings of the ACM SIGMOD International Conference on Management of Data, pp. 131–142, 2004.
    [45] J. Wang and J. Han, “BIDE: Efficient Mining of Frequent Closed Sequences,” In Proceedings of the 20th International Conference on Data Engineering, pp. 79–90, 2004.
    [46] H. W. Wu and J. T. Lee, “Mining Closed Flexible Patterns in Time-series Databases,” Expert Systems with Applications, Vol. 37, No. 3, pp. 2098–2107, 2010.
    [47] K. J. Yang, T. P. Hong, Y. M. Chen, and G. C. Lan, “Projection-Based Partial Periodic Pattern Mining For Event Sequences,” Expert Systems with Applications, Vol. 40, No. 10, pp. 4232–4240, 2013.
    [48] W. Yang and G. L. Lee, “Efficient Partial Multiple Periodic Patterns Mining without Redundant Rules,” In Proceedings of the IEEE Annual International Computer Software and Applications Conference, Vol. 1, pp. 430–435, 2004.
    [49] U. Yun and J. J. Leggett, “WSpan: Weighted Sequential Pattern Mining in Large Sequence Databases,” In Proceedings of the 3rd International IEEE Conference on Intelligent Systems, 2006.
    [50] U. Yun, “A New Framework for Detecting Weighted Sequential Patterns in Large Sequence Databases,” Knowledge-Based Systems, Vol. 21, pp. 110–122, 2008.
    [51] U. Yun and K. H. Ryu, “Discovering Important Sequential Patterns With Length-Decreasing Weighted Support Constraints,” International Journal of Information Technology & Decision Making, Vol. 9, No. 4, pp. 575–599, 2010.
    [52] U. Yun and K. H. Ryu, “Approximate Weighted Frequent Pattern Mining with/without Noisy Environments,” Knowledge-Based Systems, Vol. 24, pp. 73–82, 2011.
    [53] Z. Zhao, D. Yan, and W. Ng, “Mining Probabilistically Frequent Sequential Patterns in Uncertain Databases,” In Proceedings of the 15th International Conference on Extending Database Technology, pp. 74–85, 2012.

    下載圖示 校內:2017-12-31公開
    校外:2018-12-31公開
    QR CODE