簡易檢索 / 詳目顯示

研究生: 蔣佩雯
Jiang, Pei-Wen
論文名稱: 於複雜事件序列中有效率探勘頻繁目標情節之研究
Efficient Algorithms for Mining Frequent Target Episodes from Complex Event Sequences
指導教授: 曾新穆
Tseng, Shin-Mu
學位類別: 碩士
Master
系所名稱: 電機資訊學院 - 資訊工程學系
Department of Computer Science and Information Engineering
論文出版年: 2012
畢業學年度: 101
語文別: 英文
論文頁數: 66
中文關鍵詞: 情節探勘同時發生事件複雜事件序列資料探勘
外文關鍵詞: episode mining, simultaneous events, complex event sequence, data mining
相關次數: 點閱:132下載:3
分享至:
查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報
  • 在事件序列中探勘頻繁情節已經成為資料探勘中一個重要的研究議題。目前大部分的研究中,都只專注於如何在單一事件的序列中探勘頻繁情節。然而對於很多應用,他們需要去分析同一個時間點上有多個事件發生的序列。我們將這樣的序列稱作為複雜事件序列。另外,某些應用只對於目標情節有興趣,目標情節中的定義為在情節中的最後一個事件必須為目標事件。有鑑於此,在本論文當中,我們探討如何在複雜事件序列中有效率探勘頻繁目標情節。為了不要限制情節的長度,我們利用最小發生來計算情節的支持度。為了可以在複雜序列中探勘情節,我們將目前最有效率的演算法PPS修正成PPS+作為基本的方法。並進一步提出了EM-SES演算法來改進PPS+的不足。除此之外,我們還提出了TEM-SES演算法使得在探勘的過程中能夠避免探勘出所有的頻繁情節。經由在模擬資料集以及真實資料集的實驗中,都能夠觀察到我們所提出來的方法有較佳的效能。

    Mining frequent episodes from event sequences is an important topic in data mining fields. Most of existing related researches focused on mining frequent episodes from a single event sequence. However, sequences containing simultaneous events are frequently encountered, and we refer to such sequences as complex event sequences. For some practical applications, users are often interested in target episodes where the last event of an episode is the target event type. In this thesis, we address the problem on mining frequent target episodes from complex event sequences. To avoid restricting the length of episode, the support of an episode is counted by the size of minimal occurrences set. We extend the state-of-the-art algorithm PPS to PPS+ as a base method for mining episodes from complex event sequences, and propose the EM-SES algorithm to overcome the drawback of PPS+. Furthermore, the TEM-SES algorithm is proposed to reduce the cost of mining a complete set of frequent episodes. Experimental evaluation on both synthetic and real datasets demonstrates that our algorithms deliver good performance in terms of efficiency and effectiveness.

    中文摘要 I Abstract II 誌謝 III Content IV List of Tables VI List of Figures VII 1 Introduction 1 1.1 Background 1 1.2 Motivation 2 1.3 Problem Statement 5 1.4 Contribution 5 1.5 Thesis Organization 7 2. Related Work 8 2.1 Episode Mining using Sliding Windows 8 2.2 Episode Mining using Minimal Occurrences 11 2.3 Episode Mining from a Complex Event Sequence 14 3. Proposed Methods 16 3.1 Preliminary 16 3.2 Extension of Position Pairs Set 21 3.3 Episode Mining using Simultaneous Event Sets 27 3.3.1 Mining Frequent Simultaneous Event Sets 27 3.3.2 Database Encoding 28 3.3.3 Mining Frequent Episodes 29 3.4 Strategy for Updating Minimal Occurrences of Episode 32 3.5 Support Maximum Time Duration 35 3.6 Target Episode Mining using Simultaneous Event Sets 35 3.6.1 Mining Frequent Target Episodes 36 3.6.2 Mining Prefix Episodes of Frequent Target Episodes 39 4. Experimental Evaluation 41 4.1 Experiments on Synthetic Data 41 4.1.1 Complex Event Sequence Simulator 41 4.1.2 Experimental Design 42 4.1.3 Experimental Results 43 4.1.3.1 Varying Minimum Support 44 4.1.3.2 Varying Maximum Time Duration 45 4.1.3.3 Varying the Number of Time Points of Complex Event Sequence 47 4.1.3.4 Varying of the Average Number of Simultaneous Events 49 4.2 Experiments on Real Data 49 4.2.1 Data Preprocessing 51 4.2.2 Experimental Results 54 4.2.2.1 Varying of the Minimum Support 55 4.2.2.2 Varying of the Coverage Rate 57 5. Conclusions and Future Work 59 5.1 Conclusions 59 5.2 Future Work 61 References 62

    [1] R. Agrawal and R. Srikant, “Mining sequential patterns,” In Proceedings of the 11th International Conference on Data Engineering (ICDE’95), pages 3-14, 1995.
    [2] R. Agrawal and R. Srikant, “Fast algorithms for mining association rules,” In Proceedings of the 36th International Conference on Very Large Data Bases (VLDB’94), pages 487-499, 1994.
    [3] G. Casas_Garriga. “Discovering unbounded episodes in sequential data,” In Proceedings of 7th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD), 2003.
    [4] Abhi Dattasharma, Praveen Kumar Tripathi and Sridhar G, “Identifying Stock Similarity Based on Multi-event Episodes,” In Proceedings of the 7th Australasian Data Mining Conference (AusDM), pages 153-162, 2008.
    [5] Abhi Dattasharma, Praveen Kumar Tripathi and Sridhar G, “Identifying stock similarity based on episode distances,” In Proceedings of 11th International Conference on Computer and Information Technology, pages 28-35, 2008.
    [6] M. N. Garofalakis, R. Rastogi, and K. Shim. “Spirit: Sequential pattern mining with reqular expression of constraints,” IEEE Transactions on Knowledge and Data Engineering (TKDE), 14(3):530-552, 2002.
    [7] J. Han and J. Pei, “Mining frequent patterns by pattern-growth: Methodology and implications,” ACM SIGKDD Explorations (Special Issue on Scalable Data Mining Algorithms), 2(2):14-20, 2000.
    [8] J. Han, J. Pei, B. Mortazavi-Asl, Q. Chen, U.Dayal, and M. Hsu, “Freespan: Frequent pattern-projected sequential pattern mining,” In Proceedings of the 6th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’00), pages 355-359, 2000.
    [9] S. K. Harms, J. Deogun, J. Saquer, and T. Tadesse, “Discovering representative episodal asspciation rules from event sequences,” In Proceedings of the 2011 IEEE International Conference on Data Mining (ICDM’01), 2001.
    [10] Kuo-Yu Huang and Chia-Hui Chang, “Efficient mining of frequent episodes from complex sequences,” Information Systems, 33:96-114, 2008.
    [11] K. Hwang, M. Cai, Y. Chen, M. Qin, “Hybrid intrusion detection with weighted signature generation over anomalous Internet episodes,” IEEE Transactions on Dependable and Secure Computing, pages 41-55, 2007.
    [12] S. Laxman, P. S. Sastry and K. P. Unnikrishnan, “A fast algorithm for finding frequent episodes in event streams”, In Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’07), 2007.
    [13] C. R. Lin, C. H. Yun and M. S. Chen, “Utiliaing slice scan and selective hash for episode mining,” In Proceedings of 7th International Conference on Knowledge Discovery and Data Mining (KDD’01), 2001.
    [14] J. Luo and S. M. Bridges, “Mining fuzzy association rules and fuzzy frequency episodes for intrusion detection,” International Conference on Data Engineering (ICDE’01), pages 205-214, 2001.
    [15] J. Luo, S. M. Bridges and R. B. Vaughn, “ Fuzzy frequent episodes for real-time intrusion detection,” IEEE International Conference on Fuzzy Systems, 2001.
    [16] X. Ma, H. PANG, K. L. TAN, “Finding constrained frequent episodes using minimal occurrences,” In Proceedings of 4th IEEE International Conference on Data Mining, pages 471-474,2004.
    [17] H. Mannila, H. Toivonen and A. I. Verkamo, “Discovering frequent episodes in sequences,” In Proceedings of the First International Conference on Knowledge Discovery and Data Mining (KDD’95), pages 210-215, 1995.
    [18] H. Mannila, H. Toivonen, “Discovering generalized episodes using minimal occurrences,” In Proceedings of the Seconf International Conference on Knowledge Discovery and Data Mining (KDD’96), pages 146-151, 1996.
    [19] H. Mannila, H. Toivonen and A. I. Verkamo, “Discovering frequent episodes in event sequences,” Data Mining and Knowledge Discovery (DMKD), 1(3):259-289, 1997.
    [20] Nicolas Meger and Christophe Rigotti, “Constraint-based mining of episode rules and optimal window sizes,” In Proceedings of the 8th European Conference on Principles and Practice of Knowledge Discovery in Databases, pages 313-324, 2004.
    [21] N. L. N. Meger, C. Leschi and C. Rigotti, “ Mining episodes rules in stulong dataset,” In ECML/PKDD 2004 Discovery Challenge (PKDD), 2004.
    [22] A. Ng and A. W.-C. Fu. “Mining frequent episodes for relating financial events and stock trends,” In Proceedings of 7th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), 2003.
    [23] Debprakash Patnaik, Patrick Butler, Naren Ramakrishnan, Laxmi Parida, Benjamin J. Keller and Bavid A. Hanauer, MD, “Experimences with mining temporal event sequences from electronic medical records: initial successes and some challenges,” In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’11), pages 360-368, 2011.
    [24] J. Pei, J. Han, B. Mortazavi-Asl, H. Pinto, Q. Chen, U.Dayal, and M.-C. Hsu, “Prefixspan: Mining sequential patterns efficiently by prefix-projected pattern growth,” In Proceedings of the International Conference on Data Engineering (ICDE’01), pages 215-226, 2001.
    [25] J. Pei, J. Han and W.wang, “Mining sequential patterns with constraints in large database,” In Proceedings of the Eleventh International Conference on Information and Knowledge Management (CIKM’02), 2002.
    [26] K. H. Min Qin, “ Frequent episode rules for internet anomaly dection,” In Proceedings of The Third IEEE International Symposium on Network Computing and Applications (NCA), 2004.
    [27] R. Srikant and R. Agrawal, “Mining sequential patterns: Generalizations and performance improvements,” In Proceedings of the 5th International Conference on Extending Database Technology (EDBT’96), volume 1057 of Lecture Notes in Computer Science, pages 3-17, springer, 1996.
    [28] Min-Feng Wang, Yen-Ching Wu and Meng-Feng Tsai, “Exploiting frequent episodes in weighted suffix tree to improve intrusion detection system,” In Proceedings of International Conference Advanced Information Networking and Applications (AINA’08), pages 1246–1252, 2008.
    [29] M. J. Zaki. “Scalable algorithms for association mining,” IEEE Transactions on Knowledge and Data Engineering (TKDE), 12(3):372-390, 2000.
    [30] H. Zhu, P. Wang, X. He, Y. Li, W. Wang, and B. Shi, “Efficient episode mining with minimal and non-overlapping occurrences,” In Proceedings of 10th IEEE International Conference on Data Mining, 2010.

    下載圖示 校內:2017-11-30公開
    校外:2017-11-30公開
    QR CODE