簡易檢索 / 詳目顯示

研究生: 吳永斌
Wu, Yong-Bin
論文名稱: 漸進式資料庫中正向與負向循序性樣式之探勘
Mining Positive and Negative Sequential Patterns in a Progressive Database
指導教授: 黃仁暐
Huang, Jen-Wei
學位類別: 碩士
Master
系所名稱: 電機資訊學院 - 電腦與通信工程研究所
Institute of Computer & Communication Engineering
論文出版年: 2016
畢業學年度: 104
語文別: 英文
論文頁數: 52
中文關鍵詞: 漸進式循序性樣式負向循序性樣式關注頻率比例PS樹按層次遍歷剪枝策略
外文關鍵詞: progressive sequential pattern, negative sequential pattern, frequency ratio of interest, PS-tree, level-order traversal, pruning strategy
相關次數: 點閱:128下載:0
分享至:
查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報
  • 正向循序性樣式探勘著重於出現的項目,而負向循序性樣式探勘傾向於找出發生與未發生項目之間的關係。過去僅有少數的研究與負向循序性樣式探勘有關,而且在每一個研究中,對於負向循序性樣式的定義都不一致。以往使用在正向循序性樣式上的支持度閾值也一直被套用在負向循序性樣式上,使得較有趣的樣式不容易被找出。除此之外,正向循序性樣式探勘已經發展出漸增式循序性樣式探勘與漸進式循序性樣式探勘,而負向循序性樣式探勘卻仍只於靜態資料庫上執行。漸進式循序性樣式探勘會在資料庫中發掘最新的循序性樣式,而最新的樣式能夠提供更有價值的資訊。然而,以往的漸進式循序性樣式探勘演算法包含一些多餘的處理程序,並存在著可以加快效率的空間。在本研究中,我們旨在結合負向循序性樣式探勘與漸進式循序性樣式探勘,提出Propone演算法以達成有效率的探勘。我們提高正向漸進式循序性樣式的探勘速度,並且賦予負向循序性樣式一個新的定義,藉此找出更具意義與有趣的樣式。我們也另外提出一個剪枝策略的方法來減少候選負向循序性樣式的數量與計算時間。藉由修改以往的演算法並以其與Propone比較過後,實驗結果顯示Propone的效能優於其他演算法。

    Positive sequential pattern (PSP) mining focuses on appearing items, while negative sequential pattern (NSP) mining tends to find the relationship between occurring and non-occurring items. There are few works involved in NSP mining, and the definitions of NSP are inconsistent in each work. The support threshold for PSP is always applied on NSP, which cannot bring out interesting patterns. In addition, PSP has been discovered on incremental databases and progressive databases, while NSP mining is only performed on static databases. Progressive sequential pattern mining finds the most up-to-date patterns, which can provide more valuable information. However, the previous progressive sequential pattern mining algorithm contains some redundant process. In this paper, we aim to find NSP on progressive databases. A new definition of NSP is given to discover more meaningful and interesting patterns. We propose an algorithm, Propone, for efficient mining process. We also propose a lever-order traversal strategy and a pruning strategy to reduce the calculation time and the number of negative sequential candidates (NSC). By comparing Propone with some modified previous algorithms, the experimental results show that Propone outperforms comparative algorithms.

    中文摘要. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . i Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ii Acknowledgment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iii Table of Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iv List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vi 1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 2 Related Works . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 2.1 Positive Sequential Pattern Mining . . . . . . . . . . . . . . . . . . . . . . . . . 7 2.2 Negative Sequential Pattern Mining . . . . . . . . . . . . . . . . . . . . . . . . . 8 3 Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 3.1 Positive Sequential Pattern (PSP) . . . . . . . . . . . . . . . . . . . . . . . . . . 10 3.2 Negative Sequential Pattern (NSP) . . . . . . . . . . . . . . . . . . . . . . . . . 12 3.2.1 Constraints on Negative Sequence . . . . . . . . . . . . . . . . . . . . . . 13 3.2.2 Subsequence of Negative Sequence . . . . . . . . . . . . . . . . . . . . . . 14 3.2.3 Support Counting of Negative Sequences . . . . . . . . . . . . . . . . . . 14 3.2.4 Definition of NSP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17 4 Progressive Mining of Positive and Negative Sequential Patterns . . . . . . . . . . . . 18 4.1 Propone: Algorithm Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18 4.2 PS-tree Traversal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19 4.3 Outputting PSP and NSP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24 5 Performance Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29 5.1 Experimental Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29 5.2 Different Minimum Support Thresholds . . . . . . . . . . . . . . . . . . . . . . . 31 5.3 Different Length of POI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34 5.4 Different Number of Sequences . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37 5.5 Analyses of FRI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39 5.6 Number of Patterns . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42 5.7 Test on Real Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43 6 Conclusions and Future Works . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47

    [1] Kdd cup 2007. https://www.cs.uic.edu/~liub/Netflix-KDD-Cup-2007.html.
    [2] R. Agrawal and R. Srikant. Fast algorithms for mining association rules. In Proceedings of the 20th International Conference on Very Large Data Bases (VLDB’94), pages 487–499, Sept. 1994.
    [3] R. Agrawal and R. Srikant. Mining sequential patterns. In Proceedings of the 11th International Conference on Data Engineering (ICDE’95), pages 3–14, Mar. 1995.
    [4] R. Agrawal and R. Srikant. Mining sequential patterns: Generalizations and performance improvements. In Proceedings of the 5th International Conference on Extending Database Technology (EDBT’96), pages 1–17, Mar. 1996.
    [5] J. F. Allen. Maintaining knowledge about temporal intervals. Communications of the ACM, 26(11):832–843, 1983.
    [6] J. Ayres, J. Flannick, J. Gehrke, and T. Yiu. Sequential pattern mining using a bitmap representation. In Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’02), pages 429–435, 2002.
    [7] I. Batal, D. Fradkin, J. Harrison, F. Moerchen, and M. Hauskrecht. Mining recent temporal patterns for event detection in multivariate time series data. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’12), pages 280–288, 2012.
    [8] K.-Y. Chen, B. P. Jaysawal, J.-W. Huang, and Y.-B. Wu. Mining frequent time intervalbased event with duration patterns from temporal database. In Proceedings of the 2014 International Conference on Data Science and Advanced Analytics (DSAA’14), pages 548–554, Oct. 2014.
    [9] Y.-C. Chen, J.-C. Jiang, W.-C. Peng, and S.-Y. Lee. An efficient algorithm for mining time interval-based patterns in large database. In Proceedings of the 19th ACM International Conference on Information and Knowledge Management (CIKM’10), pages 49–58, 2010.
    [10] Y.-C. Chen, W.-C. Peng, and S.-Y. Lee. Ceminer–an efficient algorithm for mining closed patterns from time interval-based data. In Proceedings of the 2011 IEEE 11th International Conference on Data Mining (ICDM’11), pages 121–130, Dec. 2011.
    [11] H. Cheng, X. Yan, and J. Han. Incspan: Incremental mining of sequential patterns in large database. In Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’04), pages 527–532, 2004.
    [12] X. Dong, Z. Zheng, L. Cao, Y. Zhao, C. Zhang, J. Li, W. Wei, and Y. Ou. e-nsp: Efficient negative sequential pattern mining based on identified positive patterns without database rescanning. In Proceedings of the 20th ACM International Conference on Information and Knowledge Management (CIKM’11), pages 825–830, 2011.
    [13] P. Fournier-Viger, C.-W. Wu, and V. S. Tseng. Mining maximal sequential patterns without candidate maintenance. In Proceedings of the 9th International Conference on Advanced Data Mining and Applications (ADMA’13), pages 169–180, Dec. 2013.
    [14] M. N. Garofalakis, R. Rastogi, and K. Shim. Spirit: Sequential pattern mining with regular expression constraints. In Proceedings of the 25th International Conference on Very Large Data Bases (VLDB’99), pages 223–234, 1999.
    [15] A. Gomariz, M. Campos, R. Marin, and B. Goethals. Clasp: An efficient algorithm for mining frequent closed sequences. In Proceedings of the 17th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD’13), pages 50–61, Apr. 2013.
    [16] J. Han, J. Pei, B. Mortazavi-Asl, Q. Chen, U. Dayal, and M.-C. Hsu. Freespan: Frequent pattern-projected sequential pattern mining. In Proceedings of the 6th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’00), pages 355–359, 2000.
    [17] R. L. Haupt and S. E. Haupt. Practical genetic algorithms. John Wiley & Sons, 2004.
    [18] S.-C. Hsueh, M.-Y. Lin, and C.-L. Chen. Mining negative sequential patterns for ecommerce recommendations. In IEEE 2008 Asia-Pacific Services Computing Conference (APSCC’08), pages 1213–1218, Dec. 2008.
    [19] J.-W. Huang, C.-Y. Tseng, J.-C. Ou, and M.-S. Chen. A general model for sequential pattern mining with a progressive database. IEEE Transactions on Knowledge and Data Engineering (TKDE’08), 20(9):1153–1167, Sept. 2008.
    [20] A. Koper and H. S. Nguyen. Sequential pattern mining from stream data. In Proceedings of the 7th International Conference on Advanced Data Mining and Applications (ADMA’11), pages 278–291, Dec. 2011.
    [21] C.-H. Lin, D.-Y. Chiu, Y.-H. Wu, and A. L. P. Chen. Mining frequent itemsets from data streams with a time-sensitive sliding window. In Proceedings of the 2005 SIAM International Conference on Data Mining (SDM’05), pages 68–79, 2005.
    [22] N. P. Lin, H.-J. Chen, and W.-H. Hao. Mining negative sequential patterns. In Proceedings of the 6th WSEAS International Conference on Applied Computer Science (ACOS’07), pages 654–658, 2007.
    [23] N. P. Lin, H.-J. Chen, W.-H. Hao, H.-E. Chueh, and C.-I. Chang. Mining strong positive and negative sequential patterns. WSEAS Transactions on Computers, 7(3):119–124, Mar. 2008.
    [24] C. Luo and S. M. Chung. Efficient mining of maximal sequential patterns using multiple samples. In Proceedings of the 2005 SIAM International Conference on Data Mining (SDM’05), pages 415–426, 2005.
    [25] A. Marascu and F. Masseglia. Mining sequential patterns from temporal streaming data. In Proceedings of the 1st ECML/PKDD Workshop on Mining Spatio-Temporal Data (MSTD’05), pages 1–13, Nov. 2005.
    [26] A. Marascu and F. Masseglia. Mining sequential patterns from data streams: a centroid approach. Journal of Intelligent Information Systems, 27(3):291–307, Nov. 2006.
    [27] F. Masseglia, P. Poncelet, and M. Teisseire. Incremental mining of sequential patterns in large databases. Data & Knowledge Engineering, 46(1):97–121, July 2003.
    [28] I. Miliaraki, K. Berberich, R. Gemulla, and S. Zoupanos. Mind the gap: Large-scale frequent sequence mining. In Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data (SIGMOD’13), pages 797–808, 2013.
    [29] M. Mitchell. An introduction to genetic algorithms. MIT press, 1998.
    [30] F. Moerchen and D. Fradkin. Robust mining of time intervals with semi-interval partial order patterns. In Proceedings of the 2010 SIAM International Conference on Data Mining (SDM’10), pages 315–326, 2010.
    [31] S. N. Nguyen, X. Sun, and M. E. Orlowska. Improvements of incspan: Incremental mining of sequential patterns in large database. In Proceedings of the 9th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD’05), pages 442–451, May 2005.
    [32] J.-Z. Ouh, P.-H. Wu, and M.-S. Chen. Experimental results on a constraint based sequential pattern mining for telecommunication alarm data. In Proceedings of the 2nd International Conference on Web Information Systems Engineering (WISE’01), pages 186–193 vol.2, Dec. 2001.
    [33] W. Ouyang and Q. Huang. Mining positive and negative sequential patterns with multiple minimum supports in large transaction databases. In Proceedings of the 2010 Second WRI Global Congress on Intelligent Systems (GCIS’10), pages 190–193, Dec. 2010.
    [34] W.-m. Ouyang and Q.-h. Huang. Mining negative sequential patterns in transaction databases. In Proceedings of the 2007 International Conference on Machine Learning and Cybernetics (ICMLC’07), pages 830–834, Aug. 2007.
    [35] A. Pandey, J. Srivastava, and S. Shekhar. Technical report 01-004, web proxy server with intelligent prefetcher for dynamic pages using association rules. Univ. Minnesota, Comput. Sci. Eng. Dept., Twin Cities, Tech. Rep, Jan. 2001.
    [36] A. Pandey, R. R. Vatsavai, X. Ma, J. Srivastava, and S. Shekhar. Data mining for intelligent web prefetching. In Proceedings of the Workshop on Mining Data Across Multiple Customer Touchpoints for CRM (MDCRM’02), 2002.
    [37] S. Parthasarathy, M. J. Zaki, M. Ogihara, and S. Dwarkadas. Incremental and interactive sequence mining. In Proceedings of the 8th International Conference on Information and Knowledge Management (CIKM’99), pages 251–258, 1999.
    [38] D. Patel, W. Hsu, and M. L. Lee. Mining relationships among interval-based events for classification. In Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data (SIGMOD’08), pages 393–404, 2008.
    [39] J. Pei, J. Han, B. Mortazavi-Asl, H. Pinto, Q. Chen, U. Dayal, and M.-C. Hsu. Prefixspan: Mining sequential patterns efficiently by prefix-projected pattern growth. In Proceedings of the 17th International Conference on Data Engineering (ICDE’01), pages 215–224, Apr. 2001.
    [40] J. Pei, J. Han, and W. Wang. Mining sequential patterns with constraints in large databases. In Proceedings of the 11th ACM International Conference on Information and Knowledge Management (CIKM’02), pages 18–25, 2002.
    [41] J. Wang and J. Han. Bide: Efficient mining of frequent closed sequences. In Proceedings of the 20th International Conference on Data Engineering (ICDE’04), pages 79–90, 2004.
    [42] S.-Y. Wu and Y.-L. Chen. Mining nonambiguous temporal patterns for interval-based events. IEEE Transactions on Knowledge and Data Engineering (TKDE’07), 19(6):742–758, June 2007.
    [43] X. Yan, J. Han, and R. Afshar. Clospan: Mining closed sequential patterns in large datasets. In Proceedings of the 2003 SIAM International Conference on Data Mining (SDM’03), pages 166–177, 2003.
    [44] D. Yuan, K. Lee, H. Cheng, G. Krishna, Z. Li, X. Ma, Y. Zhou, and J. Han. Cispan: Comprehensive incremental mining algorithms of closed sequential patterns for multi-versional software mining. In Proceedings of the 2008 SIAM International Conference on Data Mining (SDM’08), pages 84–95, 2008.
    [45] M. J. Zaki. Spade: An efficient algorithm for mining frequent sequences. Machine Learning, 42(1-2):31–60, 2001.
    [46] Y. Zhao, H. Zhang, L. Cao, C. Zhang, and H. Bohlscheid. Mining both positive and negative impact-oriented sequential rules from transactional data. In Proceedings of the 13th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD’09), pages 656–663, Apr. 2009.
    [47] Y. Zhao, H. Zhang, S. Wu, J. Pei, L. Cao, C. Zhang, and H. Bohlscheid. Debt detection in social security by sequence classification using both positive and negative patterns. In Proceedings of the 2009 European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML/PKDD’09), pages 648–663, Sept. 2009.
    [48] Z. Zheng, Y. Zhao, Z. Zuo, and L. Cao. Negative-gsp: An efficient method for mining negative sequential patterns. In Proceedings of the 8th Australasian Data Mining Conference - Volume 101 (AusDM’09), pages 63–67, 2009.
    [49] Z. Zheng, Y. Zhao, Z. Zuo, and L. Cao. An efficient ga-based algorithm for mining negative sequential patterns. In Proceedings of the 14th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD’10), pages 262–273, June 2010.

    無法下載圖示 校內:2021-06-28公開
    校外:不公開
    電子論文尚未授權公開,紙本請查館藏目錄
    QR CODE