研究生: |
陳冠穎 Chen, Kuan-Ying |
---|---|
論文名稱: |
高效益時間間隔特徵樣式探勘 Mining Frequent and Top-K High Utility Time Interval-based Event with Duration Patterns from Temporal Database |
指導教授: |
黃仁暐
Huang, Jen-Wei |
學位類別: |
碩士 Master |
系所名稱: |
電機資訊學院 - 電腦與通信工程研究所 Institute of Computer & Communication Engineering |
論文出版年: | 2014 |
畢業學年度: | 102 |
語文別: | 英文 |
論文頁數: | 58 |
中文關鍵詞: | 循序式樣式探勘 、時間間隔樣式探勘 、高效益樣式探勘 |
外文關鍵詞: | Sequential pattern mining, time interval-based event with duration pattern mining, high utility pattern mining |
相關次數: | 點閱:104 下載:0 |
分享至: |
查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報 |
時間間隔為基礎的模式挖掘提出是為了改善缺乏時間區間的訊息。以往的關於時間間隔為基礎的樣式挖掘的研究皆著重於事件與事件之間的關係,而不考慮每個事件的持續時間。然而,同樣的事件擁有不同的持續時間可能導致截然不同的結果。例如,如果有些患者咳嗽一天,他們會感冒一個星期。相對的,如果一些患者咳嗽為一年,他們在未來可能有相當高的機率得到肺炎。在本研究中,我們提出了兩種演算法,SARA和SARS,用來抽取出時間間隔並包含時間長度的樣式,稱之為TIED。TIED模式不僅能擁有事件之間的關係,也顯示了每個事件發生和結束與時間週期。在實驗中,我們提出了一個簡易的演算法並修改以往研究者的演算法與SARA和SARS進行比較。實驗結果證明,SARA和SARS,比其他兩種演算法的效率更高且使用較少的記憶體用量。但時間間隔樣式仍具有局限性,它沒有考慮每個事件在每個時間的價值。因此我們提出另一種樣式,高效益性時間間隔特徵樣式,它結合了時間間
隔樣式挖掘的概念並與高效益樣式挖掘。另外我們提出一個演算法,LMSpan,用來挖掘高效益時間間隔樣式。在實驗中,我們設計一個簡易的演算法,Naive UDP,並與LMSpan做比較。從實驗結果得知,LMSpan比Naive UDP需要較少的時間和記憶體且更為有效率。
Time interval-based pattern mining is proposed to improve the lack of the information of time intervals by sequential pattern mining. Previous works of time interval-based pattern mining focused on the relations between events without considering the duration of each event. However,
the same event with different time durations will cause definitely different results. For example, if some patients cough for one day, they may get a cold for one week. In contrast, if some patients cough for one year, they may get pneumonia in the future. In this work, we propose two algorithms, SARA and SARS, to extract the frequent Time Interval-based Event with Duration, TIED, patterns. TIED patterns not only keep the relations between two events
but also reveal the time periods when each event happens and ends. In the experiments, we propose a naive algorithm and modify a previous algorithm to compare the performances with
SARA and SARS. The experimental results show that SARA and SARS are more efficient in execution time and memory usage than other two algorithms. Otherwise, TIED pattern still
has limitation, which do not consider the utility value in each time stamp. We propose another pattern, High Utility Time Interval-based with Duration Pattern, which combine the concept of utility pattern mining and TIED pattern mining. We design an algorithm, LMSpan, to mine High Utility Time Interval-based with Duration, HU-TIED, Pattern. In the experiments, we design another naive algorithm, Naive UDP to compare with LMSpan. LMSpan take less time and memory use and generate less candidate than Naive UDP.
[1] R. Agrawal and S. Ieong. Aggregating web offers to determine product prices. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 435–443, 2012.
[2] R. Agrawal and R. Srikant. Fast algorithms for mining association rules in large databases. In Proceedings of the 20th International Conference on Very Large Data Base, pages 487– 499, 1994.
[3] R. Agrawal and R. Srikant. Mining sequential patterns. In Proceedings of the 12nd IEEE International Conference on Data Engineering, pages 3–14, 1995.
[4] J. Allen. Maintaining knowledge about temporal intervals. Communication of ACM, 26(11):832–843, 1983.
[5] S. Aseervatham, A. Osmani, and E. Viennet. bitSPADE: A lattice-based sequential pattern mining algorithm using bitmap representation. In Proceedings of the 16th IEEE International Conference on Data Mining, pages 792–797, 2006.
[6] J. Ayres, J. Flannick, J. Gehrke, and T. Yiu. Sequential pattern mining using a bitmap representation. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 429–435, 2002.
[7] E. Bakshy, I. Rosenn, C. Marlow, and L. Adamic. The role of social networks in information diffusion. In Proceedings of the 21st International Conference on World Wide Web, pages 519–528, 2012.
[8] I. Batal, D. Fradkin, J. Harrison, F. Moerchen, and M. Hauskrecht. Mining recent temporal patterns for event detection in multivariate time series data. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 280–288, 2012.
[9] Z. A. Bawab, G.-H. Mills, and J.-F. Crespo. Finding trending local topics in search queries for personalization of a recommendation system. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 397–405, 2012.
[10] H.-P. Cao, N. Mamoulis, and D.-W. Cheung. Mining frequent spatio-temporal sequential patterns. In Proceedings of the 15th IEEE International Conference on Data Mining, pages 82–89, 2005.
[11] Y.-C. Chen, S.-H. Chang, W.-C. Peng, and S.-Y. Lee. Efficient algorithms for influence maximization in social networks. Knowledge and Information Systems, 33(3):577–601, 2012.
[12] Y.-C. Chen, C.-C. Chen, W.-C. Peng, and W.-C. Lee. Mining correlation patterns among appliances in smart home environment. In Proceedings of the 18th Pacific-Asia Conference on Knowledge Discovery and Data Mining, pages 222–233, 2014.
[13] Y.-C. Chen, J.-C. Jiang, W.-C. Peng, and S.-Y. Lee. An efficient algorithm for mining time interval-based patterns in large database. In Proceedings of the 19th ACM International Conference on Information and Knowledge Management, pages 49–58, 2010.
[14] Y.-C. Chen, W.-C. Peng, and S.-Y. Lee. CEMiner – An efficient algorithm for mining closed patterns from time interval-based data. In Proceedings of the 21st IEEE International Conference on Data Mining, pages 121–130, 2011.
[15] Y.-C. Chen, W.-C. Peng, and S.-Y. Lee. Exploring community structures for influence maximization in social networks. In Proceedings of the 6th International Workshop on Social Network Mining and Analysis, joint with the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1–6, 2012.
[16] Y.-C. Chen, W.-C. Peng, and W.-C. Lee. A novel system for mining useful correlation in smart home. In Proceedings of the 6th International Workshop on Domain Driven Data Mining, joint with the 23th IEEE International Conference on Data Mining, pages 357–364, 2013.
[17] Y.-C. Chen, W.-Y. Zhu, W.-C. Peng, W.-C. Lee, and S.-Y. Lee. CIM: Community-based influence maximization in social networks. ACM Transactions on Intelligent Systems and Technology, 5(2):1–31, 2014.
[18] H. Cheng, X.-Y. Yan, and J.-W. Han. IncSpan: Incremental mining of sequential patterns in large database. In Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 527–532, 2004.
[19] I.-N. Davidson, S. Gilpin, O.-T. Carmichael, and P.-B. Walker. Network discovery via constrained tensor analysis of fMRI data. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 194–202, 2013.
[20] W.-H. Du, J.-W. Rau, J.-W. Huang, and Y.-S. Chen. Improving the quality of tags using state transition on progressive image search and recommendation system. In Proceedings of the IEEE International Conference on Systems, Man, and Cybernetics, pages 3233–3238, 2012.
[21] A. Erwin, R.-P. Gopalan, and N.-R. Achuthan. Efficient mining of high utility itemsets from large datasets. In Proceedings of the 12th Pacific-Asia Conference on Knowledge Discovery and Data Mining, pages 554–561, 2008.
[22] M.-N. Garofalakis, R. Rastogi, and K. Shim. SPIRIT: Sequential pattern mining with regular expression constraints. In Proceedings of the 25th International Conference on Very Large Data Base, pages 223–234, 1999.
[23] T. Guyet and R. Quiniou. Mining temporal patterns with quantitative intervals. In Proceedings of the 18th IEEE International Conference on Data Mining Workshops, pages 218–227, 2008.
[24] P. Haider, L. Chiarandini, and U. Brefeld. Discriminative clustering for market segmentation. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 417–425, 2012.
[25] J.-W. Han, J. Pei, and Y.-W. Yin. Mining frequent patterns without candidate generation. ACM Special Interest Group on Management of Data, 29(2):1–12, 2000.
[26] C.-C. Ho, H.-F. Li, F.-F. Kuo, and S.-Y. Lee. Incremental mining of sequential patterns over a stream sliding window. In Proceedings of the International Workshop on Mining Evolving and Streaming Data, joint with 16th IEEE International Conference on Data Mining, pages 677–681, 2006.
[27] Y.-H. Hu, C.-K. Huang, H.-R. Yang, and Y.-L. Chen. On mining multi-time-interval sequential patterns. Data and Knowledge Engineering, 68(10):1112–1127, 2009.
[28] J.-W. Huang, S.-C. Lin, and M.-S. Chen. DPSP: Distributed progressive sequential pattern mining on the cloud. In Proceedings of the 16th Pacific-Asia Conference on Knowledge Discovery and Data Mining, pages 27–34, 2010.
[29] J.-W. Huang, C.-Y. Tseng, M.-C. Chen, and M.-S. Chen. PISAR: Progressive image search and recommendation system by auto-interpretation and user behavior. In Proceedings of the IEEE International Conference on Systems, Man, and Cybernetics, pages 1442–1447, 2011.
[30] J.-W. Huang, C.-Y. Tseng, J.-C. Ou, and M.-S. Chen. A general model for sequential pattern mining with a progressive database. IEEE Transactions on Knowledge and Data Engineering, 20(9):1153–1167, 2008.
[31] H.-F. Li, H.-Y. Huang, Y.-C. Chen, Y.-J. Liu, and S.-Y. Lee. Fast and memory efficient mining of high utility itemsets in data streams. In Proceedings of the 18th IEEE International Conference on Data Mining, pages 881–886, 2008.
[32] Y.-C. Li, J.-S. Yeh, and C.-C. Chang. Isolated items discarding strategy for discovering high-utility itemsets. Data and Knowledge Engineering, 64(1):198–217, 2008.
[33] V. C.-C. Liao and M.-S. Chen. DFSP: a depth-first spelling algorithm for sequential pattern mining of biological sequences. Knowledge and Information Systems, 38(3):623–639, 2014.
[34] M.-Y. Lin and S.-Y. Lee. Incremental update on sequential patterns in large databases by implicit merging and efficient counting. Information System, 29(5):385–404, 2004.
[35] X.-M. Liu, J. Biagioni, J. Eriksson, Y. Wang, G. Forman, and Y.-M. Zhu. Mining largescale, sparse GPS traces for map inference: Comparison of approaches. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 669–677, 2012.
[36] Y. Mao, W.-L. Chen, Y.-X. Chen, C.-Y. Lu, M. Kollef, and T. Bailey. An integrated data mining approach to real-time clinical monitoring and deterioration warning. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1140–1148, 2012.
[37] F. Masseglia, P. Poncelet, and M. Teisseire. Incremental mining of sequential patterns in large databases. Data and Knowledge Engineering, 46(1):97–121, 2003.
[38] F. Morchen and D. Fradkin. Robust mining of time intervals with semi-interval partial order patterns. In Proceedings of the 20th SIAM International Conference on Data Mining, pages 315–326, 2010.
[39] F. Nakagaito, T. Ozaki, and T. Ohkawa. Discovery of quantitative sequential patterns from event sequences. In Proceedings of the 19th IEEE International Conference on Data Mining Workshops, pages 31–36, 2009.
[40] S. Nguyen and M. Orlowska. Improvements of INCSPAN: Incremental mining of sequential patterns in large database. In Proceedings of the 9th Pacific-Asia Conference on Knowledge Discovery and Data Mining, pages 442–451, 2005.
[41] P. Papapetrou, G. Kollios, and S. Sclaroff. Mining frequent arrangements of temporal intervals. Knowledge and Information System, 21(2):133–171, 2009.
[42] P. Papapetrou, G. Kollios, S. Sclaroff, and D. Gunopulos. Discovering frequent arrangements of temporal intervals. In Proceedings of the 15th IEEE International Conference on Data Mining, pages 354–361, 2005.
[43] S. Parthasarathy, M. Zaki, M. Ogihara, and S. Dwarkadas. Incremental and interactive sequence mining. Proceedings of the 18th ACM International Confereince on Information and Knowledge Management, 1999.
[44] D. Patel, W. Hsu, and M.-L. Lee. Mining relationships among interval-based events for classification. In Proceedings of the ACM Special Interest Group on Management of Data, pages 393–404, 2008.
[45] J. Pei, J.-W. Han, B. Mortazavi-asl, H. Pinto, Q. Chen, U. Dayal, and M.-C. Hsu. PrefixSpan: Mining sequential patterns efficiently by prefix-projected pattern growth. In Proceedings of the 17th IEEE International Conference on Data Engineering, pages 215–224, 2001.
[46] Q. Y. R. Chan and Y.-D. Shen. Mining high-utility itemsets. In Proceedings of the 3rd IEEE International Conference on Data Mining, pages 19–26, 2003.
[47] I. Shafer, K. Ren, V. Boddeti, Y. Abe, G.-R. Ganger, and C. Faloutsos. RainMon: An integrated approach to mining bursty timeseries monitoring data. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1158–1166, 2012.
[48] K. Shi and K. Ali. GetJar mobile application recommendations with very sparse datasets. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 204–212, 2012.
[49] P. Sondhi, J.-M. Sun, H.-H. Tong, and C.-X. Zhai. SympGraph: A framework for mining clinical notes through symptom relation graphs. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1167–1175, 2012.
[50] R.-B.-V. Subramanyam, A.-S. Rao, R. Karnati, S.-R. Suvvari, and D.-V.-N. Somayajulu. Mining closed sequential patterns in progressive databases. Journal of Information and Knowledge Management, 12(3):1–10, 2013.
[51] Y. Tabei, A. Kishimoto, M. Kotera, and Y. Yamanishi. Succinct interval-splitting tree for scalable similarity search of compound-protein pairs with property constraints. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 176–184, 2013.
[52] J. Tang, S. Wu, and J.-M. Sun. Confluence: Conformity influence in large social networks. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 347–355, 2013.
[53] L.-A. Tang, X. Yu, Q.-Q. Gu, J.-W. Han, A. Leung, and T. L. Porta. Mining lines in the sand: On trajectory discovery from untrustworthy data in cyber-physical system. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 410–418, 2013.
[54] V. S. Tseng, C.-W. Wu, B.-E. Shie, and P. S. Yu. UP-Growth: an efficient algorithm for high utility itemset mining. In Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 253–262, 2010.
[55] J.-Y. Wang and J.-W. Han. BIDE: Efficient mining of frequent closed sequences. In Proceedings of the 20th IEEE International Conference on Data Engineering, pages 79–90, 2004.
[56] K. Wang, Y. Xu, and J.-X. Yu. Scalable sequential pattern mining for biological sequences. In Proceedings of the 13th ACM International Conference on Information and Knowledge Management, pages 178–187, 2004.
[57] L.-Y. Wei, Y. Zheng, and W.-C. Peng. Constructing popular routes from uncertain trajectories. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 195–203, 2012.
[58] J. T.-Y. Weng, S.-H. Wu, Y.-C. Chen, W.-C. Hsu, C.-S. Lee, and A. T.-A. Cheng. Epigenetic profiling of DNA methylation changes associated with chronic alcohol consumption: a 12-year follow up study. In Proceedings of the 6th IEEE International Conference on BioMedical Engineering and Informatics, pages 471–476, 2012.
[59] C.-W. Wu, B.-E. Shie, V. S. Tseng, and P. S. Yu. Mining top-K high utility itemsets. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 78–86, 2012.
[60] S.-Y. Wu and Y.-L. Chen. Mining nonambiguous temporal patterns for interval-based events. IEEE Transactions on Knowledge and Data Engineering, 19(6):742–758, 2007.
[61] S. Xiang, L. Yuan, W. Fan, Y.-L. Wang, P.-M. Thompson, and J.-P. Ye. Multi-source learning with block-wise missing data for alzheimer’s disease prediction. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 185–193, 2013.
[62] X.-T. Yang, A. Ghoting, Y. Ruan, and S. Parthasarathy. A framework for summarizing and analyzing twitter feeds. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 370–378, 2012.
[63] H. Yao, H.-J. Hamilton, and L. Geng. A unified framework for utility-based measures for mining itemsets. In Proceedings of 2nd Workshop on Utility-Based Data Mining, joint with the 11th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 28–37, 2006.
[64] J.-F. Yin, Z.-G. Zheng, and L.-B. Cao. USpan: an efficient algorithm for mining high utility sequential patterns. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 660–668, 2012.
[65] J.-F. Yin, Z.-G. Zheng, L.-B. Cao, Y. Song, and W. Wei. Efficiently mining top-K high utility sequential patterns. In Proceedings of the 23th IEEE International Conference on Data Mining, pages 1259–1264, 2013.
[66] J. Yuan, Y. Zheng, and X. Xie. Discovering regions of different functions in a city using human mobility and pois. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 186–194, 2012.
[67] L. Yuan, Y.-L.Wang, P.-M. Thompson, V.-A. Narayan, and J.-P. Ye. Multi-source learning for joint analysis of incomplete multi-modality neuroimaging data. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1149–1157, 2012.
[68] W.-J. Zhou, H.-X. Jin, and Y. Liu. Community discovery and profiling with social messages. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 388–396, 2012.
[69] F. Zhu, X.-F. Yan, J.-W. Han, and P. Yu. Efficient discovery of frequent approximate sequential patterns. In Proceedings of the 17th IEEE International Conference on Data Mining, pages 751–756, 2007.