| 研究生: |
楊鄭緯 Yang, Cheng-Wei |
|---|---|
| 論文名稱: |
在時間間隔序列中考慮持續時間配對和事件配對之子序列搜尋 Subsequence Search in Time Interval-based Event Sequences with Duration Match and Event Match |
| 指導教授: |
黃仁暐
Huang, Jen-Wei |
| 學位類別: |
碩士 Master |
| 系所名稱: |
電機資訊學院 - 電腦與通信工程研究所 Institute of Computer & Communication Engineering |
| 論文出版年: | 2017 |
| 畢業學年度: | 105 |
| 語文別: | 英文 |
| 論文頁數: | 72 |
| 中文關鍵詞: | 子序列搜尋 、時間間隔事件 、子序列容錯搜尋 、端點索引 |
| 外文關鍵詞: | Subsequence search, Time interval-based event with duration, Subsequence search with errors, Endpoint Index |
| 相關次數: | 點閱:87 下載:0 |
| 分享至: |
| 查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報 |
先前在時間間隔序列中的子序列搜尋工作集中在事件之間的關係,而不考慮每個事件的持續時間。然而具有不同持續時間的相同事件可能導致不同的結果。在這項研究中,我們提出 了一個索引結構,稱為端點索引(Endpoint Index)是基於倒排索引(Inverted Index)的概念,有效率地提取時間間隔事件並考慮持續時間,時間間隔子序列(TIED subsequence)。時間間隔子序列搜索(TIED subsequence search)結合事件之間的關係並考慮事件間隔的持續時間。這使得時間間隔子序列搜索的結果比使用傳統搜索方法獲得的結果更精確。我們還提出了一種演算法SSD,考慮持續時間的子序列搜索,並結合剪枝策略來有效率地搜索時間間隔子序列。 對於性能評估,我們修改以前的演算法,以與我們提出的方法SSD_Endpoint和SSD_ER比較。實驗結果顯示,SSD_Endpoint和SSD_ER在記憶體使用和執行時間方面比現在最先進的演算法更高效率。
Previous works of subsequence search in time interval-based event sequences have focused on the relations between events without considering the duration of each event. However, the same event with different time durations may lead to different results. In this work, we propose an index structure referred to as Endpoint Index based on the concept of inverted index to efficiently extract the Time Interval-based Event with Duration, TIED, subsequence. TIED subsequence search considers duration of event intervals in conjunction with relation among events. This makes the results of TIED subsequence search more accurate than those obtained using traditional search methods. We also propose an algorithm SSD, Subsequence Search with Duration, incorporating a pruning strategy to search TIED subsequences efficiently. For the performance evaluation, we modify previous algorithms to compare with our proposed methods SSD_Endpoint and SSD_ER. The experimental results demonstrate that SSD_Endpoint and SSD_ER are more efficient than state-of-the-art algorithms in terms of memory usage and execution time.
[1] R. Agrawal and S. Ieong. Aggregating web offers to determine product prices. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 435–443, 2012.
[2] R. Agrawal and R. Srikant. Fast algorithms for mining association rules in large databases. In Proceedings of the 20th International Conference on Very Large Data Base, pages 487– 499, 1994.
[3] R. Agrawal and R. Srikant. Mining sequential patterns. In Data Engineering, 1995. Proceedings of the Eleventh International Conference on, pages 3–14. IEEE, 1995.
[4] J. M. Ale and G. H. Rossi. An approach to discovering temporal association rules. In Proceedings of the 2000 ACM symposium on Applied computing-Volume 1, pages 294–300. ACM, 2000.
[5] J. Allen. Maintaining knowledge about temporal intervals. Communication of ACM, 26(11):832–843, 1983.
[6] S. Aseervatham, A. Osmani, and E. Viennet. bitSPADE: A lattice-based sequential pattern mining algorithm using bitmap representation. In Proceedings of the 16th IEEE International Conference on Data Mining, pages 792–797, 2006.
[7] J. Ayres, J. Flannick, J. Gehrke, and T. Yiu. Sequential pattern mining using a bitmap representation. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 429–435, 2002.
[8] I. Batal, D. Fradkin, J. Harrison, F. Moerchen, and M. Hauskrecht. Mining recent temporal patterns for event detection in multivariate time series data. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 280–288, 2012.
[9] Z. A. Bawab, G.-H. Mills, and J.-F. Crespo. Finding trending local topics in search queries for personalization of a recommendation system. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 397–405, 2012.
[10] R. Bellman and R. Kalaba. On adaptive control processes. IRE Transactions on Automatic Control, 4(2):1–9, 1959.
[11] B. Berendt. Explaining preferred mental models in allen inferences with a metrical model of imagery. In Proceedings of the Eighteenth Annual Conference of the Cognitive Science Society, pages 489–494. Citeseer, 1996.
[12] B. Bergen and N. Chang. Embodied construction grammar in simulation-based language understanding. Construction grammars: Cognitive grounding and theoretical extensions, 3:147–190, 2005.
[13] D. J. Berndt and J. Clifford. Using dynamic time warping to fi patterns in time series. In KDD workshop, volume 10, pages 359–370. Seattle, WA, 1994.
[14] K.-Y. Chen, B. P. Jaysawal, J.-W. Huang, and Y.-B. Wu. Mining frequent time interval- based event with duration patterns from temporal database. In Data Science and Advanced Analytics (DSAA), 2014 International Conference on, pages 548–554. IEEE, 2014.
[15] Y.-C. Chen, J.-C. Jiang, W.-C. Peng, and S.-Y. Lee. An efficient algorithm for mining time interval-based patterns in large database. In Proceedings of the 19th ACM international conference on Information and knowledge management, pages 49–58. ACM, 2010.
[16] Y.-C. Chen, W.-C. Peng, and S.-Y. Lee. CEMiner – An efficient algorithm for mining closed patterns from time interval-based data. In Proceedings of the 21st IEEE International Conference on Data Mining, pages 121–130, 2011.
[17] H. Cheng, X.-Y. Yan, and J.-W. Han. IncSpan: Incremental mining of sequential patterns in large database. In Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 527–532, 2004.
[18] A. Corradini. Dynamic time warping for off-line recognition of a small gesture vocabulary. In Recognition, Analysis, and Tracking of Faces and Gestures in Real-Time Systems, 2001. Proceedings. IEEE ICCV Workshop on, pages 82–89. IEEE, 2001.
[19] D. Cutting and J. Pedersen. Optimization for dynamic inverted index maintenance. In Proceedings of the 13th annual international ACM SIGIR conference on Research and development in information retrieval, pages 405–411. ACM, 1989.
[20] M.-N. Garofalakis, R. Rastogi, and K. Shim. SPIRIT: Sequential pattern mining with regular expression constraints. In Proceedings of the 25th International Conference on Very Large Data Base, pages 223–234, 1999.
[21] P. Haider, L. Chiarandini, and U. Brefeld. Discriminative clustering for market segmentation. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 417–425, 2012.
[22] J. Han, J. Pei, and M. Kamber. Data mining: concepts and techniques. Elsevier, 2011.
[23] C.-C. Ho, H.-F. Li, F.-F. Kuo, and S.-Y. Lee. Incremental mining of sequential patterns over a stream sliding window. In Proceedings of the International Workshop on Mining Evolving and Streaming Data, joint with 16th IEEE International Conference on Data Mining, pages 677–681, 2006.
[24] Y.-H. Hu, C.-K. Huang, H.-R. Yang, and Y.-L. Chen. On mining multi-time-interval sequential patterns. Data and Knowledge Engineering, 68(10):1112–1127, 2009.
[25] J.-W. Huang, S.-C. Lin, and M.-S. Chen. DPSP: Distributed progressive sequential pattern mining on the cloud. In Proceedings of the 16th Pacific-Asia Conference on Knowledge Discovery and Data Mining, pages 27–34, 2010.
[26] J.-W. Huang, C.-Y. Tseng, J.-C. Ou, and M.-S. Chen. A general model for sequential pattern mining with a progressive database. IEEE Transactions on Knowledge and Data Engineering, 20(9):1153–1167, 2008.
[27] L. Hui, Y.-C. Chen, J. T.-Y. Weng, and S.-Y. Lee. Incremental mining of temporal patterns in interval-based database. Knowledge and Information Systems, 46(2):423–448, 2016.
[28] P.-s. Kam and A. W.-C. Fu. Discovering temporal patterns for interval-based events. In International Conference on Data Warehousing and Knowledge discovery, pages 317–326. Springer, 2000.
[29] J. Z. Kolter and M. J. Johnson. Redd: A public data set for energy disaggregation research. In Workshop on Data Mining Applications in Sustainability (SIGKDD), San Diego, CA, volume 25, pages 59–62. Citeseer, 2011.
[30] R. Kosara and S. Miksch. Visualizing complex notions of time. Studies in health technology and informatics, (1):211–215, 2001.
[31] O. Kostakis and P. Papapetrou. Finding the longest common sub-pattern in sequences of temporal intervals. Data Mining and Knowledge Discovery, 29(5):1178–1210, 2015.
[32] O. Kostakis, P. Papapetrou, and J. Hollm´en. Artemis: Assessing the similarity of event- interval sequences. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 229–244. Springer, 2011.
[33] O. K. Kostakis and A. G. Gionis. Subsequence search in event-interval sequences. In Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 851–854. ACM, 2015.
[34] A. Kotsifakos, I. Karlsson, P. Papapetrou, V. Athitsos, and D. Gunopulos. Embedding based subsequence matching with gaps–range–tolerances: a query-by-humming application. The VLDB Journal, 24(4):519–536, 2015.
[35] A. Kotsifakos, P. Papapetrou, and V. Athitsos. Ibsm: Interval-based sequence matching. In Proceedings of the 2013 SIAM International Conference on Data Mining, pages 596–604. SIAM, 2013.
[36] A. Kuzmanic and V. Zanchi. Hand shape classification using dtw and lcss as similarity measures for vision-based gesture recognition system. In EUROCON, 2007. The International Conference on” Computer as a Tool”, pages 264–269. IEEE, 2007.
[37] V. C.-C. Liao and M.-S. Chen. DFSP: a depth-fi spelling algorithm for sequential pattern mining of biological sequences. Knowledge and Information Systems, 38(3):623–639, 2014.
[38] M.-Y. Lin and S.-Y. Lee. Incremental update on sequential patterns in large databases by implicit merging and efficient counting. Information System, 29(5):385–404, 2004.
[39] J. Lonardi and P. Patel. Finding motifs in time series. In Proc. of the 2nd Workshop on Temporal Data Mining, pages 53–68, 2002.
[40] Y. Mao, W.-L. Chen, Y.-X. Chen, C.-Y. Lu, M. Kollef, and T. Bailey. An integrated data mining approach to real-time clinical monitoring and deterioration warning. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1140–1148, 2012.
[41] F. Masseglia, P. Poncelet, and M. Teisseire. Incremental mining of sequential patterns in large databases. Data and Knowledge Engineering, 46(1):97–121, 2003.
[42] F. Morchen and D. Fradkin. Robust mining of time intervals with semi-interval partial order patterns. In Proceedings of the 20th SIAM International Conference on Data Mining, pages 315–326, 2010.
[43] R. Moskovitch and Y. Shahar. Classification-driven temporal discretization of multivariate time series. Data Mining and Knowledge Discovery, 29(4):871–913, 2015.
[44] M. Mu¨ller. Dtw-based motion comparison and retrieval. Information Retrieval for Music and Motion, pages 211–226, 2007.
[45] M. Mu¨ller, H. Mattes, and F. Kurth. An efficient multiscale approach to audio synchronization. In ISMIR, pages 192–197. Citeseer, 2006.
[46] S. Nguyen and M. Orlowska. Improvements of INCSPAN: Incremental mining of sequential patterns in large database. In Proceedings of the 9th Pacific-Asia Conference on Knowledge Discovery and Data Mining, pages 442–451, 2005.
[47] F. Pachet, G. Ramalho, and J. Carrive. Representing temporal musical objects and reasoning in the muses system. Journal of new music research, 25(3):252–275, 1996.
[48] P. Papapetrou, G. Kollios, S. Sclaroff and D. Gunopulos. Discovering frequent arrangements of temporal intervals. In Fifth IEEE International Conference on Data Mining (ICDM’05), pages 8–pp. IEEE, 2005.
[49] P. Papapetrou, G. Kollios, S. Sclaroff and D. Gunopulos. Mining frequent arrangements of temporal intervals. Knowledge and Information Systems, 21(2):133–171, 2009.
[50] S. Parthasarathy, M. Zaki, M. Ogihara, and S. Dwarkadas. Incremental and interactive sequence mining. Proceedings of the 18th ACM International Confereince on Information and Knowledge Management, 1999.
[51] D. Patel, W. Hsu, and M. L. Lee. Mining relationships among interval-based events for classification. In Proceedings of the 2008 ACM SIGMOD international conference on Management of data, pages 393–404. ACM, 2008.
[52] J. Pei, J.-W. Han, B. Mortazavi-asl, H. Pinto, Q. Chen, U. Dayal, and M.-C. Hsu. PrefixSpan: Mining sequential patterns efficiently by prefix-projected pattern growth. In Proceedings of the 17th IEEE International Conference on Data Engineering, pages 215–224, 2001.
[53] N. Pissinou, I. Radev, and K. Makki. Spatio-temporal modeling in video and multimedia geographic information systems. GeoInformatica, 5(4):375–409, 2001.
[54] F. Plouraboue. Exact discovery of time series motifs. 2009.
[55] S. Salvador and P. Chan. Toward accurate dynamic time warping in linear time and space. Intelligent Data Analysis, 11(5):561–580, 2007.
[56] I. Shafer, K. Ren, V. Boddeti, Y. Abe, G.-R. Ganger, and C. Faloutsos. RainMon: An integrated approach to mining bursty timeseries monitoring data. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1158–1166, 2012.
[57] P. Sondhi, J.-M. Sun, H.-H. Tong, and C.-X. Zhai. SympGraph: A framework for mining clinical notes through symptom relation graphs. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1167–1175, 2012.
[58] R.-B.-V. Subramanyam, A.-S. Rao, R. Karnati, S.-R. Suvvari, and D.-V.-N. Somayajulu. Mining closed sequential patterns in progressive databases. Journal of Information and Knowledge Management, 12(3):1–10, 2013.
[59] Y. Tabei, A. Kishimoto, M. Kotera, and Y. Yamanishi. Succinct interval-splitting tree for scalable similarity search of compound-protein pairs with property constraints. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 176–184, 2013.
[60] C. C. Tappert, C. Y. Suen, and T. Wakahara. The state of the art in online handwriting recognition. IEEE Transactions on pattern analysis and machine intelligence, 12(8):787– 808, 1990.
[61] J. Vial, H. No¸cairi, P. Sassiat, S. Mallipatu, G. Cognon, D. Thi´ebaut, B. Teillet, and D. N. Rutledge. Combination of dynamic time warping and multivariate analysis for the comparison of comprehensive two-dimensional gas chromatograms: application to plant extracts. Journal of Chromatography A, 1216(14):2866–2872, 2009.
[62] J.-Y. Wang and J.-W. Han. BIDE: Efficient mining of frequent closed sequences. In Proceedings of the 20th IEEE International Conference on Data Engineering, pages 79–90, 2004.
[63] K. Wang, Y. Xu, and J.-X. Yu. Scalable sequential pattern mining for biological sequences. In Proceedings of the 13th ACM International Conference on Information and Knowledge Management, pages 178–187, 2004.
[64] J. T.-Y. Weng, S.-H. Wu, Y.-C. Chen, W.-C. Hsu, C.-S. Lee, and A. T.-A. Cheng. Efficient discovery of frequent approximate sequential patterns. In Epigenetic profiling of DNA methylation changes associated with chronic alcohol consumption: a 12-year follow up study, pages 471–476, 2012.
[65] E. Winarko and J. F. Roddick. Armada–an algorithm for discovering richer relative temporal association rules from interval-based data. Data & Knowledge Engineering, 63(1):76–90, 2007.
[66] S.-Y. Wu and Y.-L. Chen. Mining nonambiguous temporal patterns for interval-based events. IEEE transactions on knowledge and data engineering, 19(6):742–758, 2007.
[67] X.-T. Yang, A. Ghoting, Y. Ruan, and S. Parthasarathy. A framework for summarizing and analyzing twitter feeds. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 370–378, 2012.
[68] L. Yuan, Y.-L. Wang, P.-M. Thompson, V.-A. Narayan, and J.-P. Ye. Multi-source learning for joint analysis of incomplete multi-modality neuroimaging data. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1149–1157, 2012.
[69] W.-J. Zhou, H.-X. Jin, and Y. Liu. Community discovery and profi with social messages. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 388–396, 2012.
[70] F. Zhu, X. Yan, J. Han, and S. Y. Philip. Efficient discovery of frequent approximate sequential patterns. In Seventh IEEE International Conference on Data Mining (ICDM 2007), pages 751–756. IEEE, 2007.
校內:2022-03-13公開