| Graduate student: | 陳宣旭 Chen, Hsuan-Hsu |
|---|---|
| Thesis title: | 於多變數時間序列探勘重要等價類以達到早期分類 Early Classification of Multivariate Time Series by Mining Equivalence Classes with Significant Generators |
| Advisor: | 曾新穆 Tseng, Shin-Mu |
| Degree: | 碩士 Master |
| Department: | 電機資訊學院 - 資訊工程學系 Department of Computer Science and Information Engineering |
| Year of publication: | 2013 |
| Academic year of graduation: | 101 |
| Language: | English |
| Number of pages: | 65 |
| Chinese keywords: | 早期分類, 多變數時間序列, 可譯性, 等價類 |
| English keywords: | early classification, multivariate time series, interpretable, equivalence class |
Early classification of time series has been applied in many domains. When the classifier is also interpretable, it helps domain experts gain deeper insight from the classification results. Although several classifiers achieve good earliness and high accuracy, little research has focused on multivariate data. In addition, an early classifier is expected to be serial: it should assign a class as early as possible while guaranteeing, with high probability, that the early class matches the classes that would be assigned at later time points. In this thesis, we address the problem of interpretable early classification of heterogeneous multivariate time series with accuracy comparable to that of full-length classifiers, and we propose a novel classifier, SDT (Serial Decision Tree). To avoid generating a large number of features, we adopt the notion of equivalence classes with significant generators, which integrates feature extraction and feature pruning into a single phase and improves mining efficiency. Empirical evaluation on various datasets shows that our model outperforms the state-of-the-art method and is serial, making reliable early classifications at appropriate time points without loss of the expected accuracy.
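The serial property described in the abstract can be illustrated with a small check. The following is a minimal sketch, not the SDT classifier itself: `earliest_serial_time`, `toy_classifier`, and `toy_series` are hypothetical names introduced here, and the toy threshold rule is an assumption for illustration only. The check is empirical on a single series, whereas the thesis states its guarantee in terms of high probability.

```python
from typing import Callable, Hashable, Optional, Sequence

# One observation = one tuple of variable values at a time point.
Observation = Sequence[float]


def earliest_serial_time(
    series: Sequence[Observation],
    classify_prefix: Callable[[Sequence[Observation]], Optional[Hashable]],
) -> Optional[int]:
    """Return the smallest prefix length t such that the classifier gives the
    same non-None label for every prefix of length t, t+1, ..., len(series).
    Returns None if the series is empty or the full series gets no label."""
    labels = [classify_prefix(series[:t]) for t in range(1, len(series) + 1)]
    if not labels or labels[-1] is None:
        return None
    final = labels[-1]
    t = len(labels)
    # Walk backwards while the label at prefix length t-1 still agrees.
    while t > 1 and labels[t - 2] == final:
        t -= 1
    return t


def toy_classifier(prefix: Sequence[Observation]) -> Optional[str]:
    """Hypothetical stand-in for an early classifier (NOT the thesis method):
    abstain on fewer than two observations, otherwise threshold the maximum
    of the first variable seen so far."""
    if len(prefix) < 2:
        return None
    return "high" if max(obs[0] for obs in prefix) > 1.0 else "low"


# Two variables per time point; only the first one drives the toy rule.
toy_series = [(0.1, 5.0), (0.2, 5.0), (1.5, 6.0), (0.3, 5.0)]
print(earliest_serial_time(toy_series, toy_classifier))
```

On this toy series the labels per prefix length are abstain, "low", "high", "high", so the earliest time from which the prediction never changes is the third observation. A classifier that is serial in the abstract's sense would commit to its class at or after such a point rather than at the earlier, unstable "low".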
On campus: open access from 2018-08-29