| 研究生: |
黃柏傑 Huang, Pai-Chieh |
|---|---|
| 論文名稱: |
以群集式基因演算法為基礎的時間序列切割法與型樣之發掘 A Cluster-based Genetic Approach for Segmentation of Time Series and Pattern Discovery |
| 指導教授: |
曾新穆
Tseng, Shin-Mu |
| 學位類別: |
碩士 Master |
| 系所名稱: |
電機資訊學院 - 資訊工程學系 Department of Computer Science and Information Engineering |
| 論文出版年: | 2008 |
| 畢業學年度: | 96 |
| 語文別: | 中文 |
| 論文頁數: | 68 |
| 中文關鍵詞: | 時間序列切割 、遺傳演算法 、離散小波轉換 、叢集 |
| 外文關鍵詞: | time series segmentation, genetic algorithm, clustering, discrete wavelet transformation |
| 相關次數: | 點閱:121 下載:2 |
| 分享至: |
| 查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報 |
時間序列資料指每隔一定時間將所觀察的目標記錄而成的資料。時間序列之應用層面廣泛,如經濟中的股市交易日收盤價、醫學常用的各種生理訊號、多媒體的影音資料等。因此,時間序列之研究更顯的有趣而重要;針對時間序列資料,現有異常偵測、象徵符號代表方法、相似度測量比對、維度化簡、切割等相關研究。本論文對於切割這個研究方向,結合叢集技術、離散小波轉換與遺傳演算法得以將時間序列自動切割及型樣發掘,進行時間序列切割法的改良,並降低之前的方法在利用離散小波轉換所帶來失真之問題與減少發掘出較無意義型樣。本方法首先將時間序列切割的位置編成染色體,其中兩個基因間所代表的為一線段(segment)。在評估函數中,先對染色體所切割形成的線段利用k-means叢集技術進行分群,之後搭配離散小波轉換計算該切割結果之相似程度與計算該切割結果所形成之合適度(suitability)進行適合度的計算。其中染色體的合適度是用來降低單線段成為一種型樣之機會與離散小波轉換所造成失真問題。因此,本論文所提的方法不僅能有同時進行時間序列切割與自動搜尋型樣之優點,而能更進一步提升型樣之可靠性與減少壓縮失真所帶來的誤差。為了驗證本方法之可行性,我們使用台灣股市中的鋼鐵類股與半導體類股收盤價之時間序列資料進行實驗。實驗結果顯示,本論文所提之方法,能從時間序列中有效的自動切割出已定義之型樣與未被定義之型樣的線段。
A time series is composed of lots of data points, each of which represents a value at a certain time. Many phenomena can be represented by time series, such as electrocardiograms in medical science, gene expressions in biology and video data in multimedia. Time series have thus been an important and interesting research field due to their frequent appearance in different applications. It is related to many research topics, including anomaly detection, similarity measurement, dimension reduction and segmentation, among others. In this thesis, we proposed a time series segmentation approach by combining the clustering technique, the discrete wavelet transformation and the genetic algorithm to automatically find segments and patterns from a time series and reduce the raised problems in previous approach. The first one is that it may cause distortion of segments when using the discrete wavelet transformation (DWT) to adjust the length of the subsequences. The second one is that if a group contains only one segment then it may result in a less meaningful pattern. The proposed approach first divides the segments in a chromosome into k groups according to their slopes by using clustering techniques. In order to deal with these problems, two factors, namely the density factor and the distortion factor, are used to solve them. The distortion factor is used to avoid the distortion of the segments and the density factor is used to avoid generation of meaningless patterns. The fitness value of a chromosome is then evaluated by the distances of segments and these two factors. Experimental results on real financial datasets in Taiwan also show the effectiveness of the proposed approach.
[1] J. Aach and G. Church, “Aligning gene expression time series with time warping algorithms,” Bioinformatics, Vol. 17, pp 495-508, 2001.
[2] B. D. Amir, R. Shamir & Z. Yakhini,Clustering gene expression patterns, Journal of Computational Biology, 1999
[3] R. Bellman, "On the approximation of curves by line segments using dynamic programming," Communications of the ACM, Vol. 4, No. 6, pp. 284, 1961.
[4] J. R. Chen, “Making subsequence time series clustering meaningful,” The IEEE International Conference on Data Mining, pp. 114-121, 2005.
[5] P. Cohen and N. Adams, ”Unsupervised Segmentation of Categorical Time Series into Episodes,” Proceeding of IEEE International Conference on Data Mining, pp. 99-106, 2002.
[6] F. L. Chung, T. C. Fu, R. Luk and V. Ng, “Evolutionary time series segmentation for stock data mining,” The IEEE International Conference on Data Mining, pp. 83 - 90, 2002.
[7] F. L. Chung, T. C. Fu, V. Ng and R. W. P. Luk, “An evolutionary approach to pattern-based time series segmentation,” IEEE Transactions on Evolutionary Computation, Vol. 8, No. 5, pp. 471- 489, 2004.
[8] S.R. Duncan and G. F. Bryant,” A new algorithm for segmenting data from time series,” Proc. 35th IEEE Conf. Decsion Control, 1996.
[9] S. Erdal, O. Ozturk, D. Armbruster, H. Ferhatosmanoglu and W.C. Ray, “A time series analysis of microarray data,” The IEEE Symposium on Bioinformatics and Bioengineering, pp. 366-375, 2004.
[10] T. C. Fu, F. L. Chung, V. Ng and R. Luk, “Evolutionary segmentation of financial time series into subsequences,” The Congress on Evolutionary Computation, Vol. 1, pp. 426 - 430, 2001.
[11] L. Feng, K. Ju and K. H. Chon, "A method for segmentation of switching dynamic modes in time series," IEEE Transactions on Systems, Man and Cybernetics, Part B, Vol. 35, No. 5, pp. 1058-1064, 2005.
[12] C. L. Fancoua and J. C. Principe, “A Neighborhood Map of Competing One Step Predictors for Piecewise Segmentation and Identification of Time Series,” IEEE Internal Conference on Neural Networks, Vol. 4, pp.1906-1911, 1996.
[13] V. Guralnik and J. Srivastava, "Event detection from time series data," Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 33-42, 1999.
[14] J. Han, G. Dong and Y. Yin, “Efficient Mining of Partial Periodic Patterns in Time Series Database,” The Internal Conference on Data Engineering, pp. 106-115, 1999.
[15] J. Himberg, K. Korpiaho, H. Mannila,J. Tikanmaki and H.T.T. Toivonen, "Time series segmentation for context recognition in mobile devices," The IEEE International Conference on Data Mining, pp. 203-210, 2001.
[16] Y. W. Huang and P. S. Yu, “Adaptive Query Processing for Time-Series Data,” in Proceeding of the 5th Int’l Conference on Knowledge Discovery and Data Mining, pp. 282-286, San Diego, CA, Aug 15-18, 1999.
[17] Jian Jiang, ZHE Zhang, and Huaiqing Wang, “A New Segmentation Algorithm to Stock Time Series based on PIP Approach,” Wireless Communications, Networking and Mobile Computing, 2007. WiCom 2007. International Conference
[18] Anil K. Jain and Richard C. Dubes, ”Algorithms for Clustering Data,” Prentice Hall, 1988.
[19] E. Keogh, “http://www.cs.ucr.edu/~eamonn/discords/”, 2005.
[20] E. Keogh, S. Chu, D. Hart and M. Pazzani, “An online algorithm for segmenting time series,” The IEEE Internal Conference Data Mining, pp. 289-296, 2001.
[21] E. Keogh, J. Lin and A. Fu, “Hot SAX: efficiently finding the most unusual time series subsequence,” The IEEE International Conference on Data Mining, pp. 27-30, 2005.
[22] E. Keogh, J. Lin and W. Truppel, “Clustering of time series subsequences is meaningless: implications for previous and future research,” The IEEE International Conference on Data Mining, pp. 115 – 122, 2003.
[23] A. Lendasse, J. Lee, E. de Bodt, V. Wertz and M. Verleysen, "Dimension reduction of technical indicators for the prediction of financial time series - Application to the BEL20 Market Index," European Journal of Economic and Social Systems, Vol. 15, No. 2, pp. 31-48, 2001.
[24] J. B. McQueen, “Some methods of classification and analysis of mutivariate observations,” The Symposium on Mathematical Satistics and Probability, 1967, pp. 281-297.
[25] Z. Michalewicz, Genetic Algorithms + Data Structures = Evolution Programs, Springer-Verlag, 1994.
[26] J. J. Oliver, R. A. Baxter and C. S. Wallace, “Minimum Message Length Segmentation,” The. Pacific-Asia Conference Knowledge Discovery Data Mining, pp. 222-233, 1998.
[27] W. Penny and S. Roberts, “Dynamic models for nonstationary signal segmentation,” Computers and Biomedical Research, Vol. 32, No. 6, pp. 483-502, 1999.
[28] Donald B. Percival and Andrew T. Walden. “Wavelet Methods for Time Series Analysis,” Published by Cambridge University Press,24 Jul 2000.
[29] V. S. Tseng, C. H. Chen, C. H. Chen and T. P. Hong, “Segmentation of time series by the clustering and genetic algorithms,” The Workshop on Foundation of Data Mining and Novel Techniques in High Dimensional Structural and Unstructured Data in The Fifth IEEE International Conference on Data Mining, pp. 443-447, 2006.
[30] J. P. C. Valente and I. L. Chavarrias, “Discovering similar patterns in time series,” Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 497-505, 2000.
[31] X. Y. Wang and Z. O. Wang, “A Structure-Adaptive Piece-Wise Linear Segments Representation for Time Series,” Proceeding of IEEE International Conference on Information Reuse and Integration, pp. 433 - 437, 2004.
[32] http://tw.stock.yahoo.com/