簡易檢索 / 詳目顯示

研究生: 陳建翔
Chen, Chien-Hsiang
論文名稱: 利用叢集和基因演算法之時間序列切割
Segmentation of Time Series with Clustering and Genetic Algorithm
指導教授: 曾新穆
Tseng, Shin-Mu
學位類別: 碩士
Master
系所名稱: 電機資訊學院 - 資訊工程學系
Department of Computer Science and Information Engineering
論文出版年: 2006
畢業學年度: 94
語文別: 中文
論文頁數: 57
中文關鍵詞: 切割離散小波轉換基因演算法時間序列叢集
外文關鍵詞: segmentation, genetic algorithm, time series, clustering, discrete wavelet transform
相關次數: 點閱:89下載:2
分享至:
查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報
  •   時間序列資料指每隔一定時間將所觀察的目標記錄而成的資料。因此有許多現象都可以用時間序列的資料來表示,譬如在醫學上的心電圖、生物學上的基因表現、多媒體上的影音資料等。針對這些時間序列的資料,其中有異常偵測、象徵符號代表方法、相似度測量比對、維度化簡、切割等相關的研究。本論文對於切割這個研究方向,提出一種時間序列切割方法,透過結合叢集技術、離散小波轉換與基因演算法可自動切割出時間序列中所隱含的樣式。本方法首先將時間序列切割的位置編成染色體,其中兩個基因間所代表的即為一切割樣式(子序列)。在適合度評估,首先針對染色體所切割而形成的子序列進行k-means叢集技術,之後以歐幾里得距離評估染色體切割位置的好壞,在計算歐幾里得距離時,如果兩子序列長度不一致時,則以離散小波轉換進行調整,適當的後處理亦被用來過濾不好的子序列。此外,本論文所提的方法具有三個特色,第一個特色是自動切割出時間序列的樣式(利用叢集技術);第二個特色是可將具有放大縮小關係的子序列歸類在同一樣式(利用離散小波轉換);第三個特色是使用者可以知道所切出來的樣式處於時間序列的哪一個位置(根據染色體的切割結果)。為驗證本方法之可行性,我們進行了一系列的實驗,來表現本方法的效能。實驗結果顯示,本研究所提的方法,的確能夠達到很好的效果,並且將時間序列中的樣式正確的切割出來。

      Time series is composed of lots of data points, each of which represents a value at a certain time. Therefore, there are many phenomena can be represented by time series such as electrocardiograms in medical science, gene expressions in biology and video data in multimedia. There are many research fields of time series such as anomaly detection, symbol representation method, similarity measurement, dimension reduction and segmentation. This thesis thus proposes a time series segmentation approach by combining clustering technique, discrete wavelet transform and genetic algorithm to find patterns from time series, automatically. In fitness evaluation, the proposed algorithm first divides subsequences in a chromosome into k clusters by using k-means clustering approach. The Euclidean distance is then used to calculate the distance of each subsequence and evaluate the chromosome. The DWT is used to adjust the length of subsequence since the length of two subsequences is different. Appropriate post-processing is also performed to filter bad subsequences. The proposed method thus has three advantages. It can first find patterns from time series, automatically (by clustering technique) and the pattern that has enlarged relationship can be categorized into the same one (by DWT) secondly. At last, user can know the position of each pattern in time series (from chromosome representation). In order to verify the proposed method is workable, we made a series of experiments to show the performance of the algorithm. The experimental results show that the proposed method can achieve a better effect and find patterns correctly.

    英文摘要 III 中文摘要 V 誌謝 VII 目錄 VIII 表目錄 X 圖目錄 XI 第一章 簡介 1 1.1 研究背景 1 1.2 研究動機 2 1.3 問題描述 3 1.4 研究方法 4 1.5 論文架構 5 第二章 文獻探討 6 2.1 相似度量測方法 6 2.1.1 距離測量 6 2.1.2 相關係數 7 2.2 叢集方法 8 2.2.1 分割式叢集方法 8 2.2.1.1 k-Means之簡介 9 2.2.1.2 k-Medois之簡介 10 2.3 時間序列切割方法 11 2.3.1 動態程式規劃法 12 2.3.2 線性迴歸法 13 2.3.3 進化式演算法 15 第三章 所提之方法 21 3.1 CLAGAL架構 21 3.2 CLAGAL之演算法 22 3.2.1 虛擬碼 22 3.2.2 染色體表示法 24 3.2.3 評估函數 25 3.2.4 選擇運算子 28 3.2.5 基因運算子 29 3.2.5.1 交配運算子 29 3.2.5.2 突變運算子 30 3.2.6 後處理 31 3.3 範例 31 第四章 實驗分析 35 4.1 資料產生器 36 4.2 虛擬時間序列切割結果觀察 37 4.2.1 評估值探討 38 4.2.2 切割結果探討 39 4.3 參數值設定的影響 40 4.3.1 最多樣式壓縮層數 40 4.3.2 最小切割樣式長度 41 4.3.3 切割樣式數 42 4.3.4 交配比率 43 4.3.5 突變比率 44 4.4 其它不同的時間序列資料 45 4.4.1 振幅資料 45 4.4.2 雜訊資料 46 4.4.3 虛擬資料2 46 4.4.4 真實資料 48 4.5 實驗總結 49 第五章 結論與未來研究方向 51 5.1 結論 51 5.2 應用 52 5.3 未來研究方向 52 參考文獻 53 作者自述 57

    [1] R. J. Bayardo, “Efficiently Mining Long Patterns from Databases,” In Proc. 1998 ACM-SIGMOD Int. Conf. Management of Data, pages 85-93, Seattle, Washington, June 1998.
    [2] G.. F. Bryant, S. R. Duncan, “A Solution to the Segmentation Problem Based on Dynamic Programming,” Proc. 3rd IEEE Conf. Control Applications, 1994.
    [3] T. Back, U. Hammel and H. P. Schwefel, ”Evolutionary Computation Comments on the History and current state,” IEEE Trans, Evol. Comput., Vol. 1, pp. 3-17, Apr. 1997.
    [4] T. Back, U. Hammel and H. P. Schwefel, “Evolutionary Computation: an Overview,” in Proc. IEEE Int. Conf. Evolutionary Computation, 1996, 99. 20-29.
    [5] K. P. Chan and W. C. Fu, "Efficient Time Series Matching by Wavelets," Proc. 15th International Conference on Data Engineering (ICDE'99), p.126, 1999.
    [6] Fu-Lai Chung, Tak-Chung Fu, Robert W.P. Luck and, Vincent Ng, “Flexible Time Series Pattern Matching Based on Perceptually Important Points,” accepted by Workshop on Learning from Temporal Artificial Intelligence(IJCAI’01), Seattle, Washington, 4-10 August, 2001.
    [7] F. L. Chung, T. C. Fu, Robert W. P. Luck and Vincent Ng, ”Evolutionary Time Series Segmentation for Stock Data Mining,” Proc. IEEE Transactions on Evolutionary Computation, 2002.
    [8] F. L. Chung, T. C. Fu, Vincent Ng and Robert W. P. Luck, ”An Evolutionary Approach to Patterned-Based Time Series Segmentation.” Proc. IEEE Transactions on Evolutionary Computation, Vol. 8, Vol. 5, October 2004.
    [9] P. Cohen and N. Adams, ”Unsupervised Segmentation of Categorical Time Series into Episodes,” Proceeding of IEEE International Conference on Data Mining, 2002.
    [10] S. R. Duncan and G.. F. Bryant, “A new algorithm for segmenting data from time series,” Proc. 35th IEEE Conf. Decision Control, 1996.
    [11] C. S. Daw and C. E. A. Finney, “Symbolic Analysis of Experimental Data,” Review of Scientific Instruments, (2002-07-22).
    [12] M. Ester, H. P. Kriegel, J. Sander and X. Xu, ”A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise,” Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining, pages 226-231, Portland, Orgon, 1996.
    [13] C. L. Fancoua and J. C. Principe, “A Neighborhood Map of Competing One Step Predictors for Piecewise Segmentation and Identification of Time Series,” Proc. IEEE Int. Conf. on Neural Networks, Vol. 4, pp.1906-1911, 1996.
    [14] V. Guralnik and J. Srivastava, “Event Detection from Time Series Data,” in Proc. 5th ACM SIGKDD Int. Conf. Knowledge Discovery Data Mining, pp. 33-42, 1999.
    [15] D. E. Goldberg, Genetic Algorithms in Search, Optimization & Machine Learning, Addison Wesley, 1989.
    [16] J. H. Holland. Adaptation in Natural and Artificial Systems, University of Michigan Press, 1975.
    [17] J. Han, G. Dong and Y. Yin, “Efficient Mining of Partial Periodic Patterns in Time Series Database,” In Proc. 1999 Int. Conf. Data Engineering (ICDE'99), Sydney, Australia, April 1999.
    [18] J. Himberg, K. Korpiaho, H. Mannila and J. Tikanmaki, ”Time Series Segmentation for Context Recognition in Mobile Devices,” Proceeding of IEEE International Conference on Data Mining, 2001.
    [19] Y. W. Huang and P. S. Yu, “Adaptive Query Processing for Time-Series Data,” in Proceeding of the 5th Int’l Conference on Knowledge Discovery and Data Mining, pp. 282-286, San Diego, CA, Aug 15-18, 1999.
    [20] H. Andre-Jonsson and D. Badal, “Using Signature Files for Querying Time-Series Data,” in proceedings of Principles of Data Mining and Knowledge Discovery, 1st European Symposium, Trondheim, Norway, pp. 211-220, Jun 24-27, 1997.
    [21] Anil K. Jain and Richard C. Dubes, “Algorithms for Clustering Data,” Prentice Hall,1988.
    [22] E. Keough, S. Chu, D. Hart and M. Pazzani, “An Online Algorithm for Segmenting Time Series,” in Proc. IEEE Int. Conf. Data Mining, pp. 289-296, 2001.
    [23] E. Keough and S. Kasetty, “On the Need for Time Series Data Mining Benchmarks: A Survey and Empirical Demonstration,” in proceedings of the 8th ACM SIGKDD International Confrence on Knowledge Discovery and Data Mining, Edmonton, Alberta, Canada, pp. 102-111, 2002.
    [24] E. Keough, J. Lin and A. Fu, ”Hot Sax:Efficiently Finding the Most Unusual Time Series Subsequence,” In the 5th IEEE International Conference on Data Mining. New Orleans, LA. Nov 27-30, 2005.
    [25] L. Kaufman and P. J. Roussessuw, “Finding Groups in Data: an Introduction to Cluster Analysis,” John Wiley& Sons, 1990.
    [26] J. B. McQueen, “Some Methods of Classification and Analysis of Mutivariate Observations,” Proceedings of the 5th Berkeley Symposium on Mathematical Satistics and Probability, pages 281-297,1967.
    [27] Raymond T. Ng and J. Han, “Efficient and Effective Clustering Methods for Spatial Data Mining,” Proceedings of the 20th VLDB Conference, pages 144-155, Santiago, Chile, 1994.
    [28] J. J. Oliver, R. A. Baxter and C. S. Wallace, “Minimum Message Length Segmentation,” in Proc. Pacific-Asia Conf. Knowledge Discovery Data Mining, pp. 222-233, 1998.
    [29] Donald B. Percival and Andrew T. Walden, “Wavelet Methods for Time Series Analysis,” Published by Cambridge University Press, 24 Jul 2000.
    [30] G. Reinert, S. Schbath and M. S. Waterman, “Probabilistic and Statistical Properties of Words: An Overview,“ Journal of Computational Biology, Vol. 7, pp. 1-46, 2000.
    [31] V. Wertz and M. Verleysen, “Dimension Reduction of Technical Indicators for the Prediction of Financial Time Series,” European Journal of Economic and Social Systems, 2001.
    [32] X. Y. Wang and Z. O. Wang, “A Structure-Adaptive Piece-Wise Linear Segments Representation for Time Series,” Proceeding of IEEE International Conference on Information Reuse and Integration, 2004.

    下載圖示 校內:2007-08-14公開
    校外:2007-08-14公開
    QR CODE