Graduate student: | 陳延洛 Chen, Len-Lo |
---|---|
Thesis title: | 基因表現時間序列的叢集分析方法與系統實作 Clustering Time-Series Gene Expression: A New Method and Implementation |
Advisor: | 曾新穆 Tseng, Shin-Mu |
Degree: | Master |
Department: | College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering |
Year of publication: | 2002 |
Graduation academic year: | 90 (ROC calendar) |
Language: | Chinese |
Number of pages: | 56 |
Keywords: | microarray, similarity measure, data mining, gene expression, time-series, cluster analysis |
This research presents a clustering analysis approach suited to gene expression time-series data. Although several methods for analyzing time-series data already exist, they do not adequately handle the offset, scaling, shift, and noise that occur in gene expression time series. We therefore propose a new similarity measure, named GETSS, that accounts for offset, scaling, shift, and noise between two gene expression time series in order to find the portions in which the two genes show similar expression responses. Experiments show that GETSS reveals the correlation between two genes more accurately than the commonly used correlation coefficient measure.
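To make the offset, scaling, and shift problems concrete, the sketch below shows that a plain Pearson correlation ignores offset and scaling but fails when one profile is delayed in time. It then tries a simple workaround that takes the best correlation over a small range of time lags. This is not the GETSS measure, whose definition is not given in this abstract. The function names, the `max_lag` and `min_overlap` parameters, and the sine-wave profiles are all assumptions made for illustration, and noise is not handled.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # constant profile: correlation is undefined, treat as 0
    return cov / (sx * sy)

def shift_tolerant_similarity(x, y, max_lag=2, min_overlap=4):
    """Best Pearson correlation over time lags in [-max_lag, max_lag].

    Hypothetical stand-in for a shift-aware measure; NOT GETSS.
    Assumes x and y have the same length.
    """
    best = -1.0
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            xs, ys = x[:len(x) - lag], y[lag:]   # pair x[t] with y[t + lag]
        else:
            xs, ys = x[-lag:], y[:len(y) + lag]  # pair x[t - lag] with y[t]
        if len(xs) >= min_overlap:
            best = max(best, pearson(xs, ys))
    return best

# Synthetic expression profiles (made-up data, two full periods of a sine wave).
gene_a = [math.sin(2 * math.pi * t / 8) for t in range(16)]
gene_b = [3 + 2 * v for v in gene_a]                               # offset and scaled copy
gene_c = [math.sin(2 * math.pi * (t - 2) / 8) for t in range(16)]  # delayed by 2 time points

print(pearson(gene_a, gene_b))                    # close to 1: offset and scaling do not matter
print(pearson(gene_a, gene_c))                    # close to 0: the time shift hides the similarity
print(shift_tolerant_similarity(gene_a, gene_c))  # close to 1 once lags are searched
```

Searching over lags is only one possible way to tolerate shifts; it trades some overlap between the two series for alignment, which is why the sketch enforces a minimum overlap.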
Based on the proposed similarity measure, we also design and implement a system for clustering gene expression time-series data. The system integrates GETSS with existing clustering methods such as CAST, K-Medoids, and HAC, and provides a graphical interface for presenting the clustering results. With this system, biologists can analyze gene expression time-series data more conveniently and quickly.
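As a rough illustration of how a pairwise similarity measure can be plugged into an existing clustering method, the sketch below runs average-linkage hierarchical agglomerative clustering (HAC) on a precomputed similarity matrix. It is not the thesis system. The function name `hac_average_linkage`, the conversion of similarity to distance as `1 - similarity`, and the toy matrix values are assumptions made for illustration. In practice the matrix would be filled by whatever similarity measure is in use.

```python
def hac_average_linkage(dist, num_clusters):
    """Average-linkage agglomerative clustering on a distance matrix.

    dist is a symmetric list of lists; returns a list of clusters,
    each a list of item indices. Illustrative sketch only.
    """
    clusters = [[i] for i in range(len(dist))]

    def linkage(a, b):
        # Average pairwise distance between the members of two clusters.
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > num_clusters:
        # Find the pair of clusters with the smallest average distance.
        pairs = ((p, q) for p in range(len(clusters))
                 for q in range(p + 1, len(clusters)))
        p, q = min(pairs, key=lambda pq: linkage(clusters[pq[0]], clusters[pq[1]]))
        clusters[p] = clusters[p] + clusters[q]  # merge q into p
        del clusters[q]
    return clusters

# Toy pairwise similarities for five genes (made-up values, not thesis data).
similarity = [
    [1.00, 0.90, 0.80, 0.10, 0.20],
    [0.90, 1.00, 0.85, 0.15, 0.10],
    [0.80, 0.85, 1.00, 0.20, 0.10],
    [0.10, 0.15, 0.20, 1.00, 0.95],
    [0.20, 0.10, 0.10, 0.95, 1.00],
]
# Assumed conversion: higher similarity means smaller distance.
distance = [[1.0 - s for s in row] for row in similarity]

clusters = hac_average_linkage(distance, num_clusters=2)
print(clusters)  # -> [[0, 1, 2], [3, 4]]
```

Because the clustering step only sees the distance matrix, the similarity measure can be swapped without touching the clustering code.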