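Student: Ho, Shing-Hua (何杏華)
Title: Adaptive Clustering Algorithms and Their Applications on Bioinformatics (適應性聚類演算法應用於生物資訊)
Advisor: Chiang, Jung-Hsien (蔣榮先)
Degree: Doctoral
Department: College of Electrical Engineering and Computer Science, Department of Computer Science and Information Engineering
Year of publication: 2008
Academic year of graduation: 96 (ROC calendar, 2007–2008)
Language: English
Number of pages: 100
Chinese keywords: clustering algorithm, bioinformatics (聚類演算法, 生物資訊)
English keywords: Bioinformatics, Clustering Algorithm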
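  • This dissertation proposes four different adaptive clustering algorithms, each targeting a different bioinformatics application. The first is Rough-based Feature Selection, which is derived from the properties of rough sets. It does not require the number of clusters to be set in advance, and it can detect positions close to the actual cluster centers. This study combines it with a Radial Basis Function Neural Network (RBFNN) and applies the combination to cancer classification. The second method is a Similar Genes Discovery System (SGDS). It measures semantic similarity from the gene function annotations in the Entrez Gene database and the Gene Ontology, and then clusters genes with similar or related functions. SGDS is further applied to genes with known biological pathways in an attempt to predict other pathways that may exist. The third method is a cancer diagnosis system based on Support Vector Machines (SVM) combined with correlation coefficient feature selection. The last method is used in a system that predicts the relationship between microarray data and Expressed Sequence Tags (ESTs). It is based on grid-based clustering, which uses the grid space to filter out unreliable data and so improve the reliability of the predictions. For each algorithm's application, the dissertation designs dedicated experiments to verify the performance of the proposed methods and systems.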
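The abstract above does not say which semantic similarity measure SGDS uses. Purely as an illustration, the sketch below uses Resnik's information-content measure, one common way to compare Gene Ontology terms. Under this measure, two terms are as similar as their most informative shared ancestor.

The following are all hypothetical and not taken from the dissertation:

* the toy ontology (`PARENTS`);
* the annotation counts (`DIRECT_COUNTS`);
* the best-match average used to turn term similarity into gene similarity.

```python
import math

# Hypothetical toy ontology fragment: term -> parent terms (is_a edges only).
PARENTS = {
    "root": [],
    "A": ["root"],
    "B": ["root"],
    "A1": ["A"],
    "A2": ["A"],
}
# Hypothetical number of genes annotated directly to each term.
DIRECT_COUNTS = {"root": 0, "A": 2, "B": 4, "A1": 1, "A2": 1}


def ancestors(term, parents):
    """Return the term together with all of its ancestors in the DAG."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents[t])
    return seen


def information_content(parents, direct_counts, root="root"):
    """IC(t) = -log p(t), where p(t) counts annotations to t or any descendant."""
    cumulative = {t: 0 for t in parents}
    for term, n in direct_counts.items():
        for a in ancestors(term, parents):
            cumulative[a] += n
    total = cumulative[root]  # assumes a single root that reaches every term
    # log(total / c) equals -log(c / total) and gives exactly 0.0 at the root.
    return {t: math.log(total / c) for t, c in cumulative.items() if c > 0}


def resnik(t1, t2, parents, ic):
    """Resnik similarity: IC of the most informative common ancestor."""
    common = ancestors(t1, parents) & ancestors(t2, parents)
    return max(ic.get(t, 0.0) for t in common)


def gene_similarity(terms1, terms2, parents, ic):
    """Best-match average of term similarities between two genes' annotations."""
    best1 = [max(resnik(a, b, parents, ic) for b in terms2) for a in terms1]
    best2 = [max(resnik(a, b, parents, ic) for a in terms1) for b in terms2]
    return (sum(best1) + sum(best2)) / (len(best1) + len(best2))


if __name__ == "__main__":
    ic = information_content(PARENTS, DIRECT_COUNTS)
    print(resnik("A1", "A2", PARENTS, ic))  # IC of shared ancestor "A", i.e. log 2
    print(resnik("A1", "B", PARENTS, ic))   # only "root" is shared
    print(gene_similarity(["A1"], ["A2", "B"], PARENTS, ic))
```

Genes whose pairwise similarity is high could then be grouped by any standard clustering step. Which grouping SGDS applies is not stated in the abstract.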

    The dissertation presents four adaptive clustering algorithms that address various problems in bioinformatics applications. The first is a rough-based feature selection algorithm. It finds the relevant features without requiring the number of clusters to be known a priori, and it identifies cluster centers that approximate the correct ones. In this study, we propose a method that combines rough-based feature selection with an RBF neural network for the classification of gene expression data. Experiments demonstrate that the approach reduces the number of selected genes and increases the classification accuracy rate. The second is Gene Ontology (GO) semantic similarity clustering. We implement a Similar Genes Discovery System (SGDS) that uses semantic similarity measures over GO and Entrez Gene to identify groups of similar genes. We believe that, when this concept is extended to well-known pathways, SGDS can discover candidate genes whose interactions are described in the literature. The third is a diagnosis model that combines a correlation coefficient feature selection algorithm with support vector machines (SVM). A correlation function serves as the criterion for measuring the dependency between features, and relevant features are selected by an orthogonal method that maximizes the overall dependency. The last, a grid-based clustering algorithm, is proposed to detect outliers in pairs of gene expression data and expressed sequence tags (ESTs). We implement an intelligent system based on a radial basis function neural network (RBFNN). It learns the nonlinear relationship between the two data types and predicts ESTs from input cDNA microarray data. The performance of the proposed algorithms is verified through various experiments in different applications.
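The correlation-coefficient method is described above only at a high level. A correlation function scores dependency, and an orthogonal procedure picks the features that maximize overall dependency.

The Python sketch below shows one plausible way to realize that idea. It is an assumed reconstruction, not the dissertation's algorithm. It performs greedy forward selection with Gram-Schmidt orthogonalization. At each step it keeps the candidate whose component orthogonal to the already-chosen features has the largest squared correlation with the class label. This criterion resembles the error reduction ratio used in orthogonal least squares.

The function name, the -1/+1 label coding, and the example genes `g1` to `g3` are hypothetical.

```python
def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _center(v):
    mean = sum(v) / len(v)
    return [x - mean for x in v]


def _remove_component(w, q, qq):
    """Subtract from w its projection onto q (qq is the precomputed q . q)."""
    coef = _dot(w, q) / qq
    return [wi - coef * qi for wi, qi in zip(w, q)]


def orthogonal_correlation_selection(features, target, k):
    """Greedily select up to k features by orthogonalized squared correlation.

    features: dict mapping a feature (gene) name to its values across samples
    target:   numeric class labels for the same samples (e.g. -1/+1)
    Returns a list of (name, score) pairs in selection order.
    """
    y = _center(target)
    yy = _dot(y, y)
    if yy == 0:
        raise ValueError("target must not be constant")
    # Centered copies, so dot products act like (unnormalized) covariances.
    residual = {name: _center(col) for name, col in features.items()}
    selected = []
    while residual and len(selected) < k:
        best, best_score = None, -1.0
        for name, w in residual.items():
            ww = _dot(w, w)
            if ww < 1e-12:  # nothing left once selected directions are removed
                continue
            # Squared correlation between the orthogonalized candidate and y.
            score = _dot(w, y) ** 2 / (ww * yy)
            if score > best_score:
                best, best_score = name, score
        if best is None:
            break
        q = residual.pop(best)
        selected.append((best, best_score))
        qq = _dot(q, q)
        # Gram-Schmidt step: strip the chosen direction from every remaining candidate.
        residual = {name: _remove_component(w, q, qq) for name, w in residual.items()}
    return selected


if __name__ == "__main__":
    genes = {
        "g1": [1.0, 2.0, 3.0, 4.0],
        "g2": [2.0, 4.0, 6.0, 8.1],  # almost a scaled copy of g1
        "g3": [1.0, -1.0, 1.0, -1.0],
    }
    labels = [-1, -1, 1, 1]
    print(orthogonal_correlation_selection(genes, labels, k=2))
```

With these toy values, `g1` is selected first. In the second step `g3` is preferred over `g2`, because `g2` is nearly collinear with `g1` and has almost no orthogonal residual left. This redundancy-avoiding behavior is the point of the orthogonal step.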

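The abstract states that the grid-based clustering algorithm detects outliers in (gene expression, EST) pairs before an RBFNN learns their relationship. The sketch below illustrates the general idea and flags points that fall in sparsely populated cells. It assumes:

* a uniform 2-D grid;
* a fixed minimum number of points per cell.

Both choices are hypothetical. `cell_size` and `min_count` are illustrative parameters, not values from the dissertation, and the example pairs are made up.

```python
import math
from collections import Counter


def grid_outlier_filter(points, cell_size, min_count):
    """Split 2-D points into (inliers, outliers) with a uniform grid.

    points:    list of (x, y) pairs, e.g. a normalized microarray value paired
               with a normalized EST value for the same gene (assumed layout)
    cell_size: side length of each square cell
    min_count: cells holding fewer points than this are treated as sparse
    """
    if cell_size <= 0:
        raise ValueError("cell_size must be positive")

    def cell_of(p):
        # Integer grid coordinates of the cell that contains point p.
        return (math.floor(p[0] / cell_size), math.floor(p[1] / cell_size))

    counts = Counter(cell_of(p) for p in points)
    inliers, outliers = [], []
    for p in points:
        (inliers if counts[cell_of(p)] >= min_count else outliers).append(p)
    return inliers, outliers


if __name__ == "__main__":
    pairs = [(0.1, 0.1), (0.2, 0.15), (0.15, 0.3), (0.3, 0.2), (0.9, 0.05)]
    kept, dropped = grid_outlier_filter(pairs, cell_size=0.5, min_count=2)
    print(kept)
    print(dropped)
```

Here the four clustered pairs share one cell and are kept. The isolated pair `(0.9, 0.05)` sits alone in its cell and is flagged. Counting only a point's own cell keeps the sketch short. A common refinement also counts neighboring cells, so that points near a cell boundary are not penalized.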
    Table of Contents
    CHINESE ABSTRACT
    ABSTRACT
    ACKNOWLEDGEMENTS
    TABLE OF CONTENTS
    LIST OF FIGURES
    LIST OF TABLES
    CHAPTER 1. INTRODUCTION
      1.1 Motivation
      1.2 Objectives
      1.3 Adaptive Clustering Algorithms
        1.3.1 Rough-Based Clustering Algorithm
        1.3.2 Semantic-Based Clustering Algorithm
        1.3.3 Correlation-Coefficient-Based Clustering Algorithm
        1.3.4 Grid-Based Clustering Algorithm
      1.4 Organization of Dissertation
    CHAPTER 2. AN OVERVIEW OF RELATED WORK
      2.1 Clustering Techniques
      2.2 Bioinformatics Applications
        2.2.1 Gene Expression Data
        2.2.2 Feature Selection Methods
        2.2.3 Gene Ontology
        2.2.4 Semantic Similarity Measure
        2.2.5 Expressed Sequence Tags
    CHAPTER 3. FEATURE SELECTION BY ROUGH-BASED CLUSTERING
      3.1 Rough-Based Feature Selection Method
        3.1.1 The Concept of Rough Properties
        3.1.2 The Feature Selection Algorithm
        3.1.3 Cluster Validation Method
        3.1.4 An Example of Synthetic Dataset
        3.1.5 The effect of …
      3.2 Interpretation of RBF Neural Network
      3.3 Accuracy Rate Evaluation and Comparison
        3.3.1 k-fold Cross Validation
        3.3.2 Feature Selection and Dimensionality Reduction
        3.3.3 Classifiers
      3.4 Experimental Results and Analysis
        3.4.1 Experiment on forest cover dataset
        3.4.2 Comparison with Various Classifiers
        3.4.3 Comparison with Various Feature Selection Methods
        3.4.4 Comparison with More Microarray Datasets
      3.5 Summary
    CHAPTER 4. SIMILAR GENES DISCOVERY BY GO SEMANTIC SIMILARITY CLUSTERING
      4.1 Similar Genes Discovery System (SGDS)
        4.1.1 Data Preprocessing module
        4.1.2 Semantic similarity measure module
        4.1.3 Gene quantification module
      4.2 Experimental results
        4.2.1 Experiments of gene expression profiles versus semantic similarities
        4.2.2 Experiments for system parameters tuning
        4.2.3 Experiments of the RON and Lutheran pathways
      4.3 Summary
    CHAPTER 5. FEATURE SELECTION BY CORRELATION COEFFICIENT CLUSTERING
      5.1 Correlation coefficient feature selection algorithm
        5.1.1 The main concepts
        5.1.2 The proposed algorithm
      5.2 SVM-based diagnosis model
        5.2.1 Basic concept of SVM classifier
        5.2.2 The diagnosis model
      5.3 Experimental results
      5.4 Summary
    CHAPTER 6. OUTLIER DETECTION BY GRID-BASED CLUSTERING
      6.1 The Proposed System
        6.1.1 Data Preprocessing
        6.1.2 Normalization
        6.1.3 Outlier removal
        6.1.4 Pseudo-sample generation
        6.1.5 RBF-center determination
      6.3 System evaluation
      6.4 Experimental Results and analysis
        6.4.1 Comparison of various feature combinations
        6.4.2 Comparison of our proposed system with nonlinear regression
        6.4.3 Comparison of statistical method in outlier detection
      6.5 Summary
    CHAPTER 7. FURTHER WORK
    REFERENCES
    VITA

    Full text publicly available on campus from 2009-01-22 and off campus from 2010-01-22.