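Graduate student: 周俞均 (Chou, Yu-Chun)
Thesis title: 植基於加權圖形的微陣列基因表現分類器
A Weighted Graph-based Classifier for Microarray Gene Expression Data
Advisor: 謝孫源 (Hsieh, Sun-Yuan)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Institute of Medical Informatics
Year of publication: 2012
Academic year of graduation: 100 (ROC calendar)
Language: English
Number of pages: 56
Keywords (translated from Chinese): microarray matrix, graph theory, classification, gene expression, bioinformatics
Keywords (English): microarray, graph theory, classification, gene expression, bioinformatics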
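  • Using microarray technology for cancer diagnosis has many advantages, but several obstacles must still be overcome before microarrays can be applied in clinical diagnostic research. Microarray classification currently faces two main limitations: whether a trustworthy database is available as training data for the classifier, and the performance of the classifier itself.
    Most existing algorithms that use DNA microarray data for cancer classification in medical diagnosis focus on the accuracy of predicting the cancer class, and few of them detect misdiagnoses. This does not match real clinical conditions, so these algorithms cannot be applied in actual medical diagnosis. The few algorithms that do detect misdiagnoses spend too much time on classification because DNA microarray data sets are so large; although they reduce misdiagnoses, the time cost outweighs the benefit.
    We therefore propose an algorithm that uses weights to select the genes that are indicative of each class in the training data, discarding a large number of irrelevant genes, and that uses graph-theoretic methods to compare and classify the data. It achieves a reasonable accuracy in predicting the disease class, detects misdiagnoses appropriately, lowers the misdiagnosis rate, and reduces the time spent on classification, so as to maximize its benefit in practical use.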

    Discovering the molecular profiles of cancers has many advantages, but applying microarray technology to routine clinical diagnostics remains a major challenge. The classification of microarray data faces two main limitations: too few reliable data sets are available for building classifiers, and classifier performance is limited. Current microarray classification algorithms usually produce a high false-positive rate, which is unacceptable in real diagnostic applications. Some algorithms can detect false-positive cases, but they require too much computation time. To address this problem, this work improves the GEG-based algorithm to reduce computation time. The proposed algorithm uses edge weights to measure the significance of genes and filters out the less significant ones, then applies graph theory to compare and classify samples. It can properly detect false positives and takes less time while achieving better performance. To demonstrate the novelty of the proposed approach, the authors present an experimental performance comparison between the GEG-based algorithm and the proposed classifier.
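
The two steps named in the abstract (filtering genes by edge weight, then comparing samples against per-class graphs) can be illustrated with a toy sketch. This is not the thesis's algorithm, whose details the record does not give. Everything here is an assumption made for illustration: the encoding of samples as +1/-1 expression flags, the co-expression edge weights, the significance score, the similarity measure, the 0.5 threshold, and all function and gene names.

```python
from collections import defaultdict
from itertools import combinations


def build_graph(samples):
    """Build a weighted graph from discretized samples (hypothetical encoding).

    Each sample maps gene -> +1 (over-expressed) or -1 (under-expressed).
    An edge (g, h) gains +1 when both genes move in the same direction
    in a sample and -1 when they move in opposite directions.
    """
    weights = defaultdict(int)
    for sample in samples:
        active = sorted(g for g, v in sample.items() if v != 0)
        for g, h in combinations(active, 2):
            weights[(g, h)] += sample[g] * sample[h]
    return dict(weights)


def gene_significance(graph):
    """Score each gene by the total absolute weight of its edges (assumed)."""
    score = defaultdict(int)
    for (g, h), w in graph.items():
        score[g] += abs(w)
        score[h] += abs(w)
    return dict(score)


def filter_graph(graph, keep):
    """Keep only edges between the `keep` highest-scoring genes."""
    sig = gene_significance(graph)
    top = set(sorted(sig, key=lambda g: (-sig[g], g))[:keep])
    return {e: w for e, w in graph.items() if e[0] in top and e[1] in top}


def similarity(class_graph, sample):
    """Agreement between a sample and a class graph, in [-1, 1]."""
    total = sum(abs(w) for w in class_graph.values())
    if total == 0:
        return 0.0
    agree = sum(w * sample.get(g, 0) * sample.get(h, 0)
                for (g, h), w in class_graph.items())
    return agree / total


def classify(class_graphs, sample, threshold=0.5):
    """Return the best-matching class, or None for an out-of-class sample."""
    scores = {c: similarity(g, sample) for c, g in class_graphs.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else None


# Toy training data with made-up gene names and class labels.
training = {
    "A": [{"g1": 1, "g2": 1, "g3": -1},
          {"g1": 1, "g2": 1, "g3": -1, "g4": 1}],
    "B": [{"g1": -1, "g2": 1, "g3": 1},
          {"g1": -1, "g2": 1, "g3": 1}],
}
class_graphs = {c: filter_graph(build_graph(s), keep=3)
                for c, s in training.items()}

print(classify(class_graphs, {"g1": 1, "g2": 1, "g3": -1}))
print(classify(class_graphs, {"g5": 1}))
```

Returning `None` when no class graph reaches the threshold is one simple way to flag an out-of-class sample instead of forcing it into the nearest class. Filtering before comparison shrinks the number of edges each comparison has to visit, which is the kind of saving in computation time the abstract refers to.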

    1 Introduction
      1.1 Motivation
      1.2 Literature
      1.3 Results
      1.4 Organization
    2 Bioinformatics Background
      2.1 The Process of Microarray
      2.2 The types of Microarray Database
        2.2.1 Database Systems for Local Installation
        2.2.2 Database for Public Data Deposition and/or Queries
      2.3 The cDNA Stanford Microarray Database
      2.4 The Data of Pathologies
        2.4.1 Diffuse Large B-cell Lymphoma
        2.4.2 B-cell Chronic Lymphocytic Leukemia
        2.4.3 B-cell Chronic Lymphocytic Leukemia Wait and Watch
        2.4.4 Acute Lymphoblastic Leukemia
        2.4.5 Healthy Blood
        2.4.6 Follicular Lymphoma
        2.4.7 Cutaneous B-Cell Lymphomas
        2.4.8 Breast Cancer
        2.4.9 Core Binding Factor Acute Myeloid Leukemia
        2.4.10 Solid Lung Tumor
    3 Preliminaries
      3.1 Data Model
      3.2 Classification
    4 Method
      4.1 Data Model
      4.2 Classification
    5 Experimental Results
      5.1 Experimental Design
      5.2 Data source and Data set
      5.3 Experimental Result
        5.3.1 Classifiable Sample
        5.3.2 Out-of-Class Sample
        5.3.3 Computation Time
    6 Conclusion
    Bibliography

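    On campus: publicly available from 2017-09-11
    Off campus: not publicly available
    The electronic thesis has not yet been authorized for public release; for the print copy, please check the library catalog.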