
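Author: Zhan, Shi-Yao (詹士瑤)
Thesis title: A comprehensive study on comparison of missing value imputation methods for microarray data (微陣列資料缺值填補方法綜合比較之研究)
Advisor: Wu, Wei-Sheng (吳謂勝)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Electrical Engineering
Year of publication: 2013
Academic year of graduation: 101 (ROC calendar)
Language: Chinese
Number of pages: 34
Keywords (Chinese, translated): microarray data, algorithm, performance measure, fair and objective evaluation
Keywords (English): microarray, algorithm, measure, comprehensive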
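  • Microarray data often contain missing values for a variety of reasons, yet downstream analyses usually require complete data. To improve downstream analysis, many methods have been developed that claim to handle missing values effectively. Over the past several years there have been some preliminary evaluations comparing the performance of these algorithms, but those evaluations have a number of problems. In this study, we ran many rounds of simulation of the existing algorithms on a large collection of microarray datasets and used different types of performance measures to evaluate each algorithm fairly and objectively. We first evaluated how each algorithm performs on different types of data, and then examined how the species of origin affects algorithm performance. To keep the evaluation fair and objective, we used not only a statistical measure but also two biologically meaningful measures. We conclude that algorithms derived from or improving on local least squares (for example, LLS, ILLS, and SLLS), together with the LS algorithm, generally performed better on the microarray datasets tested. This study provides an objective and comprehensive analysis and comparison of how well these algorithms impute missing values. With these results, researchers in related fields can more easily choose the best algorithm for their needs. In addition, developers of new missing value estimation algorithms can use the strategy of this study to compare against existing algorithms.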

    Microarray data frequently contain missing values for various reasons, yet most downstream analyses of microarray data require complete datasets. Algorithms for missing value estimation are therefore needed to improve downstream analysis. Many algorithms have been proposed since 2001, but existing performance comparisons among them fall short in the number of benchmark datasets, the number of algorithms included, the performance measures used, and the number of simulation rounds performed. In this research, we used (I) nine algorithms, (II) thirteen microarray datasets, (III) 110 independent runs of the simulation procedure, and (IV) three types of measures to evaluate the performance of each imputation algorithm fairly. First, we evaluated the effects of different types of datasets on the performance of each imputation algorithm. Second, we examined whether datasets from different species affect the performance of the algorithms differently. To assess each algorithm fairly, all evaluations were performed using three types of measures. In addition to the statistical measure, two other indices with more biological meaning are useful for reflecting the impact of missing value imputation on downstream data analysis. Our results suggest that local-least-squares-based and least-squares methods are the better choices for handling missing values in most datasets. This work provides a comprehensive comparison of algorithms for microarray missing value imputation. Based on this comparison, researchers can more easily choose an optimal algorithm for their datasets. Moreover, new imputation algorithms could in future be compared with existing algorithms using this comparison strategy as a standard protocol.
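The simulation procedure described in the abstract (hide known values, impute them, score the estimates, repeat over independent runs) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the thesis's code: the function names, the 10% missing rate, the row-mean baseline imputer, and the use of normalized root mean squared error (NRMSE) as the statistical measure are all assumptions, since the record does not specify them.

```python
import math
import random


def mask_entries(matrix, missing_rate, rng):
    """Copy `matrix`, hide a random fraction of entries (set to None),
    and return the masked copy plus the hidden (row, col) positions."""
    cells = [(i, j) for i, row in enumerate(matrix) for j in range(len(row))]
    hidden = rng.sample(cells, max(1, round(missing_rate * len(cells))))
    masked = [list(row) for row in matrix]
    for i, j in hidden:
        masked[i][j] = None
    return masked, hidden


def row_mean_impute(masked):
    """Hypothetical baseline imputer: fill each missing entry with the
    mean of the observed values in the same row (gene)."""
    filled = []
    for row in masked:
        observed = [v for v in row if v is not None]
        mean = sum(observed) / len(observed) if observed else 0.0
        filled.append([mean if v is None else v for v in row])
    return filled


def nrmse(truth, imputed, hidden):
    """One common NRMSE definition (an assumption here): root of the mean
    squared error over hidden entries divided by the variance of their
    true values."""
    true_vals = [truth[i][j] for i, j in hidden]
    est_vals = [imputed[i][j] for i, j in hidden]
    mse = sum((t - e) ** 2 for t, e in zip(true_vals, est_vals)) / len(hidden)
    mean_true = sum(true_vals) / len(true_vals)
    var_true = sum((t - mean_true) ** 2 for t in true_vals) / len(true_vals)
    return math.sqrt(mse / var_true) if var_true > 0 else float("nan")


def evaluate(matrix, imputer, missing_rate=0.1, runs=10, seed=0):
    """Average NRMSE of `imputer` over independent masking runs."""
    rng = random.Random(seed)
    scores = []
    for _ in range(runs):
        masked, hidden = mask_entries(matrix, missing_rate, rng)
        scores.append(nrmse(matrix, imputer(masked), hidden))
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy complete matrix (genes x samples), made up purely for illustration.
    toy = [[math.sin(i * 1.3 + j * 0.7) for j in range(5)] for i in range(6)]
    print(evaluate(toy, row_mean_impute, missing_rate=0.2, runs=5))
```

Any imputer with the same signature (a masked matrix in, a filled matrix out) could be passed to `evaluate`, which is the sense in which a fixed protocol lets new algorithms be compared against existing ones.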

    Chinese Abstract
    English Abstract
    Acknowledgments
    Table of Contents
    List of Tables
    List of Figures
    Chapter 1. Introduction
        1.1 Research Background
        1.2 Research Motivation and Objectives
    Chapter 2. Data, Methods, and Measures
        2.1 Sources of the Datasets
        2.2 Overview of Methods
        2.3 Simulation Settings
        2.4 Overview of Measures
    Chapter 3. Results and Discussion
        3.1 Algorithm Performance on Different Categories of Data
        3.2 Robustness of the Algorithms across Datasets from Different Species
        3.3 Comparison with Preliminary Analysis Studies
    Chapter 4. Conclusions and Future Work
    Chapter 5. References


    Full-text availability: on campus, public from 2015-08-28
    Off campus, public from 2015-08-28