| Graduate student: | 程瀚德 Cheng, Han-De |
|---|---|
| Thesis title: | Missing Value Estimation for Microarray Gene Expression Data by Hybrid Local Least Squares Imputation (Chinese title: 藉由混和最小平方解演算法來修復微陣列基因序列資料的遺失值) |
| Advisor: | 莊哲男 Juang, Jer-Nan |
| Degree: | Master's |
| Department: | College of Engineering, Department of Engineering Science |
| Year of publication: | 2011 |
| Graduation academic year: | 99 (Republic of China calendar, i.e. 2010–2011) |
| Language: | English |
| Pages: | 47 |
| Keywords: | Microarray, missing values, weighted least squares, semantic similarity, gene ontology annotation |
Gene expression microarray data are widely used in biological analyses. However, such data frequently contain missing values, which arise for a variety of reasons, and most gene expression analysis algorithms, such as clustering, classification, and biological network construction, require a complete data matrix. Reconstructing the missing entries of microarray data, and improving the accuracy of the estimates, is therefore an important problem. Most imputation algorithms proceed in two steps. First, for each target gene with missing values, the k genes that contain no missing values and are most similar to the target are selected. Second, the selected genes are combined with a particular estimation method to compute the missing values. In this thesis, we first select the similar genes for each target gene by semantic similarity computed from gene ontology (GO) annotations, and then propose a method that combines the concepts of iterated local least squares imputation (ILLSimpute) and sequential local least squares imputation (SLLSimpute) to estimate the missing values. Experiments on four microarray datasets show that the proposed method achieves better accuracy than other imputation algorithms currently in use.
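The first step described above ranks candidate genes by semantic similarity derived from GO annotations. The sketch below illustrates one widely used measure, Resnik's information-content similarity: IC(t) = −log p(t), where p(t) is the fraction of annotations made to term t or any of its descendants, and the similarity of two terms is the IC of their most informative common ancestor. The toy ontology, the gene annotations, the helper names, and the rule that averages term-pair similarities into a gene similarity are all hypothetical choices for illustration; the abstract does not state which measure or combination rule the thesis actually uses.

```python
import math
from itertools import product

# Hypothetical toy fragment of a GO-like DAG: term -> parent terms.
PARENTS = {
    "GO:root": [],
    "GO:A": ["GO:root"],
    "GO:B": ["GO:root"],
    "GO:A1": ["GO:A"],
    "GO:A2": ["GO:A"],
    "GO:B1": ["GO:B"],
}

# Hypothetical gene -> annotated GO terms.
ANNOTATIONS = {
    "g1": ["GO:A1"],
    "g2": ["GO:A2"],
    "g3": ["GO:B1"],
    "g4": ["GO:A1", "GO:B1"],
}


def ancestors(term):
    """Return `term` together with all of its ancestors in the DAG."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(PARENTS[t])
    return seen


def information_content(annotations):
    """IC(t) = -log p(t), p(t) = share of annotations at t or any descendant of t."""
    counts = {t: 0 for t in PARENTS}
    total = 0
    for terms in annotations.values():
        for term in terms:
            total += 1
            for a in ancestors(term):  # an annotation to t also counts for its ancestors
                counts[a] += 1
    return {t: math.log(total / c) for t, c in counts.items() if c > 0}


def resnik(t1, t2, ic):
    """Resnik similarity: IC of the most informative common ancestor."""
    return max(ic[a] for a in ancestors(t1) & ancestors(t2))


def gene_similarity(g1, g2, annotations, ic):
    """Average Resnik similarity over all term pairs (an assumed combination rule)."""
    pairs = list(product(annotations[g1], annotations[g2]))
    return sum(resnik(a, b, ic) for a, b in pairs) / len(pairs)


def top_k_neighbours(target, candidates, annotations, ic, k):
    """The k candidate genes most semantically similar to `target`."""
    return sorted(candidates,
                  key=lambda g: gene_similarity(target, g, annotations, ic),
                  reverse=True)[:k]


IC = information_content(ANNOTATIONS)
```

Here `top_k_neighbours("g1", ["g2", "g3", "g4"], ANNOTATIONS, IC, 2)` returns `["g2", "g4"]`: g2 shares the informative ancestor GO:A with g1, while g4 matches GO:A1 on one term but shares only the root on the other, so its average is lower. In practice the candidate list would contain only genes without missing values, as the two-step procedure requires.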
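The second step, local least squares (LLS) imputation, can be sketched as follows. For a target gene with observed part w, let A hold the k neighbour genes restricted to the target's observed columns and B the same neighbours at its missing columns; the coefficients x minimise ‖Aᵀx − w‖₂, and the missing entries are estimated as Bᵀx. The wrapper then borrows the two ideas the abstract names: an SLLS-style order (genes with fewer missing values first, each imputed gene joining the candidate pool) and ILLS-style extra passes that re-impute using all other, now complete, genes. This is a hypothetical sketch, not the thesis's hybrid algorithm: the function names are illustrative, neighbours are chosen by Euclidean distance as a stand-in for GO semantic similarity, and a fixed k replaces the distance threshold used in the original ILLS method.

```python
import numpy as np


def lls_impute_row(target, neighbours):
    """LLS estimate of the NaN entries of one gene from complete neighbour rows."""
    miss = np.isnan(target)
    A = neighbours[:, ~miss]  # k x n_obs: neighbours at the target's observed columns
    B = neighbours[:, miss]   # k x n_miss: neighbours at the target's missing columns
    w = target[~miss]         # observed part of the target
    # Coefficients x minimising ||A.T @ x - w||_2 (minimum-norm if rank-deficient).
    x, *_ = np.linalg.lstsq(A.T, w, rcond=None)
    filled = target.copy()
    filled[miss] = B.T @ x    # same linear combination applied to the missing columns
    return filled


def select_neighbours(target, pool, k):
    """k rows of `pool` nearest to `target` (Euclidean, on its observed columns).
    Stand-in for ranking candidates by GO semantic similarity."""
    obs = ~np.isnan(target)
    dist = np.linalg.norm(pool[:, obs] - target[obs], axis=1)
    return pool[np.argsort(dist)[:k]]


def sequential_iterated_lls(X, k, n_iter=1):
    """Hypothetical mix of SLLS-style ordering and ILLS-style refinement.

    Pass 1 imputes genes in order of increasing missing count; each imputed gene
    joins the candidate pool for later genes. Passes 2..n_iter re-impute every
    incomplete gene using all other genes as candidates.
    Assumes X has at least one fully observed row.
    """
    X = np.asarray(X, dtype=float)
    nan_mask = np.isnan(X)
    incomplete = np.where(nan_mask.any(axis=1))[0]
    order = incomplete[np.argsort(nan_mask[incomplete].sum(axis=1), kind="stable")]

    filled = X.copy()
    in_pool = ~nan_mask.any(axis=1)  # start from the complete genes only
    for i in order:
        filled[i] = lls_impute_row(X[i], select_neighbours(X[i], filled[in_pool], k))
        in_pool[i] = True            # sequential step: reuse the freshly imputed gene

    for _ in range(n_iter - 1):
        for i in order:
            others = np.delete(filled, i, axis=0)
            filled[i] = lls_impute_row(X[i], select_neighbours(X[i], others, k))
    return filled
```

Using `np.linalg.lstsq` rather than an explicit normal-equation solve keeps the estimate defined when k exceeds the number of observed columns or the neighbours are collinear, since it returns the minimum-norm solution in those cases.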