| Graduate student: | 遲明倫 Chr, Ming-Lun |
|---|---|
| Thesis title: | 結合向量角度與歐基裡德距離計算微陣列資料缺失值 Missing value estimation in Microarray data with combining vector angle and Euclidean distance |
| Advisor: | 莊哲男 Juang, Jer-Nan |
| Degree: | Master |
| Department: | College of Engineering - Department of Engineering Science |
| Year of publication: | 2011 |
| Academic year of graduation: | 99 |
| Language: | English |
| Number of pages: | 48 |
| Keywords (Chinese): | 微陣列、缺失值、向量角度、歐基里德距離、權重最小平方法 |
| Keywords (English): | Microarray, Missing values, vector angle, Euclidean distance, weighted least squares |
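Missing value estimation is very important for microarray data. Many microarray datasets, such as gene expression data, contain many missing values, while downstream analyses of microarray data require a complete dataset and their results depend heavily on the values in that data. Many methods for estimating missing values have been proposed in the literature, with increasingly good results. Several optimal methods are now available: Local Least Squares Imputation (LLS), Iterated Local Least Squares (ILLS), Sequential Local Least Squares (SLLS), LSimpute (LS), and Bayesian Principal Component Analysis (BPCA).
Except for BPCA, all of these methods are based on the least squares principle. Our approach is also built on SLLS, with a modified method of selecting neighbors. The correlation coefficient, the Euclidean distance, and the angle are all used to select similar genes, and most algorithms choose only one of them to select neighbors. We conclude that neighbor selection is a very important step with a strongly positive effect on the simulation results. We therefore combine the three neighbor-selection methods to find similar neighbors, estimate the missing values with weighted least squares, and compare the result against the methods above using the normalized root mean squared error (NRMSE). The simulation results show that our method performs better than the other algorithms.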
Missing value estimation is important for DNA microarray data. Data such as gene expression data frequently contain missing values, and most downstream analyses of microarray experiments either require a complete dataset or produce results that depend significantly on the quality of the estimates. Because biological studies require a complete gene expression matrix, missing values must be estimated before further analysis.
In the open literature on microarray missing value estimation, there are several optimal methods, such as Local Least Squares Imputation (LLS), Iterated Local Least Squares (ILLS), Sequential Local Least Squares (SLLS), LSimpute (LS), and Bayesian Principal Component Analysis (BPCA). All of these except BPCA are based on the least squares principle. We propose a method similar to SLLS that improves the selection of gene neighbors. There are three approaches to finding similar genes (neighbors): Pearson correlation, Euclidean distance, and vector angle. Most existing optimal methods use only one of the three. Past experience shows that the choice of gene neighbors is vital for missing value estimation.
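To make the three neighbor-selection measures concrete, the following is a minimal sketch in plain Python that computes each of them between a target gene and a few candidate genes. The function names, the toy expression profiles, and the gene labels are illustrative assumptions, not taken from the thesis. The abstract does not state how the three measures are combined, so the sketch only reports them side by side.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (dev_x * dev_y)


def euclidean(x, y):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def vector_angle(x, y):
    """Angle in radians between two profiles treated as vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cosine = max(-1.0, min(1.0, dot / (norm_x * norm_y)))
    return math.acos(cosine)


# Hypothetical toy data: one target gene and three candidate neighbors.
target = [1.0, 2.0, 3.0, 4.0]
candidates = {
    "gene_a": [2.0, 4.0, 6.0, 8.0],   # same direction, larger magnitude
    "gene_b": [1.1, 2.1, 2.9, 4.2],   # close in value
    "gene_c": [4.0, 3.0, 2.0, 1.0],   # reversed trend
}

for name, profile in candidates.items():
    print(
        name,
        round(pearson(target, profile), 3),
        round(euclidean(target, profile), 3),
        round(vector_angle(target, profile), 3),
    )
```

In this toy data, `gene_a` is perfectly correlated with the target and points in the same direction, yet it is far away in Euclidean distance. This illustrates why the measures can rank candidate neighbors differently.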
Our proposed method integrates the three approaches to select similar genes, which are then used in the weighted minimum-norm estimation of missing values. The normalized root mean squared error (NRMSE) is used to compare our method with the other optimal methods, and the results demonstrate the merit of our method.
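As an illustration of the evaluation metric, here is a minimal sketch of an NRMSE computation in plain Python. It assumes the form commonly used in the imputation literature: the root mean squared error between estimated and true values, divided by the standard deviation of the true values. The exact normalization used in the thesis is not given in the abstract, so the use of the population standard deviation here is an assumption, and the function name is illustrative.

```python
import math
import statistics


def nrmse(estimated, true):
    """Root mean squared error normalized by the spread of the true values.

    Assumption: normalization uses the population standard deviation
    of the true values; other definitions exist.
    """
    if len(estimated) != len(true):
        raise ValueError("estimated and true must have the same length")
    mse = sum((e - t) ** 2 for e, t in zip(estimated, true)) / len(true)
    return math.sqrt(mse) / statistics.pstdev(true)


# Hypothetical values that were masked out and then re-estimated.
true_values = [1.0, 2.0, 3.0, 4.0]
perfect = [1.0, 2.0, 3.0, 4.0]
mean_fill = [2.5, 2.5, 2.5, 2.5]  # every entry replaced by the mean

print(nrmse(perfect, true_values))
print(nrmse(mean_fill, true_values))
```

Under this definition, a perfect estimate gives 0, and filling every entry with the mean of the true values gives 1, so values below 1 indicate an estimator that does better than mean filling.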