| Graduate student: | 黃聖文 Huang, Sheng-Wen |
|---|---|
| Thesis title: | 利用權重最小平方法估測微陣列資料的缺失值 Using Weighted Least Squares Method to Estimate Microarray Missing Values |
| Advisor: | 莊哲男 Juang, Jer-Nan |
| Degree: | Master |
| Department: | College of Engineering - Department of Engineering Science |
| Year of publication: | 2011 |
| Academic year of graduation: | 99 |
| Language: | English |
| Number of pages: | 43 |
| Keywords (Chinese): | 微陣列 (microarray), 缺失值 (missing values), 向量角度 (vector angle), 長度 (length), 權重最小平方法 (weighted least squares method) |
| Keywords (English): | Microarray, missing values, length, weighted least squares |
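Microarray data offer researchers in the life sciences useful reference information. During an experiment, however, improper handling or defects in the chip itself often leave missing values in the data. Missing values cause problems in the subsequent analysis, so they are estimated before the real data are analyzed.
The first estimation method appeared about ten years ago, and a wide variety of new methods has appeared quickly since then; each performs differently depending on the source of the data. Missing value estimation has two main steps: the first is to find the k genes most similar to the gene being estimated, and the second is to combine these genes with an algorithm to estimate the missing values.
This thesis proposes a new estimation algorithm, also in two steps: the first uses the concepts of vector angle and length to find the k most similar neighbors, and the second uses the weighted least squares method to estimate the missing values. Tested on several different microarray datasets, the proposed method is either more accurate than the other methods or close to them.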
Microarray data provide useful information for biologists. However, many microarrays suffer from missing values caused by human error or defects in the chip itself. Missing values interfere with the downstream analysis, so many estimation methods have been proposed to address the problem. The missing values are therefore estimated before the real gene microarray data are analyzed.
Since the first method was introduced nearly a decade ago, new and improved methods have been developed continuously. However, the performance of each method varies with the data source. Missing value estimation is commonly divided into two steps. The first is to choose the k genes most similar to the target gene; the second is to combine these genes with a least-squares algorithm to estimate the missing values.
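The generic two-step scheme can be sketched as follows. This is a minimal illustration, not the implementation of any particular published method: it assumes the candidate genes are complete, uses Euclidean distance over the observed columns as the similarity measure, and fits an ordinary least-squares combination of the neighbors. The function name `lls_impute_one` and its parameters are hypothetical.

```python
import numpy as np

def lls_impute_one(target, candidates, k):
    """Fill the NaN entries of `target` using its k nearest complete genes.

    target     : 1-D array (one gene across experiments), NaN marks missing values
    candidates : 2-D array (genes x experiments), assumed to have no missing values
    k          : number of neighbor genes to use
    """
    missing = np.isnan(target)
    observed = ~missing

    # Step 1: rank candidate genes by Euclidean distance on the observed columns.
    dists = np.linalg.norm(candidates[:, observed] - target[observed], axis=1)
    nearest = np.argsort(dists)[:k]

    A = candidates[nearest][:, observed]   # k x (number of observed columns)
    B = candidates[nearest][:, missing]    # k x (number of missing columns)

    # Step 2: write the target's observed part as a least-squares combination
    # of the neighbors, then apply the same combination to the missing columns.
    coeffs, *_ = np.linalg.lstsq(A.T, target[observed], rcond=None)

    filled = target.copy()
    filled[missing] = B.T @ coeffs
    return filled
```

For example, calling `lls_impute_one(np.array([1.0, 2.0, np.nan, 4.0]), candidates, k=2)` returns a copy of the target in which only the third entry has been replaced by the estimate.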
This thesis introduces a new estimation method that is also divided into two steps. First, we use the angle between vectors and their lengths to choose the k most similar neighbors. Second, we use the weighted least-squares method to estimate the missing values. Several microarray datasets are used to verify the proposed method, and we compare it with other algorithms. The results show that our method is more accurate than the other methods or performs comparably.
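The abstract does not say how angle and length are combined or how the weights are chosen, so the sketch below is only one plausible reading, not the thesis's actual algorithm. It assumes a similarity score equal to the absolute cosine of the angle times the ratio of the shorter to the longer vector length, and it reuses that score as the per-gene weight in a weighted least-squares fit across the neighbor genes. The names `angle_length_score` and `wls_impute_one` are hypothetical.

```python
import numpy as np

def angle_length_score(u, v):
    """Assumed similarity: |cos(angle)| times the length ratio, in [0, 1].

    Both vectors are assumed to be non-zero.
    """
    nu, nv = np.linalg.norm(u), np.linalg.norm(v)
    cos = abs(u @ v) / (nu * nv)           # anti-parallel profiles count as similar
    ratio = min(nu, nv) / max(nu, nv)      # 1 when the lengths match
    return cos * ratio

def wls_impute_one(target, candidates, k):
    """Fill the NaN entries of `target` by weighted least squares over k neighbors.

    candidates is assumed complete (genes x experiments, no NaN).
    """
    missing = np.isnan(target)
    observed = ~missing
    t = target[observed]

    # Step 1: keep the k candidates with the highest angle-and-length score.
    scores = np.array([angle_length_score(t, c[observed]) for c in candidates])
    nearest = np.argsort(scores)[::-1][:k]
    w = scores[nearest]                    # assumed weights: the scores themselves

    A = candidates[nearest][:, observed]   # k x observed columns
    B = candidates[nearest][:, missing]    # k x missing columns

    # Step 2: minimise sum_i w_i * ||B_i - A_i X||^2 by scaling each row
    # with sqrt(w_i) and solving an ordinary least-squares problem.
    sw = np.sqrt(w)[:, None]
    X, *_ = np.linalg.lstsq(sw * A, sw * B, rcond=None)

    filled = target.copy()
    filled[missing] = t @ X
    return filled
```

Scaling rows by the square root of the weights is the standard way to reduce weighted least squares to an ordinary least-squares solve; when there are fewer neighbors than observed columns, `np.linalg.lstsq` returns the minimum-norm solution.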