
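Student: Huang, Sheng-Wen (黃聖文)
Thesis title: Using Weighted Least Squares Method to Estimate Microarray Missing Values
Advisor: Juang, Jer-Nan (莊哲男)
Degree: Master's
Department: College of Engineering, Department of Engineering Science
Year of publication: 2011
Academic year of graduation: 99 (ROC calendar, i.e., 2010 to 2011)
Language: English
Number of pages: 43
Keywords (Chinese): microarray, missing values, vector angle, length, weighted least squares method
Keywords (English): Microarray, missing values, length, weighted least squares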
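    Microarray data give researchers in the biological sciences useful reference information. During experiments, however, improper handling or defects in the chip itself often leave microarray data with missing values. These missing values cause problems when biologists analyze the data later, so they are usually estimated before the real data are analyzed.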
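    The first estimation method appeared roughly ten years ago. Since then a wide variety of new methods has emerged, and each performs differently depending on the data source. Missing-value estimation generally has two steps. The first step finds the k genes most similar to the gene being estimated. The second step combines this group of genes with an algorithm to estimate the missing values.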
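The two-step scheme can be illustrated with the KNN baseline listed in the thesis contents. The sketch below shows one common variant of KNN imputation. It is not the thesis's proposed method, and it makes a few assumptions: every candidate gene is complete, missing entries are marked with `None`, neighbors are ranked by Euclidean distance over the columns observed in the target gene, and each missing entry is filled with an inverse-distance weighted average of the k neighbors' values. The names `observed_distance` and `knn_impute` are hypothetical.

```python
import math
from typing import List, Optional

Gene = List[Optional[float]]  # None marks a missing value (assumed convention)


def observed_distance(target: Gene, gene: List[float], cols: List[int]) -> float:
    """Euclidean distance restricted to the columns observed in the target."""
    return math.sqrt(sum((target[j] - gene[j]) ** 2 for j in cols))


def knn_impute(target: Gene, candidates: List[List[float]], k: int) -> List[float]:
    """Fill the target's missing entries from its k nearest complete genes."""
    obs = [j for j, v in enumerate(target) if v is not None]
    miss = [j for j, v in enumerate(target) if v is None]

    # Step 1: keep the k candidates closest to the target on the observed columns.
    neighbors = sorted(candidates, key=lambda g: observed_distance(target, g, obs))[:k]

    # Step 2: inverse-distance weighted average of the neighbors' values.
    eps = 1e-12  # avoids division by zero when a neighbor matches exactly
    weights = [1.0 / (observed_distance(target, g, obs) + eps) for g in neighbors]
    total = sum(weights)

    filled = [0.0 if v is None else v for v in target]
    for j in miss:
        filled[j] = sum(w * g[j] for w, g in zip(weights, neighbors)) / total
    return filled


# Usage: the missing second value is estimated from the two closest profiles.
print(knn_impute([0.0, None], [[1.0, 10.0], [3.0, 30.0], [9.0, 90.0]], k=2))
```

Closer neighbors get larger weights, so the estimate here lands nearer the value of the nearest profile than a plain average would.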
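    This thesis proposes a new estimation algorithm, which also has two steps. The first step uses the concepts of vector angle and length to find the k most similar neighbors. The second step uses the weighted least squares method to estimate the missing values. We tested the method on several different microarray datasets. In every case it was either more accurate than the other methods or close to them.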
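The abstract does not state the exact angle-and-length criterion, so the following is only a hypothetical illustration of how such a neighbor score might be built. The angle between the observed parts of the target and candidate profiles (computed from the cosine) measures similarity in shape regardless of scale. The relative difference in vector length measures magnitude. The mixing weight `alpha` and the names `angle_length_score` and `select_neighbors` are assumptions, not the thesis's definitions.

```python
import math
from typing import List, Optional

Gene = List[Optional[float]]  # None marks a missing value (assumed convention)


def angle_length_score(target: Gene, gene: List[float], cols: List[int], alpha: float) -> float:
    """Hypothetical dissimilarity: mix of vector angle and relative length gap."""
    t = [target[j] for j in cols]
    g = [gene[j] for j in cols]
    norm_t = math.sqrt(sum(x * x for x in t))
    norm_g = math.sqrt(sum(x * x for x in g))
    if norm_t == 0.0 or norm_g == 0.0:
        return math.inf  # the angle is undefined for a zero vector
    cos = sum(a * b for a, b in zip(t, g)) / (norm_t * norm_g)
    angle = math.acos(max(-1.0, min(1.0, cos)))  # clamp rounding error; radians
    length_gap = abs(norm_t - norm_g) / norm_t
    return alpha * angle + (1.0 - alpha) * length_gap


def select_neighbors(target: Gene, candidates: List[List[float]], k: int,
                     alpha: float = 0.5) -> List[int]:
    """Return indices of the k candidates with the smallest score."""
    obs = [j for j, v in enumerate(target) if v is not None]
    order = sorted(range(len(candidates)),
                   key=lambda i: angle_length_score(target, candidates[i], obs, alpha))
    return order[:k]
```

For the target `[1.0, 1.0, None]`, a candidate `[2.0, 2.0, ...]` has the same direction but twice the length. Euclidean distance alone cannot tell that apart from a candidate with a different shape. A combined score like this one keeps the two effects separate.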

    Microarray data provide useful information for biologists. However, many microarray datasets suffer from missing values caused by human error or by defects in the chip itself. Missing values affect the downstream analyses that biologists perform, so many estimation methods have been proposed for this problem. The missing values are therefore estimated before the real gene microarray data are analyzed.
    Since the first method was introduced nearly a decade ago, new and improved methods have been developed continuously. However, each method performs differently depending on the data source. Missing-value estimation is commonly divided into two steps. The first step chooses the k genes most similar to the target gene. The second step combines these genes with a least-squares-based algorithm to estimate the missing values.
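For the second step, a typical regression-based choice is local least squares (LLS) imputation, listed in the thesis contents. The target gene's observed values are expressed as a linear combination of its neighbors' observed values. The same coefficients are then applied to the neighbors' values in the missing columns.

The sketch below is a simplified, unweighted version with two assumptions:
- the neighbors are complete;
- the normal-equation matrix is nonsingular, which requires at least as many observed columns as neighbors and linearly independent neighbor profiles.

LLS is usually formulated with a pseudoinverse instead. That also handles the rank-deficient case through the minimum-norm solution. The names `solve` and `ls_impute` are hypothetical.

```python
from typing import List, Optional

Gene = List[Optional[float]]  # None marks a missing value (assumed convention)


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        if abs(m[p][c]) < 1e-12:
            raise ValueError("normal equations are singular; a pseudoinverse is needed")
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for j in range(c, n + 1):
                m[r][j] -= f * m[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][j] * x[j] for j in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def ls_impute(target: Gene, neighbors: List[List[float]]) -> List[float]:
    """Regress the target on its neighbors over observed columns, then predict."""
    obs = [j for j, v in enumerate(target) if v is not None]
    miss = [j for j, v in enumerate(target) if v is None]
    k = len(neighbors)
    # Normal equations (B B^T) x = B y, with B[p][j] = neighbors[p][obs[j]]
    # and y the target's observed values.
    gram = [[sum(neighbors[p][j] * neighbors[q][j] for j in obs) for q in range(k)]
            for p in range(k)]
    rhs = [sum(neighbors[p][j] * target[j] for j in obs) for p in range(k)]
    x = solve(gram, rhs)
    filled = [0.0 if v is None else v for v in target]
    for j in miss:
        # Apply the fitted coefficients to the neighbors' values in column j.
        filled[j] = sum(x[p] * neighbors[p][j] for p in range(k))
    return filled
```

Suppose the target's observed part is exactly 2 times the first neighbor plus 3 times the second. Then the fit recovers the coefficients (2, 3), and the missing entry is predicted from the same combination.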
    This thesis introduces a new estimation method that is also divided into two steps. First, we use the concepts of vector angle and length to choose the k most similar neighbors. Second, we use the weighted least-squares method to estimate the missing values. We apply the proposed method to several different microarray datasets and compare it with other algorithms. The results show that our method is either more accurate than the other methods or close to them in accuracy.
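The weighted least-squares step can be sketched as follows. The abstract does not say how the weighting matrix is chosen, so this sketch uses a hypothetical choice: a diagonal matrix with one caller-supplied positive weight per observed experiment column.

Let B hold the neighbors restricted to the observed columns (one neighbor per row), let W be the weight matrix, and let y be the target's observed values. The sketch solves the weighted normal equations (B W B^T) x = B W y by Cholesky factorization. It then predicts each missing entry by applying x to the neighbors' values in that column. The names `cholesky_solve` and `wls_impute` are illustrative.

```python
import math
from typing import List, Optional

Gene = List[Optional[float]]  # None marks a missing value (assumed convention)


def cholesky_solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b for symmetric positive-definite a via a = L L^T."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][p] * low[j][p] for p in range(j))
            if i == j:
                d = a[i][i] - s
                if d <= 0.0:
                    raise ValueError("matrix is not positive definite")
                low[i][i] = math.sqrt(d)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    z = [0.0] * n  # forward substitution: L z = b
    for i in range(n):
        z[i] = (b[i] - sum(low[i][p] * z[p] for p in range(i))) / low[i][i]
    x = [0.0] * n  # back substitution: L^T x = z
    for i in range(n - 1, -1, -1):
        x[i] = (z[i] - sum(low[p][i] * x[p] for p in range(i + 1, n))) / low[i][i]
    return x


def wls_impute(target: Gene, neighbors: List[List[float]],
               col_weights: List[float]) -> List[float]:
    """Weighted LS fit; the residual in observed column j is weighted by col_weights[j]."""
    obs = [j for j, v in enumerate(target) if v is not None]
    miss = [j for j, v in enumerate(target) if v is None]
    k = len(neighbors)
    # Weighted normal equations (B W B^T) x = B W y, W = diag of the observed weights.
    gram = [[sum(col_weights[j] * neighbors[p][j] * neighbors[q][j] for j in obs)
             for q in range(k)] for p in range(k)]
    rhs = [sum(col_weights[j] * neighbors[p][j] * target[j] for j in obs)
           for p in range(k)]
    x = cholesky_solve(gram, rhs)
    filled = [0.0 if v is None else v for v in target]
    for j in miss:
        filled[j] = sum(x[p] * neighbors[p][j] for p in range(k))
    return filled
```

With equal weights this reduces to the ordinary least-squares fit. Raising one column's weight pulls the fitted coefficients toward matching that experiment. For example, with a single neighbor `[1.0, 1.0, 1.0]` and target `[1.0, 3.0, None]`, weights `[1.0, 1.0, 0.0]` give an estimate of 2.0. Weights `[3.0, 1.0, 0.0]` give 1.5.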

    Chinese Abstract
    Abstract
    Acknowledgements
    Contents
    List of Tables
    List of Figures
    Symbols
    1 Introduction
        1.1 Research Background
        1.2 Motivation and Objective
        1.3 Research Method
        1.4 Thesis Contribution
        1.5 Thesis Structure
    2 Literature Review
        2.1 Definition of Missing Values
        2.2 Conventional Methods
            2.2.1 Traditional Methods
            2.2.2 Definition of Common Symbols
            2.2.3 KNN (K-Nearest Neighbors)
            2.2.4 SVD (Singular Value Decomposition)
            2.2.5 BPCA (Bayesian Principal Component Analysis Imputation)
        2.3 Regression-Based Methods
            2.3.1 LS (Least Squares Imputation)
            2.3.2 LLS (Local Least Squares Imputation)
            2.3.3 ILLS (Iterated Local Least Squares Imputation)
            2.3.4 SLLS (Sequential Local Least Squares Imputation)
    3 Weighted Least Squares
        3.1 Determination of k Nearest Neighbors
        3.2 Null Space Method
        3.3 Least-Squares Solution
        3.4 Weighted Least-Squares Solution
        3.5 Minimum-Norm Solution
        3.6 Summary
    4 Numerical Simulation and Analysis
        4.1 Data Used for Simulation
        4.2 Results and Analysis
            4.2.1 Analysis of Selecting k Value
            4.2.2 Analysis of Selecting Weighting Matrix
            4.2.3 Simulation in Different Datasets
    5 Concluding Remarks
    References
    Vita


    Full text: publicly available on campus from 2013-07-26 and off campus from 2013-07-26.