National Cheng Kung University Doctoral and Master's Thesis System


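Graduate student:	黃聖文 Huang, Sheng-Wen
Thesis title:	利用權重最小平方法估測微陣列資料的缺失值 Using Weighted Least Squares Method to Estimate Microarray Missing Values
Advisor:	莊哲男 Juang, Jer-Nan
Degree:	Master
Department:	College of Engineering - Department of Engineering Science
Year of publication:	2011
Academic year of graduation:	99 (ROC calendar)
Language:	English
Number of pages:	43
Keywords (translated from Chinese):	microarray, missing values, vector angle, length, weighted least squares method
Keywords (English):	Microarray, missing values, length, weighted least squares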

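Microarray data offer researchers in biology useful reference information. However, improper handling during the experiment or defects in the chip itself often leave microarray data with missing values, and these missing values cause problems in biologists' subsequent analysis. The missing values are therefore estimated before the actual data are analyzed.
The first estimation method appeared roughly ten years ago, and a wide variety of new methods has grown rapidly since then; each performs differently depending on the data source. Missing-value estimation consists of two main steps: first, find the k genes most similar to the gene to be estimated; second, combine these genes with an algorithm to estimate the missing values.
This thesis presents a new estimation algorithm, likewise divided into two steps. The first step uses the concepts of vector angle and length to find the k most similar neighbors, and the second uses the weighted least squares method to estimate the missing values. The method is then tested on several different microarray datasets, and the results show that it is either more accurate than the other methods or not far behind them.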

Microarray data provide useful information for biologists. However, many microarrays suffer from missing values caused by human error during the experiment or by defects in the chip itself. Missing values interfere with downstream analysis, so many estimation methods have been proposed, and the missing values are usually estimated before the actual gene expression data are analyzed.
Since the first method was introduced nearly a decade ago, new and improved methods have been developed continuously. However, each method performs differently depending on the data source. Missing-value estimation is commonly divided into two steps: first, select the k genes most similar to the target gene; second, combine these genes with a least-squares algorithm to estimate the missing values.
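The generic two-step scheme described above can be sketched as follows. This is a minimal illustration and not code from the thesis: the function name `lls_impute_gene`, the use of Euclidean distance for neighbor selection, and the restriction to neighbors with no missing values are all assumptions made for the example.

```python
import numpy as np

def lls_impute_gene(data, target, k):
    """Fill the missing entries (NaN) of row `target` using its k nearest complete rows.

    `data` is a genes-by-experiments array; everything here is an illustrative assumption.
    """
    row = data[target]
    missing = np.isnan(row)
    observed = ~missing
    # Candidate neighbors: genes with no missing values at all.
    complete = np.where(~np.isnan(data).any(axis=1))[0]
    # Step 1: rank candidates by Euclidean distance over the observed experiments.
    dists = np.linalg.norm(data[complete][:, observed] - row[observed], axis=1)
    neighbors = complete[np.argsort(dists)[:k]]
    # Step 2: least squares. Express the target gene as a linear combination
    # of its neighbors on the observed experiments.
    A = data[neighbors][:, observed].T   # shape (n_observed, k)
    B = data[neighbors][:, missing].T    # shape (n_missing, k)
    coeffs, *_ = np.linalg.lstsq(A, row[observed], rcond=None)
    # Apply the same combination to the neighbors' values in the missing experiments.
    filled = row.copy()
    filled[missing] = B @ coeffs
    return filled
```

Calling `lls_impute_gene(data, target, k)` returns a copy of the target row with its NaN entries replaced by the estimates; the observed entries are left unchanged.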
This thesis introduces a new estimation method that is also divided into two steps. First, the angle between expression vectors and their lengths are used to choose the k most similar neighbors. Second, the weighted least-squares method is used to estimate the missing values. The proposed method is verified on several different microarray datasets and compared with other algorithms. The results show that it is more accurate than the other methods or, where it is not, performs comparably.
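One way to picture the two steps of such a method is sketched below. The abstract does not give the similarity formula or the weighting matrix, so both are assumptions here: `angle_length_score` is a hypothetical score that multiplies the absolute cosine of the angle by the ratio of the shorter to the longer vector length, and the weights are a caller-supplied diagonal over the observed experiments. The weighted problem, minimizing (Ax - b)^T W (Ax - b), is solved by scaling each equation by the square root of its weight.

```python
import numpy as np

def angle_length_score(a, b):
    """Hypothetical similarity: |cos(angle)| times (shorter length / longer length)."""
    na, nb = np.linalg.norm(a), np.linalg.norm(b)
    if na == 0.0 or nb == 0.0:
        return 0.0
    cos = abs(a @ b) / (na * nb)
    return cos * min(na, nb) / max(na, nb)

def wls_impute_gene(data, target, k, weights=None):
    """Fill NaN entries of row `target` by weighted least squares over k neighbors.

    `weights` is an assumed diagonal weighting, one value per observed experiment.
    """
    row = data[target]
    missing = np.isnan(row)
    observed = ~missing
    # Candidate neighbors: genes with no missing values.
    complete = np.where(~np.isnan(data).any(axis=1))[0]
    # Step 1: keep the k candidates with the highest angle-and-length score.
    scores = np.array([angle_length_score(data[i, observed], row[observed])
                       for i in complete])
    neighbors = complete[np.argsort(-scores)[:k]]
    # Step 2: weighted least squares on the observed experiments.
    A = data[neighbors][:, observed].T   # shape (n_observed, k)
    B = data[neighbors][:, missing].T    # shape (n_missing, k)
    w = np.ones(A.shape[0]) if weights is None else np.asarray(weights, dtype=float)
    s = np.sqrt(w)                       # scaling rows by sqrt(w) gives the WLS solution
    coeffs, *_ = np.linalg.lstsq(A * s[:, None], row[observed] * s, rcond=None)
    filled = row.copy()
    filled[missing] = B @ coeffs
    return filled
```

With `weights=None` the sketch reduces to ordinary least squares, which makes it easy to compare the effect of different weighting choices on the same dataset.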

Chinese Abstract
Abstract
Acknowledgements
Contents
List of Tables
List of Figures
Symbols
Introduction
1 Research Background
2 Motivation and Objective
3 Research Method
4 Thesis Contribution
5 Thesis Structure
Literature Review
1 Definition of Missing Values
2 Conventional Methods
2.1 Traditional Methods
2.2 Definition of Common Symbols
2.3 KNN (K'th Nearest Neighbors)
2.4 SVD (Singular Values Decomposition)
2.5 BPCA (Bayesian Principal Component Analysis Imputation)
3 Regression-Based Methods
3.1 LS (Least Squares Imputation)
3.2 LLS (Local Least Squares Imputation)
3.3 ILLS (Iterated Local Least Squares Imputation)
3.4 SLLS (Sequential Local Least Squares Imputation)
Weighted Least Squares
1 Determination of k nearest neighbors
2 Null Space Method
3 Least-Squares Solution
4 Weighted Least-Squares Solution
5 Minimum-Norm Solution
6 Summary
Numerical Simulation and Analysis
1 Data Used for Simulation
2 Result and Analysis
2.1 Analysis of Selecting k Value
2.2 Analysis of Selecting Weighting Matrix
2.3 Simulation in Different Datasets
Concluding Remarks
References
Vita

                                    


Publicly available since 2013-07-26
