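| Graduate student: | 鄭凱峰 Cheng, Kai-Feng |
|---|---|
| Thesis title: | Performance Evaluation of Two-Stage Classification Methods for Small-Sample, High-Dimensional Data: Cancer Classification with Gene Microarray Data as an Example (小樣本高維度資料中二階段分類法之效能評估-以基因微陣列資料癌症分類為例) |
| Advisor: | 翁慈宗 Wong, Tzu-Tsung |
| Degree: | Master |
| Department: | College of Management - Department of Industrial and Information Management |
| Year of publication: | 2004 |
| Academic year of graduation: | 92 (ROC calendar) |
| Language: | Chinese |
| Pages: | 55 |
| Chinese keywords: | 基因選取, 基因微陣列, 癌症分類 |
| English keywords: | cancer classification, gene selection, microarray |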
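With advances in information technology, gene expression data have become central to research on cancer classification. The most notable recent development is the DNA microarray, a technology that can analyze thousands of genes simultaneously and has opened a new era in cancer classification research. The abundance of gene expression data has given rise to two-stage classification methods for small-sample, high-dimensional data, and many researchers have proposed methods for cancer classification. Among these methods, however, it is not clear which performs best on cancer classification problems. To gain a deeper understanding of the problem, this thesis gives an overview of the proposed methods and related research. Its main contribution is an evaluation criterion for measuring the performance of these methods, one that takes into account both the prediction accuracy and the execution time of a classification method.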
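This thesis surveys the literature on two-stage classification methods for small-sample, high-dimensional data as applied to cancer classification. It develops a 3-fold cross-validation procedure repeated 10 times to measure the accuracy of tumor class prediction, and adds a measurement of execution time. The conclusions are: (1) simple gene selection methods shorten execution time but lower prediction accuracy; (2) complex gene selection methods take more time but do improve prediction accuracy; (3) selecting more variables does not necessarily yield higher classification accuracy; (4) most genes in a cell are noise as far as cancer classification prediction is concerned; (5) the accuracy improvement from complex gene selection is not pronounced; (6) dimension reduction causes a loss of information. Finally, the two different two-stage classification methods are compared in a cross analysis, a process that provides a standard for measuring the performance of classification methods.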
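The evaluation idea described above (a two-stage classifier scored by 3-fold cross-validation repeated 10 times, with execution time recorded alongside accuracy) can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration and is not taken from the thesis: the synthetic data from `make_data`, the signal-to-noise gene ranking in `select_genes`, the nearest-centroid classifier, and all function and parameter names. Gene selection is rerun inside each training fold so that the held-out samples do not influence which genes are kept.

```python
import random
import statistics
import time


def make_data(n_per_class=15, n_genes=200, n_informative=5, seed=0):
    """Synthetic stand-in for microarray data: few samples, many genes."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
            for g in range(n_informative):
                row[g] += 2.0 * label  # only the first few genes carry signal
            X.append(row)
            y.append(label)
    return X, y


def select_genes(X, y, k):
    """Stage 1 (assumed method): keep the k genes with the best signal-to-noise score."""
    scores = []
    for g in range(len(X[0])):
        a = [row[g] for row, lab in zip(X, y) if lab == 0]
        b = [row[g] for row, lab in zip(X, y) if lab == 1]
        spread = statistics.pstdev(a) + statistics.pstdev(b) + 1e-12
        scores.append((abs(statistics.fmean(a) - statistics.fmean(b)) / spread, g))
    return [g for _, g in sorted(scores, reverse=True)[:k]]


def fit_centroids(X, y, genes):
    """Stage 2 (assumed method): per-class mean profile over the selected genes."""
    cents = {}
    for lab in set(y):
        rows = [row for row, l in zip(X, y) if l == lab]
        cents[lab] = [statistics.fmean(r[g] for r in rows) for g in genes]
    return cents


def predict(cents, genes, row):
    """Assign the class whose centroid is nearest in squared Euclidean distance."""
    def dist(c):
        return sum((row[g] - cg) ** 2 for g, cg in zip(genes, c))
    return min(cents, key=lambda lab: dist(cents[lab]))


def stratified_folds(y, n_folds, rng):
    """Shuffle indices within each class and deal them round-robin into folds."""
    folds = [[] for _ in range(n_folds)]
    for lab in sorted(set(y)):
        idx = [i for i, l in enumerate(y) if l == lab]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % n_folds].append(i)
    return folds


def repeated_cv(X, y, k_genes, n_repeats=10, n_folds=3, seed=1):
    """Return (pooled accuracy, elapsed seconds) over n_repeats x n_folds runs."""
    rng = random.Random(seed)
    correct = total = 0
    start = time.perf_counter()
    for _ in range(n_repeats):
        for test_idx in stratified_folds(y, n_folds, rng):
            held_out = set(test_idx)
            train = [i for i in range(len(y)) if i not in held_out]
            Xtr = [X[i] for i in train]
            ytr = [y[i] for i in train]
            genes = select_genes(Xtr, ytr, k_genes)  # training fold only
            cents = fit_centroids(Xtr, ytr, genes)
            correct += sum(predict(cents, genes, X[i]) == y[i] for i in test_idx)
            total += len(test_idx)
    return correct / total, time.perf_counter() - start


if __name__ == "__main__":
    X, y = make_data()
    for k in (5, 50, 200):
        acc, secs = repeated_cv(X, y, k)
        print(f"genes kept: {k:3d}  accuracy: {acc:.3f}  time: {secs:.2f}s")
```

Running the script with different values of `k_genes` gives one accuracy figure and one timing figure per setting, which is the kind of paired comparison the evaluation criterion calls for. The numbers depend on the synthetic data and say nothing about the results reported in the thesis.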