
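Graduate student: 陳丁群
Chen, Ding-Qun
Thesis title: 以致病基因集為先驗資訊的基因選取方法之研究
A gene selection method based on risk gene set using microarray data
Advisor: 翁慈宗
Wong, Tzu-Tsung
Degree: Master
Department: College of Management - Institute of Information Management
Year of publication: 2008
Academic year of graduation: 96 (ROC calendar)
Language: Chinese
Number of pages: 78
Keywords (Chinese): 基因選取 (gene selection), 群集分析 (clustering analysis), 癌症分類 (cancer classification), 基因微陣列資料 (gene microarray data), 致病基因 (risk gene)
Keywords (English): Cancer classification, clustering analysis, risk gene, gene selection, microarray data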
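  • A gene chip is a special glass slide coated with a DNA microarray. With it, researchers can measure the expression values of a large number of genes (thousands to tens of thousands) in a single experiment, which helps biologists use microarray data to understand what tens of thousands of genes do and how they relate to one another. However, the high dimensionality and small sample size of microarray data also make further analysis difficult. Many researchers have therefore proposed gene selection methods that pick out a subset of important genes (reducing the dimensionality), but most of these methods do not take into account that genes already known to be associated with the disease could guide the selection. Meanwhile, sustained medical research has linked many genes to specific diseases, and many of these risk genes can be found in microarray data. For these reasons, this study first examines whether existing gene selection methods can select the risk genes and how well the risk genes classify samples, and then incorporates the risk genes into existing gene selection methods, using the risk gene set as prior information to reduce the computational complexity of subsequent gene selection and to improve prediction accuracy. The proposed gene selection mechanism proceeds in two stages. The first stage computes the similarity between the risk gene set and the other genes and removes genes that are functionally similar to the risk genes. The second stage selects and replaces genes among those not removed in the first stage: cluster analysis first partitions these genes into groups according to their structure, and representative genes from each cluster are then added to the final gene set or used to replace genes already in it. The proposed method is applied to 4 data sets covering 2 specific diseases for gene selection and classification. On all 4 data sets, classification accuracy either improves or remains comparable to that reported in other studies, showing that using a risk gene set to reduce data dimensionality, and thereby the computational complexity of subsequent gene selection, is feasible.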
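
The first stage described above can be illustrated with a minimal sketch. It assumes Pearson correlation as the similarity measure and a cutoff of 0.9 on its absolute value; this record states neither the measure nor the threshold the thesis actually uses, and the function names, gene names, and expression values are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant profile carries no correlation information
    return cov / sqrt(var_x * var_y)

def remove_similar_to_risk(expression, risk_genes, threshold=0.9):
    """Return the non-risk genes that are not highly correlated with any risk gene.

    expression: dict mapping gene name -> list of expression values, one per sample
    risk_genes: names of known risk genes
    threshold:  assumed cutoff on |correlation| (not taken from the thesis)
    """
    risk = [g for g in risk_genes if g in expression]
    kept = []
    for gene, profile in expression.items():
        if gene in risk:
            continue  # risk genes already form the initial candidate subset
        if all(abs(pearson(profile, expression[r])) < threshold for r in risk):
            kept.append(gene)
    return kept

# Toy data: four samples, one risk gene, two other genes (made-up values).
expression = {
    "BRCA1": [1.0, 2.0, 3.0, 4.0],
    "geneA": [2.1, 4.0, 6.2, 7.9],  # almost proportional to BRCA1, so redundant
    "geneB": [3.0, 1.0, 4.0, 1.5],  # only weakly correlated with BRCA1
}
remaining = remove_similar_to_risk(expression, ["BRCA1"])
print(remaining)
```

Only the genes returned by `remove_similar_to_risk` would go on to the second stage, which is how the risk gene set shrinks the search space before any clustering is done.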

    Gene selection and clustering analysis are commonly used to analyze microarray data. Gene selection reduces the time needed to build classifiers, improves classification accuracy, and identifies genes that are significant for a disease. Clustering analysis can discover co-expressed genes, which are likely to share the same biological function. Many gene selection methods have been proposed, but most of them do not consider the risk genes that have been reported in biological studies. Our proposed method uses the risk gene set as prior information for gene selection and is divided into two stages. In the first stage, we collect risk genes from biological reports as the initial candidate gene subset and remove the genes that are highly correlated with the risk genes. In the second stage, we apply quality threshold clustering (QT clustering) to the genes remaining after the first stage and select the significant genes at every stage of QT clustering to join the candidate gene subset. The final candidate gene subset is then passed to two machine learning classifiers, KNN and SVM, to evaluate its performance. The approach is tested on 4 well-known gene expression data sets for breast cancer and prostate cancer. The experimental results show that, in prediction accuracy, our gene selection method outperforms or performs similarly to methods proposed in previous studies.
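
As a rough illustration of the second stage, the sketch below runs plain QT clustering and then takes one gene per cluster. It shows the generic algorithm, not the thesis's risk-gene-aware variant. The distance (1 minus Pearson correlation), the diameter limit, the per-gene relevance `score`, and all gene names and values are assumptions made for the example.

```python
from math import sqrt

def distance(x, y):
    """1 - Pearson correlation; small when two profiles are co-expressed."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 1.0  # treat constant profiles as uncorrelated
    return 1.0 - cov / (sx * sy)

def qt_clustering(profiles, max_diameter):
    """Repeatedly keep the largest candidate cluster whose diameter stays within the limit."""
    remaining = dict(profiles)
    clusters = []
    while remaining:
        best = []
        for seed in remaining:
            cluster = [seed]
            candidates = set(remaining) - {seed}
            while candidates:
                # Farthest distance from a candidate to any current member; the
                # cluster diameter stays within the limit as long as this does.
                def farthest(g):
                    return max(distance(remaining[g], remaining[m]) for m in cluster)
                nearest = min(sorted(candidates), key=farthest)
                if farthest(nearest) > max_diameter:
                    break
                cluster.append(nearest)
                candidates.remove(nearest)
            if len(cluster) > len(best):
                best = cluster
        clusters.append(best)
        for gene in best:
            del remaining[gene]  # clustered genes leave the pool
    return clusters

def pick_representatives(clusters, score):
    """From each cluster take the gene with the highest relevance score."""
    return [max(cluster, key=lambda g: score[g]) for cluster in clusters]

# Toy profiles over four samples (made-up values).
profiles = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [1.1, 2.0, 3.2, 3.9],
    "g3": [4.0, 3.0, 2.0, 1.0],
    "g4": [3.9, 3.1, 2.0, 1.2],
    "g5": [1.0, 3.0, 1.0, 3.0],
}
score = {"g1": 0.9, "g2": 0.4, "g3": 0.2, "g4": 0.7, "g5": 0.5}  # hypothetical ranking scores
clusters = qt_clustering(profiles, max_diameter=0.2)
representatives = pick_representatives(clusters, score)
print(clusters, representatives)
```

QT clustering needs no preset number of clusters, only a quality limit on cluster diameter, which suits gene data where the number of co-expression groups is unknown in advance. Its cost grows quickly with the number of genes, which is one reason to prune redundant genes before clustering.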

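    Abstract (Chinese)
    Abstract (English)
    Acknowledgements
    Chapter 1 Introduction
    1.1 Research motivation
    1.2 Research objectives
    1.3 Research framework and procedure
    Chapter 2 Literature review
    2.1 Gene microarrays
    2.1.1 Types of gene microarray data
    2.1.2 Applications of gene microarray data
    2.2 Gene selection
    2.2.1 Individual gene ranking methods
    2.2.2 Combined gene ranking methods
    2.3 Cluster analysis
    2.3.1 Gene microarray data and cluster analysis
    2.3.2 QT clustering algorithm
    2.4 Using the medical literature in gene microarray data analysis
    2.5 Disease-related risk genes
    2.5.1 Breast cancer risk genes
    2.5.2 Prostate cancer risk genes
    2.6 Cross-validation
    Chapter 3 Risk gene testing
    3.1 Individual gene ranking methods
    3.2 Combined gene selection methods
    3.3 K-nearest neighbor analysis
    3.4 Experimental results
    3.5 Summary
    Chapter 4 Research method
    4.1 Definitions of terms
    4.2 Gene selection framework and description
    4.2.1 Stage 1: removing redundant genes similar to the risk genes
    4.2.1.1 Sources and procedure for obtaining risk genes
    4.2.1.2 Measuring gene similarity
    4.2.2 Stage 2: gene selection
    4.2.2.1 DM measure
    4.2.2.2 Individual gene ranking methods
    4.2.2.3 QT clustering and gene selection using the risk gene set
    4.3 Classification algorithms
    4.4 Evaluation procedure
    Chapter 5 Empirical study
    5.1 Data collection and preparation
    5.2 Parameter settings
    5.3 Empirical results
    5.3.1 Results of stage 1: removing redundant genes
    5.3.2 Results of stage 2: gene selection
    5.4 Comparison of results using different individual gene ranking methods and classifiers
    5.5 Comparison of classification accuracy
    5.6 Gene selection without using risk genes
    5.7 Important genes in the gene selection results
    5.8 Summary
    Chapter 6 Conclusions and suggestions
    6.1 Conclusions
    6.2 Suggestions
    References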


    Full text on campus: open access from 2009-07-11
    Full text off campus: open access from 2009-07-11