
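Graduate student: 詹湘漢 (Chan, Hsiang-Han)
Thesis title: 利用生物資訊學的分析來找尋新穎癌症相關基因之研究
Identification of novel tumor-associated gene (TAG) by bioinformatics analysis
Advisor: 孫孝芳 (Sun, H. Sunny)
Degree: Master
Department: College of Medicine - Institute of Molecular Medicine
Year of publication: 2006
Academic year of graduation: 94 (ROC calendar)
Language: English
Number of pages: 69
Keywords (translated from Chinese): bioinformatics, oncogene, tumor suppressor gene, tumor-associated gene database, tumor-associated gene
Keywords (English): oncogene, tumor-associated gene, tumor suppressor gene, TAG database, bioinformatics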
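  • The completion of the draft human genome sequence allows scientists to use computational methods to predict and analyze genes of interest. Bioinformatics has therefore become an indispensable tool for studying how genes influence a wide range of human diseases, including cancer. The physical characteristics and functional domains of known tumor-associated genes can be used to study the roles these genes play in carcinogenesis. The aim of this study is to identify novel tumor-associated genes using a computational search strategy. In the first phase, known oncogenes and tumor suppressor genes were found and identified by text mining the PubMed database. A semi-automatic information retrieval system then collected specific information about the target genes from several selected databases and stored it in the tumor-associated gene (TAG) database built for this study. In the second phase, we designed a convenient web interface for the TAG database, with functions that help users search the information in the database and analyze whether a protein may be related to carcinogenesis. In the third phase, we used the collected functional domain information of tumor-associated proteins to build a weighting scale. Finally, novel tumor-associated genes were identified by searching the full-length cDNA sequences in human databases against the established functional domain weighting scale. The TAG database currently holds 476 tumor-associated genes and provides a complete web interface. In the search for novel tumor-associated genes, after analysis of the full-length cDNA sequences, 18 candidate genes had weight scores higher than the maximum weight score (19.45) of the established oncogene training set. Six of these are oncogenes already collected in the TAG database, and one is a known oncogene not yet included in the TAG database. Of the remaining 11, five belong to the non-receptor protein tyrosine kinase family. With this method we can find more cancer-related genes.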

    The completion of the human genome sequence makes it possible to predict and analyze genes of interest using computational approaches. Bioinformatics is now well recognized as an essential tool for studying genes involved in various human diseases, including cancer. The available annotations of known tumor-related genes, including their physical characteristics and functional domains, can therefore be used to study the roles these genes play in carcinogenesis. This study aims to identify novel tumor-associated genes (TAGs) using a computational search strategy. In the first phase, target genes were identified through a text-mining approach in the PubMed database. A semi-automatic information retrieval engine was designed to collect specific information about the target genes from various web resources and store it in the TAG database. In the second phase, a user-friendly web interface was designed to provide functions for searching and analyzing TAG data. In the third phase, the TAG information was analyzed by building domain weighting profiles that capture the features of particular types of oncoproteins. Finally, these profiles were used to identify novel TAGs by searching against the currently available cDNA sequences in the human genome database. We have collected 476 TAGs in the database and built a web interface that provides a user-friendly environment for searching it. In the search for novel genes, analysis of the cDNA sequences yielded 18 candidate genes whose domain weight profile scores were higher than the maximum score of the oncogene TAG training set (19.45). These comprise 6 known oncogenes already in the TAG database, 1 known oncogene not in the TAG database, and 11 known genes not in the TAG database, 5 of which belong to the Tec family of protein kinases. This method can help identify more tumor-related genes.
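
    The candidate-selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the domain weights are derived or combined, so everything here is an assumption made for illustration: the weight values, the domain labels, the protein names, the rule that a protein's score is the sum of the weights of its distinct domains, and the helper names `protein_score` and `find_candidates`. Only the selection rule (keep sequences scoring above the maximum score of the oncogene training set) follows the text.

```python
from typing import Dict, Iterable, List

# Hypothetical domain weights. These values are illustrative only and are
# NOT taken from the thesis's domain weight-matrix table.
DOMAIN_WEIGHTS: Dict[str, float] = {
    "Pkinase_Tyr": 6.0,
    "SH2": 4.5,
    "SH3": 3.0,
    "PH": 1.5,
}

# Toy training set of "known oncogenes" and toy query sequences, each mapped
# to the domains detected in its translated protein (all names hypothetical).
TRAINING: Dict[str, List[str]] = {
    "onc_A": ["Pkinase_Tyr", "SH2"],
    "onc_B": ["SH3", "PH"],
}
QUERIES: Dict[str, List[str]] = {
    "cdna_1": ["Pkinase_Tyr", "SH2", "SH3"],
    "cdna_2": ["PH"],
    "cdna_3": ["Pkinase_Tyr", "SH2"],
}


def protein_score(domains: Iterable[str], weights: Dict[str, float]) -> float:
    """Assumed scoring rule: sum the weights of the distinct domains.

    Domains missing from the weight table contribute nothing.
    """
    return sum(weights.get(domain, 0.0) for domain in set(domains))


def find_candidates(
    training: Dict[str, List[str]],
    queries: Dict[str, List[str]],
    weights: Dict[str, float],
) -> List[str]:
    """Return query names scoring strictly above the training-set maximum."""
    threshold = max(protein_score(d, weights) for d in training.values())
    return sorted(
        name
        for name, domains in queries.items()
        if protein_score(domains, weights) > threshold
    )


if __name__ == "__main__":
    print(find_candidates(TRAINING, QUERIES, DOMAIN_WEIGHTS))
```

    In this toy setup the threshold is the highest training score, so a query that merely ties the best training oncogene is not reported; whether the original analysis treated ties the same way is not stated in the abstract.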

    CHINESE ABSTRACT
    ABSTRACT
    ACKNOWLEDGEMENTS
    TABLE OF CONTENT
    TABLES
    FIGURES
    1. INTRODUCTION
    2. MATERIALS AND METHODS
        2.1 TEXT-MINING
        2.2 INFORMATION RETRIEVAL SYSTEM
        2.3 BUILDING TUMOR-ASSOCIATED GENE DATABASE
            2.3.1 Software
            2.3.2 Database structure
                2.3.2.1 Tumor-Associated Gene table
                2.3.2.2 Gene Ontology table
                2.3.2.3 Protein Domain table
                2.3.2.4 Disease-related table
                2.3.2.5 Domain Weight-Matrix table
                2.3.2.6 Full Length cDNA table
        2.4 THE ONCOGENIC DOMAIN ANALYSIS FUNCTION
            2.4.1 Domain weight-matrix table
            2.4.2 Calculation of protein weight score
        2.5 IDENTIFICATION OF NEW TAGS
            2.5.1 The translation tool
            2.5.2 The domain analysis tool
            2.5.3 The training set
    3. RESULT
        3.1 TAG DATABASE
            3.1.1 Tumor-associated gene table
            3.1.2 Protein domain table
            3.1.3 Gene ontology table
            3.1.4 Disease-related table
            3.1.5 Domain weight-matrix table
            3.1.6 Full length cDNA table
        3.2 TAG SEARCHING SYSTEM
        3.3 TAG ANALYSIS SYSTEM
            3.3.1 Consensus analysis function
            3.3.2 Oncogenic domain analysis function
        3.4 NON-TAG GENE SEARCH
        3.5 NOVEL TUMOR-ASSOCIATED GENE
            3.5.1 Identification of novel TAGs
            3.5.2 The analysis of collected full length cDNA
            3.5.3 The TAG training set
            3.5.4 The identification of novel gene searching result
            3.5.5 Validation of ITK known gene
    4. DISCUSSION
    5. REFERENCE
    6. APPENDIX
        6.1 TAG USER MENU
        6.2 DOMAIN WEIGHT-MATRIX TABLE


    Full-text release: on campus 2007-07-31; off campus 2007-07-31