
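Graduate student: 簡立銘 (Chien, Li-Ming)
Thesis title: The Relationship between Protein Binding Affinity and the Clinical Data of Cancer Patients
Advisor: 蔣榮先 (Chiang, Jung-Hsien)
Co-advisors: 林鵬展 (Lin, Peng-Chan); 楊士德 (Yang, Hsih-Te)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering
Year of publication: 2018
Academic year of graduation: 106 (ROC calendar)
Language: English
Number of pages: 35
Keywords (translated from Chinese): next-generation sequencing, single-nucleotide variant, cancer, protein simulation, protein-protein interaction, clinical data
Keywords (English): NGS, SNP, cancer, protein-protein interaction, clinical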
  • Now that next-generation sequencing (NGS) has matured, studies based on NGS data have multiplied. Some search DNA sequence data for single-nucleotide polymorphisms (SNPs) and use statistical or machine-learning methods to associate these mutations with disease. Others go further and convert SNPs into mutational signatures in an attempt to explain the mutational pathways of particular diseases. Besides SNPs, there is also research based on other mutation types, e.g., CNV, INDEL, and structural variation.
    Chemical reactions and signal transmission in the human body depend on proteins, which are produced from DNA. Many studies have found that proteins whose structure or sequence is altered by DNA mutations are closely related to the occurrence, and even the progression, of cancer.
    This research therefore focuses on collecting SNPs in protein-coding regions and uses protein structure simulation and protein docking simulation to reconstruct how the interactions among proteins change in their three-dimensional structures. These changes are quantified to build a protein-protein interaction (PPI) profile for each patient. The aim is to use computational (in silico) methods to discover the relationship between these profiles and patients' phenotypes or clinical data.
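
    As a rough illustration of the per-patient PPI profile described in the abstract, the sketch below represents a profile as the change in docking score between the mutated and wild-type forms of each protein pair. The function name `build_ppi_profile`, the dictionary layout, the example scores, and the simple difference used to quantify the change are all assumptions made for illustration; the abstract does not state how the thesis quantifies these changes.

```python
from typing import Dict, Tuple

Pair = Tuple[str, str]  # hypothetical key: (protein A, protein B)


def build_ppi_profile(
    wild_type: Dict[Pair, float],
    mutated: Dict[Pair, float],
) -> Dict[Pair, float]:
    """Return the mutated-minus-wild-type score change for each wild-type pair.

    Assumption: one numeric docking score per protein pair. Pairs that are
    absent from `mutated` (no mutation in this patient) get a change of 0.0.
    """
    profile = {}
    for pair, wt_score in wild_type.items():
        # Fall back to the wild-type score when the patient has no mutated score.
        profile[pair] = mutated.get(pair, wt_score) - wt_score
    return profile


# Made-up scores for two protein pairs in one hypothetical patient.
wild_type_scores = {("TP53", "MDM2"): -40.0, ("PTEN", "PIK3CA"): -25.0}
patient_scores = {("TP53", "MDM2"): -31.5}

profile = build_ppi_profile(wild_type_scores, patient_scores)
```

    A profile of this shape is one fixed-length vector per patient once the pairs are put in a consistent order, which is the kind of input that statistical tests or clustering would need.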

    Chapter 1: Introduction
      1.1 Background
      1.2 Aims
      1.3 Organization
    Chapter 2: Related Works
      2.1 Previous Studies about Endometrial Carcinoma
      2.2 InterPred
      2.3 Related Databases
        2.3.1 STRING
        2.3.2 RCSB PDB
        2.3.3 NCI GDC
    Chapter 3: Methods and Materials
      3.1 Overview
      3.2 PPI Retrieving Pipeline
      3.3 Patients Information
        3.3.1 Genome Variant for patient
        3.3.2 Patient Clinical Data
      3.4 Protein Simulation Pipeline
        3.4.1 Basic simulation pipeline
        3.4.2 BLAST
        3.4.3 Modeller
        3.4.4 TM-align
        3.4.5 FiberDock
        3.4.6 PPI score and PPI profile
        3.4.7 Wild-type PPI score and the Mutated PPI score
        3.4.8 PPI score transformation
      3.5 Analysis strategies
        3.5.1 Fisher’s Exact Test
        3.5.2 Kaplan-Meier Estimator
        3.5.3 Weka Feature Selection Module
        3.5.4 Cluster Patients with K-means
    Chapter 4: Experimental Results
      4.1 Overview of Patient and PPI Panel
      4.2 PPI Mutations and Histology type
      4.3 PPI Mutations and Recurrence Free Survival
      4.4 Re-group Patients with Multiple PPI
    Chapter 5: Conclusions and future work
      5.1 Conclusions
      5.2 Future work
    Reference


    Full text available on campus: 2019-09-01
    Full text available off campus: 2019-09-01