| Graduate student: | 顏宏宇 Yan, Hung-Yu |
|---|---|
| Thesis title: | Using a Similarity Comparison of Evolutionary Trees to Predict Prognosis for Colorectal Cancer Patients (基於演化樹的相似性比較預測結直腸癌患者的預後) |
| Advisor: | 謝孫源 Hsieh, Sun-Yuan |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science, Institute of Medical Informatics |
| Year of publication: | 2020 |
| Academic year of graduation: | 108 (ROC calendar, i.e. 2019-2020) |
| Language: | English |
| Number of pages: | 41 |
| Keywords: | Cancer evolutionary trees, Colorectal cancer, Evolutionary structure trees, Canonical-form transformation, Tree similarity comparison |
Measuring tree similarity by comparing tree structures has a long history and has been studied in diverse fields. Here we apply it to the evolutionary trees of cancer genomes. Cancer evolutionary trees, which represent cancer diversity, provide information on clonal evolution and offer a perspective on the clinical outcomes of cancer patients. This study included 107 colorectal cancer (CRC) patients whose cancer tissues underwent deep targeted sequencing. The evolutionary tree of each patient was reconstructed from the sequencing data based on the variant allele frequencies (VAFs) of point mutations and small insertions or deletions (indels). The main purpose of this study was to predict cancer recurrence. We mapped the structure of each cancer evolutionary tree to a rooted tree and developed a canonical-form transformation to resolve tree isomorphism, so that each patient has a unique tree structure. We proposed an algorithm that compares tree similarity by calculating a cost between evolutionary structure trees. The cost was calculated from the node position, tree size (number of nodes), tree height, node depth, number of descendants of each node (the size of the subtree rooted at that node), and the relationship of each node with other nodes. After the tree similarity comparison, the patients were clustered into two groups by k-means clustering. The clustering information indicated that the evolutionary structure trees were associated with gender and tumor invasion stage. Several machine-learning methods, including random forest, support vector machine (SVM), bagging, and boosting, were used to predict cancer recurrence in these patients. The boosting model achieved the highest prediction accuracy, 0.667.
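The abstract does not describe how the canonical-form transformation works. A standard technique for canonicalizing rooted, unordered trees is the AHU (Aho-Hopcroft-Ullman) encoding. The sketch below illustrates that technique; it is not the thesis's own procedure. The dict-of-children representation, the function names, and the toy trees are assumptions made for illustration.

```python
from typing import Dict, List

# Hypothetical representation: each node label maps to the list of its children.
Tree = Dict[str, List[str]]


def canonical_form(tree: Tree, root: str) -> str:
    """AHU-style encoding of the rooted, unordered tree below `root`.

    A leaf encodes as "()"; an internal node wraps the sorted encodings of its
    children in one pair of parentheses. Sorting removes any dependence on
    child order, and node labels are never used, so two trees receive the
    same string exactly when they are isomorphic.
    """
    child_codes = sorted(canonical_form(tree, child) for child in tree.get(root, []))
    return "(" + "".join(child_codes) + ")"


def isomorphic(t1: Tree, r1: str, t2: Tree, r2: str) -> bool:
    """True when the two rooted trees have the same shape."""
    return canonical_form(t1, r1) == canonical_form(t2, r2)


# Toy trees with made-up labels (not patient data). Trees A and B have the
# same shape but list their branches in a different order; tree C is a chain.
patient_a: Tree = {"N": ["A", "B"], "A": ["C"], "B": [], "C": []}
patient_b: Tree = {"N": ["X", "Y"], "X": [], "Y": ["Z"], "Z": []}
patient_c: Tree = {"N": ["A"], "A": ["B"], "B": []}

print(canonical_form(patient_a, "N"))  # ((())())
print(isomorphic(patient_a, "N", patient_b, "N"))  # True
print(isomorphic(patient_a, "N", patient_c, "N"))  # False
```

Because concatenated balanced-parenthesis strings split uniquely back into their parts, equal strings imply equal sorted child multisets at every level. This is what makes the encoding a canonical form rather than just a hash.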
Our results revealed that adding the clustering information of the evolutionary structure trees improved prediction performance compared with using clinical information alone. They also suggest that tree similarity comparison can aid the prognostic analysis of cancer patients.
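The abstract names the ingredients of the cost (node position, tree size, tree height, node depth, number of descendants, and node relationships) but not the formula that combines them. The sketch below is a hypothetical illustration only. It assumes nodes are matched by their position in a canonical preorder and that each ingredient contributes an unweighted absolute difference. It omits the node-relationship term entirely. `node_features` and `tree_cost` are made-up names, and the toy trees are invented, not patient data.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: each node label maps to the list of its children.
Tree = Dict[str, List[str]]


def node_features(tree: Tree, root: str) -> List[Tuple[int, int]]:
    """(depth, subtree size) for every node, listed in a canonical preorder.

    Children are visited in the order of their AHU encodings, so isomorphic
    trees yield identical feature lists regardless of node labels.
    """

    def code(node: str) -> str:
        return "(" + "".join(sorted(code(c) for c in tree.get(node, []))) + ")"

    def size(node: str) -> int:
        return 1 + sum(size(c) for c in tree.get(node, []))

    features: List[Tuple[int, int]] = []

    def visit(node: str, depth: int) -> None:
        features.append((depth, size(node)))
        for child in sorted(tree.get(node, []), key=code):
            visit(child, depth + 1)

    visit(root, 0)
    return features


def tree_cost(t1: Tree, r1: str, t2: Tree, r2: str) -> int:
    """Hypothetical dissimilarity cost: 0 for isomorphic trees, symmetric."""
    f1, f2 = node_features(t1, r1), node_features(t2, r2)
    height1 = max(depth for depth, _ in f1)
    height2 = max(depth for depth, _ in f2)
    # Global terms: difference in tree size (node count) and tree height.
    cost = abs(len(f1) - len(f2)) + abs(height1 - height2)
    # Node terms: match nodes by position in the canonical preorder and add
    # the differences in node depth and subtree size (descendant count).
    for (d1, s1), (d2, s2) in zip(f1, f2):
        cost += abs(d1 - d2) + abs(s1 - s2)
    # Nodes without a counterpart pay their full depth + subtree size.
    longer = f1 if len(f1) >= len(f2) else f2
    for depth, subtree in longer[min(len(f1), len(f2)):]:
        cost += depth + subtree
    return cost


patient_a: Tree = {"N": ["A", "B"], "A": ["C"], "B": [], "C": []}
patient_b: Tree = {"N": ["X", "Y"], "X": [], "Y": ["Z"], "Z": []}
patient_c: Tree = {"N": ["A"], "A": ["B"], "B": []}

print(tree_cost(patient_a, "N", patient_b, "N"))  # 0 (same shape)
print(tree_cost(patient_a, "N", patient_c, "N"))  # 4
```

A cost of this kind yields a pairwise dissimilarity between patients. The abstract does not say how the thesis converts its costs into the input for k-means clustering, so that step is not sketched here.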