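Author: Tsai, Pei-Chen (蔡沛蓁)
Thesis title: Predicting Colon Adenocarcinoma Prognosis by Histopathology (基於病理切片預測大腸癌患者之預後)
Advisor: Chiang, Jung-Hsien (蔣榮先)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Institute of Medical Informatics
Year of publication: 2021
Graduation academic year: 109 (ROC calendar, 2020-2021)
Language: English
Number of pages: 50
Keywords (translated from Chinese): Survival Analysis, Pathology Slides, Gene Expression, Convolutional Neural Networks, Cluster Analysis, Statistical Analysis
Keywords (English): Survival Analysis, Histopathology, Gene Expression, Machine Learning, Convolutional Neural Networks, Statistics, Clustering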
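  • Malignant tumors are the leading cause of death in Taiwan, and colorectal cancer has had the highest incidence among them for thirteen consecutive years. Early diagnosis and treatment of colorectal cancer is therefore a current goal of the medical community. In clinical diagnosis, physicians mainly infer the cancer stage from a patient's pathology slides and then estimate the patient's expected survival time in days. In recent years many studies have found that the onset of colorectal cancer and survival from it are mainly influenced by genetic variation and environmental factors, yet no study has focused specifically on colorectal cancer. This work therefore aims to predict patient survival time from pathology slide images together with gene-related information. It builds on a modified conventional convolutional neural network, defines its loss function using the statistical theory of the Weibull distribution, and evaluates the model with the concordance index, the log-rank test, and the Kaplan-Meier estimator. The method thereby overcomes the difficulty that traditional statistical approaches have in generalizing across populations, addresses the shortcomings of inferring survival time from cancer stage, and integrates machine learning with statistical survival analysis into a new measure for predicting survival time. It also opens a door for future survival analysis of other cancers.
    In our experiments we also found that using pathology slides alone as training data already yields reasonably effective survival-time prediction. We therefore also clustered the gene-related information to find and analyze its association with the pathology slides. Our results show that the characteristics of some genetic information can be observed from pathology slides.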

    Cancer is the leading cause of death in Taiwan, and colorectal cancer has had the highest incidence of any cancer for thirteen consecutive years, so early diagnosis and treatment of colorectal cancer is an important goal in medicine. In clinical diagnosis, doctors mainly infer the cancer stage from the patient's pathology images and then estimate the patient's survival time in days. In recent years, many studies have shown that genetic variation and environmental factors strongly influence the onset of cancer and survival from it, but no paper has focused on predicting survival specifically for colorectal cancer. Our research therefore aims to predict patients' survival days from pathology images together with genetic information. We use Xception as the backbone model, define the loss function using the Weibull distribution, and evaluate the model with three metrics: the concordance index (c-index), the log-rank test, and the Kaplan-Meier curve. This approach not only overcomes the difficulty that traditional statistical methods have in generalizing across populations, but also addresses the limitations of inferring survival days from cancer stage alone. Moreover, it integrates machine learning with statistical survival analysis, offering a new way to predict survival that may also be applied to other cancers.
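A Weibull-based loss of the kind the abstract mentions can be sketched as a negative log-likelihood with right censoring. This is an illustrative sketch only, not the thesis implementation: the parameter names `alpha` (scale) and `beta` (shape), the function names, and the assumption that a network outputs one `(alpha, beta)` pair per patient are all hypothetical. For a 2-parameter Weibull distribution the cumulative hazard is H(t) = (t/alpha)^beta and the log-hazard is log(beta/alpha) + (beta - 1) log(t/alpha); an observed death contributes log-hazard minus H(t), while a censored patient contributes only minus H(t).

```python
import math


def weibull_log_likelihood(t, event, alpha, beta):
    """Log-likelihood of one (possibly right-censored) survival time.

    t     : observed time in days (> 0)
    event : 1 if the death was observed, 0 if the patient was censored
    alpha : Weibull scale (> 0), hypothetical model output
    beta  : Weibull shape (> 0), hypothetical model output
    """
    cum_hazard = (t / alpha) ** beta  # H(t) = (t / alpha)^beta
    log_hazard = math.log(beta / alpha) + (beta - 1.0) * math.log(t / alpha)
    # Censored samples (event == 0) only contribute the survival term -H(t).
    return event * log_hazard - cum_hazard


def weibull_loss(times, events, alphas, betas):
    """Mean negative log-likelihood over a batch of patients."""
    total = sum(
        weibull_log_likelihood(t, e, a, b)
        for t, e, a, b in zip(times, events, alphas, betas)
    )
    return -total / len(times)
```

Minimizing this loss pushes the predicted distribution toward the observed survival times while still using the partial information carried by censored patients, which a plain regression loss on survival days would ignore.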
    We also find that training the model on pathology images alone already predicts survival reasonably well. We therefore cluster the omics data and analyze how the resulting clusters correlate with the pathology images. We find that some characteristics of the omics data can be observed from pathology images.
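The clustering step can be illustrated with a minimal K-means sketch (the table of contents lists K-means Clustering in Section 4.1). The record does not state the features, the number of clusters, or the initialization used in the thesis, so the function name `kmeans`, the tuple-based data layout, and the caller-supplied initial centroids are assumptions made for illustration.

```python
def kmeans(points, centroids, n_iter=100):
    """Lloyd's algorithm on lists of equal-length numeric tuples.

    points    : one feature vector per sample (e.g. an omics profile)
    centroids : initial cluster centres, supplied by the caller
    Returns (labels, centroids).
    """
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(n_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(
                range(len(centroids)),
                key=lambda k: sum((x - c) ** 2 for x, c in zip(p, centroids[k])),
            )
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members))
                )
            else:
                new_centroids.append(old)  # leave an empty cluster where it is
        if new_centroids == centroids:  # converged: centroids stopped moving
            break
        centroids = new_centroids
    return labels, centroids


# Toy usage with two well-separated groups of 2-D points.
toy_points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
toy_labels, toy_centroids = kmeans(toy_points, [(0.0, 0.0), (10.0, 10.0)])
```

The resulting cluster labels could then serve as classification targets for an image model, which is one way to test whether omics-defined groups are visible in pathology images.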

    Chinese Abstract I
    Abstract II
    Acknowledgements IV
    Contents V
    List of Figures VII
    List of Tables IX
    Chapter 1 Introduction 1
      1.1 Background 1
      1.2 Motivation 2
      1.3 Research Objectives 3
      1.4 Thesis Organization 4
    Chapter 2 Related Work 5
      2.1 Survival Analysis 5
        2.1.1 Life Table 6
        2.1.2 Weibull Distribution (2-parameter) 6
        2.1.3 Proportional Hazards Model 7
      2.2 Preprocessing in Pathology Image 8
      2.3 Machine Learning in Pathology Image 9
      2.4 Machine Learning in Omics Data 9
      2.5 Pathology Image Predict Survival 10
    Chapter 3 Survival model with deep learning-based and statistical methods 12
      3.1 Overview 12
      3.2 Preprocessing of Pathology Image 13
      3.3 Feature Extraction from Pathology Image 15
      3.4 Feature Extraction from Omics Data 17
        3.4.1 Proteomics 17
        3.4.2 Transcriptomics 18
      3.5 Preprocessing of Clinical Data 20
      3.6 Weibull Log-Likelihood 20
      3.7 Survival Prediction 23
    Chapter 4 Predict Cluster of Omics Data from Pathology Image 24
      4.1 K-means Clustering 24
      4.2 Proteomics 25
      4.3 Transcriptomics 25
    Chapter 5 Experimental Design and Results 27
      5.1 Dataset 27
        5.1.1 The Cancer Genome Atlas (TCGA) 27
        5.1.2 PLCO (Prostate, Lung, Colorectal and Ovarian) Dataset 27
      5.2 Evaluation Metrics 28
        5.2.1 Concordance Index (C-index) 28
        5.2.2 The Logrank Test 29
        5.2.3 Kaplan–Meier Estimator (K-M Curve) 30
        5.2.4 The Chi Square Test 31
      5.3 Baseline Methods 31
      5.4 Experimental Results and Analysis 33
        5.4.1 Selecting Feature Extraction Model 33
        5.4.2 Overall Survival 35
        5.4.3 Disease-Free Survival 40
        5.4.4 Predict Cluster of Omics Data from Pathology Image 41
    Chapter 6 Conclusion and Future Work 43
      6.1 Conclusion 43
      6.2 Future Work 44
    Reference 45


    Full-text availability: on campus, public from 2023-09-01; off campus, public from 2023-09-01