| Graduate Student: | 焦慎 (Jiao, Shen) |
|---|---|
| Thesis Title: | Classification and Segmentation of Parotid Tumor Images Based on Generative Deep Learning |
| Advisor: | 洪昌鈺 (HORNG, MING-HUWI) |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering |
| Year of Publication: | 2025 |
| Graduation Academic Year: | 113 |
| Language: | English |
| Number of Pages: | 84 |
| Keywords (Chinese): | Parotid Tumor, Tumor Classification, Medical Image Segmentation, Conditional Diffusion Model, Deep Learning |
| Keywords (English): | Parotid Tumor, Classification, Segmentation, Diffusion Model, Deep Learning |
Parotid tumors are common salivary gland neoplasms in the head and neck region, encompassing various benign and malignant lesions. Malignant types often lack specific clinical manifestations, making accurate preoperative differentiation critical for treatment planning and prognosis. Traditional diagnosis relies on manual interpretation of computed tomography (CT) images by physicians, which is time-consuming and prone to variability due to differences in clinical experience. Moreover, benign and malignant tumors often appear highly similar on CT images, further complicating diagnosis.
This study proposes a three-stage automatic analysis system based on generative deep learning to improve the diagnostic and segmentation accuracy of parotid tumor images. The system consists of: (1) a detection model that localizes the parotid region from head-and-neck CT scans; (2) a classification model that determines the presence of a tumor and further distinguishes between benign and malignant types, enhanced by a multi-slice voting strategy to improve stability; and (3) a segmentation module using the proposed conditional diffusion model, Diff-TransUNet, which combines convolutional networks, Transformers, and diffusion mechanisms to effectively handle blurred boundaries and small lesions.
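To make the multi-slice voting strategy in stage (2) concrete, the following is a minimal sketch of patient-level aggregation by majority vote over slice-level predictions. The function name `multi_slice_vote` and the softmax-output interface are illustrative assumptions; the thesis abstract does not specify the exact aggregation rule.

```python
import numpy as np

def multi_slice_vote(slice_probs):
    """Aggregate per-slice classifier outputs for one patient by majority vote.

    slice_probs: array of shape (num_slices, num_classes) holding softmax
    probabilities from a slice-level classifier (hypothetical interface).
    Returns the class index chosen by the most slices.
    """
    slice_labels = np.argmax(slice_probs, axis=1)                    # per-slice decision
    counts = np.bincount(slice_labels, minlength=slice_probs.shape[1])
    return int(np.argmax(counts))                                    # most frequent class wins

# Example: five CT slices of one patient, classes = {0: benign, 1: malignant}
probs = np.array([[0.7, 0.3], [0.4, 0.6], [0.8, 0.2], [0.9, 0.1], [0.6, 0.4]])
print(multi_slice_vote(probs))   # -> 0 (benign), since 4 of 5 slices vote benign
```

Voting over several adjacent slices is what gives the patient-level decision its stability: a single ambiguous slice cannot flip the final label.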
The study used CT images from 228 patients collected at National Cheng Kung University Hospital and performed four-fold cross-validation. Experimental results showed that Diff-TransUNet outperformed traditional models in Dice coefficient, boundary accuracy, and small tumor detection, while classification results were highly consistent with clinical diagnoses. The modular system design demonstrated high performance and clinical interpretability, effectively reducing manual workload, improving diagnostic consistency and objectivity, and showing strong potential for application to other head and neck tumor analyses.
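For reference, the sketch below shows how the reported evaluation could be set up: a patient-level four-fold split over the 228 cases and the Dice coefficient used as the overlap metric. The helper `segment_fn` and the loop body are placeholders only; they are not the authors' implementation.

```python
import numpy as np
from sklearn.model_selection import KFold

def dice_coefficient(pred_mask, true_mask, eps=1e-7):
    """Dice = 2|A ∩ B| / (|A| + |B|) for binary segmentation masks."""
    pred = pred_mask.astype(bool)
    true = true_mask.astype(bool)
    intersection = np.logical_and(pred, true).sum()
    return (2.0 * intersection + eps) / (pred.sum() + true.sum() + eps)

# Four-fold cross-validation split at the patient level (228 patients, as in the study).
patient_ids = np.arange(228)
kf = KFold(n_splits=4, shuffle=True, random_state=0)
for fold, (train_idx, test_idx) in enumerate(kf.split(patient_ids)):
    # ... train the segmentation model on train_idx, then evaluate on test_idx, e.g.:
    # scores = [dice_coefficient(segment_fn(img), gt) for img, gt in test_data]
    pass
```

Splitting by patient rather than by slice keeps slices of the same patient out of both training and test folds, which is the standard way to avoid optimistic Dice estimates.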