| Graduate Student: | 莊庭睿 Zhuang, Ting-Rui |
|---|---|
| Thesis Title: | 使用卷積-曼巴網路於三維低劑量電腦斷層掃描影像進行肺結節偵測 (3D LDCT Lung Nodule Detection Using Convolution-Mamba Network) |
| Advisors: | 連震杰 Lien, Jenn-Jier James; 郭淑美 Guo, Shu-Mei |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Graduate Program of Artificial Intelligence |
| Year of Publication: | 2025 |
| Graduation Academic Year: | 113 |
| Language: | English |
| Number of Pages: | 94 |
| Keywords (Chinese): | 肺結節、肺癌、電腦斷層掃描、曼巴、卷積、狀態空間模型 |
| Keywords (English): | Pulmonary nodule, lung cancer, computed tomography, Mamba, convolution, state space model |
To enhance the efficiency and accuracy of nodule detection in low-dose computed tomography (LDCT) for early lung cancer screening, this study aims to address the bottlenecks of existing deep learning methodologies. While traditional Convolutional Neural Networks (CNNs) struggle to capture 3D global context, Transformer-based models are constrained by the substantial computational costs associated with their quadratic complexity, limiting their application in high-resolution imaging.
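As a rough illustration of why the quadratic term matters for volumetric data, the short sketch below counts the tokens produced by patchifying a cubic LDCT crop and compares self-attention's pairwise-interaction term with the linear number of steps of an SSM-style scan. The crop and patch sizes are made-up assumptions, not settings taken from this thesis.

```python
# Illustrative arithmetic only; crop and patch sizes are assumptions, not thesis settings.
# Shows how token counts from a 3D volume make self-attention's pairwise term explode
# while a linear-time scan grows only with the number of tokens.
patch = 4                                        # hypothetical 4x4x4 patch size
for side in (64, 96, 128):                       # hypothetical cubic crop sizes in voxels
    n_tokens = (side // patch) ** 3              # tokens after patchifying the volume
    print(f"{side}^3 crop: {n_tokens:6d} tokens | "
          f"attention pairs ~ {n_tokens ** 2:.2e} | linear-scan steps ~ {n_tokens:.1e}")
```

At the largest of these hypothetical settings the pairwise term is already on the order of 10^9 interactions per attention layer, which is the kind of cost a linear-complexity sequence model avoids.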
To overcome these challenges, we propose an efficient, one-stage 3D pulmonary nodule detection framework centered on a novel Convolution-Mamba Network. This architecture employs a parallel dual-branch design to synergistically fuse the superior local feature extraction capabilities of convolution with the linear-time complexity advantage of the Mamba model for modeling long-range dependencies. Through channel split and channel shuffle mechanisms, we facilitate deep information exchange between these two complementary feature types. More importantly, to address Mamba's inherent limitation as a 1D sequential model, we designed an innovative Tri-View Mamba mechanism. Inspired by the multi-view reading practice of radiologists, this mechanism performs six-path, parallel, and bidirectional scanning along the three orthogonal planes—axial, coronal, and sagittal—enabling the model to capture complete and isotropic 3D spatial context.
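The abstract describes the two mechanisms only at a high level; the sketch below gives one plausible PyTorch-style reading of them. The module names (`ConvMambaBlock`, `TriViewMamba`, `SequenceMixer`), layer choices, and normalization/residual details are illustrative assumptions rather than the thesis's implementation, and `SequenceMixer` is a placeholder where a real selective-SSM (Mamba) layer would be used.

```python
# A plausible sketch (not the thesis's code) of a dual-branch Convolution-Mamba block
# with channel split / channel shuffle, and tri-view six-path bidirectional scanning.
# SequenceMixer is a stand-in for a real selective-SSM (Mamba) layer.
import torch
import torch.nn as nn
import torch.nn.functional as F


def channel_shuffle(x: torch.Tensor, groups: int = 2) -> torch.Tensor:
    """Interleave channels so the convolution and Mamba halves mix before the next block."""
    b, c, d, h, w = x.shape
    return x.view(b, groups, c // groups, d, h, w).transpose(1, 2).reshape(b, c, d, h, w)


class SequenceMixer(nn.Module):
    """Placeholder 1D sequence model; in practice this would be a Mamba/SSM block."""
    def __init__(self, dim: int):
        super().__init__()
        self.mix = nn.Conv1d(dim, dim, kernel_size=3, padding=1, groups=dim)
        self.proj = nn.Linear(dim, dim)

    def forward(self, seq: torch.Tensor) -> torch.Tensor:    # seq: (B, L, C)
        y = self.mix(seq.transpose(1, 2)).transpose(1, 2)
        return self.proj(F.silu(y))


class TriViewMamba(nn.Module):
    """Forward + backward scans along axial, coronal, and sagittal flattening orders."""
    def __init__(self, dim: int):
        super().__init__()
        self.mixers = nn.ModuleList([SequenceMixer(dim) for _ in range(6)])
        # Permutations that put each anatomical plane's ordering first, and their inverses.
        self.perms = [(0, 1, 2, 3, 4), (0, 1, 3, 2, 4), (0, 1, 4, 2, 3)]
        self.inv = [(0, 1, 2, 3, 4), (0, 1, 3, 2, 4), (0, 1, 3, 4, 2)]

    def forward(self, x: torch.Tensor) -> torch.Tensor:      # x: (B, C, D, H, W)
        b, c = x.shape[:2]
        out = torch.zeros_like(x)
        for i, (p, q) in enumerate(zip(self.perms, self.inv)):
            v = x.permute(p)
            seq = v.reshape(b, c, -1).transpose(1, 2)                   # tokens: (B, L, C)
            fwd = self.mixers[2 * i](seq)                               # forward scan
            bwd = self.mixers[2 * i + 1](seq.flip(1)).flip(1)           # reversed scan
            out = out + (fwd + bwd).transpose(1, 2).reshape(v.shape).permute(q)
        return out / len(self.perms)


class ConvMambaBlock(nn.Module):
    """Channel split -> conv branch (local) + tri-view branch (global) -> channel shuffle."""
    def __init__(self, channels: int):
        super().__init__()
        half = channels // 2
        self.local = nn.Sequential(nn.Conv3d(half, half, 3, padding=1),
                                   nn.InstanceNorm3d(half), nn.SiLU())
        self.globl = TriViewMamba(half)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        a, b = torch.chunk(x, 2, dim=1)                  # channel split
        y = torch.cat([self.local(a), self.globl(b)], dim=1)
        return channel_shuffle(x + y, groups=2)          # residual + channel shuffle
```

A quick shape check such as `ConvMambaBlock(32)(torch.randn(1, 32, 24, 24, 24))` returns a tensor of the same shape, consistent with using such a block as a drop-in stage inside a one-stage 3D detection backbone.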
Validated on a clinical dataset from National Cheng Kung University Hospital, our model achieved a recall of 83.9% at a clinically common operating point of 2 false positives per scan, with the recall increasing to 92.8% at 8 false positives per scan. Case studies further demonstrate that our method exhibits a superior recognition capability for challenging nodules, particularly those adhering to surrounding tissues, which were previously difficult to identify. In summary, this study successfully extends the advantages of the Mamba architecture to 3D medical image detection tasks and resolves its spatial adaptation challenges through the proposed Tri-View Mamba and Convolution-Mamba modules.
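The 2 and 8 false-positives-per-scan figures are standard FROC-style operating points: sensitivity (recall) measured at a fixed false-positive budget per scan. The sketch below shows how such a point can be read off a list of scored detections; the scores, hit labels, scan/nodule counts, and the simple prefix thresholding are made-up illustrations, not the evaluation code used in the thesis.

```python
# Illustrative only: sensitivity (recall) at fixed false-positives-per-scan budgets,
# read from scored candidate detections. All numbers below are made up.
import numpy as np


def sensitivity_at_fp_per_scan(scores, hits, n_scans, n_nodules, fp_budgets=(2, 8)):
    """scores: detection confidences; hits: 1 if the detection matches a true nodule."""
    order = np.argsort(-np.asarray(scores, dtype=float))    # most confident first
    tp = np.asarray(hits, dtype=bool)[order]
    found = np.cumsum(tp)                                    # nodules recovered so far
    fp_per_scan = np.cumsum(~tp) / n_scans                   # false positives per scan so far
    out = {}
    for budget in fp_budgets:
        keep = fp_per_scan <= budget                         # detections within the FP budget
        out[budget] = float(found[keep][-1]) / n_nodules if keep.any() else 0.0
    return out


# Toy example: 2 scans, 4 annotated nodules, 10 scored candidate detections.
scores = [0.9, 0.85, 0.8, 0.7, 0.6, 0.55, 0.5, 0.4, 0.3, 0.2]
hits   = [1,   0,    1,   0,   0,   0,    0,   1,   0,   1]
print(sensitivity_at_fp_per_scan(scores, hits, n_scans=2, n_nodules=4))  # {2: 0.5, 8: 1.0}
```

In this simplified reading, the quoted 83.9% and 92.8% correspond to the lowest confidence threshold whose remaining false positives stay within 2 and 8 per scan, respectively.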