
Author: Su, Yung-Chieh (蘇詠捷)
Title: A Pseudo-Labeling Semi-Supervised Deep Learning Network for Surgical Tools Segmentation (一個基於偽標籤的半監督深度學習網路應用於手術工具分割)
Advisor: Tai, Shen-Chuan (戴顯權)
Degree: Master
Department: Department of Electrical Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2024
Academic Year of Graduation: 113
Language: English
Number of Pages: 68
Keywords: Machine Learning, Image Segmentation, Surgical Tool Segmentation, AI
    In endoscopic surgery, real-time segmentation of surgical instruments can effectively assist physician training and help surgeons understand the operative scene, a problem well suited to deep-learning-based image segmentation. However, when deep learning is applied to medical images, large amounts of training data are often hard to obtain, which prevents models from reaching their theoretical performance.

    To address this problem, this thesis adopts a semi-supervised contrastive learning architecture that lets the model use labeled and unlabeled training data simultaneously, effectively reducing its dependence on the dataset. Augmentations of different strengths are additionally applied to the unlabeled data: predictions on weakly augmented inputs serve as pseudo-labels for training the model on the strongly augmented inputs. This improves segmentation ability while reducing the model's tendency to overfit. In the network architecture, this thesis also introduces a cross-spatial multi-scale attention module that preserves the information in each channel by grouping the channels into multiple sub-features, so that semantic features are distributed evenly across the feature groups.
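    The weak-to-strong pseudo-labeling step described above can be sketched as follows. This is a minimal NumPy illustration under assumed details (binary masks, a 0.9 confidence threshold, plain cross-entropy), not the thesis's actual loss, which also involves supervised and contrastive terms.

```python
import numpy as np

def confidence_pseudo_labels(weak_probs, threshold=0.9):
    """Convert per-pixel foreground probabilities predicted on the
    weakly augmented view into hard pseudo-labels, plus a mask of
    pixels whose confidence clears the threshold."""
    hard = (weak_probs >= 0.5).astype(np.float64)          # hard 0/1 label per pixel
    confidence = np.maximum(weak_probs, 1.0 - weak_probs)  # model certainty per pixel
    keep = confidence >= threshold                         # trust only confident pixels
    return hard, keep

def masked_bce(strong_probs, pseudo, keep, eps=1e-7):
    """Binary cross-entropy between the strongly augmented view's
    predictions and the pseudo-labels, averaged over kept pixels."""
    p = np.clip(strong_probs, eps, 1.0 - eps)
    bce = -(pseudo * np.log(p) + (1.0 - pseudo) * np.log(1.0 - p))
    return float(bce[keep].mean()) if keep.any() else 0.0

# Toy 2x2 probability maps standing in for the network's outputs.
weak = np.array([[0.97, 0.03], [0.60, 0.95]])    # prediction on weak augmentation
strong = np.array([[0.90, 0.10], [0.40, 0.80]])  # prediction on strong augmentation
pseudo, keep = confidence_pseudo_labels(weak)
loss = masked_bce(strong, pseudo, keep)          # unsupervised consistency term
```

    Here the low-confidence pixel (weak-view probability 0.60) is excluded from the loss, which is what keeps noisy pseudo-labels from reinforcing themselves.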

    Experimental results show that the proposed method achieves better performance with fewer model parameters and lower computational cost.

    In endoscopic surgeries, real-time segmentation of surgical instruments aids surgeons in training and understanding surgical procedures. Although deep learning is highly effective for image segmentation, the limited availability of training data in medical imaging often constrains model performance.

    To address this challenge, this thesis proposes a semi-supervised contrastive learning framework that leverages both labeled and unlabeled data, reducing dependence on large datasets. Weak augmentation is applied to generate pseudo-labels for training strongly augmented models, thereby improving segmentation accuracy and mitigating overfitting. This thesis also introduces a cross-spatial multi-scale attention module, which preserves channel information while ensuring the uniform distribution of semantic features across groups.
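    As a rough sketch of the channel-grouping idea in the cross-spatial multi-scale attention module: channels are split into groups, and each group is reweighted by its own pooled, softmax-normalized descriptor, so no group's statistics are averaged away. This is a hypothetical NumPy simplification; the full EMA-style module additionally uses directional (1D) pooling and cross-spatial interaction between parallel branches.

```python
import numpy as np

def grouped_channel_attention(x, groups):
    """Split channels into `groups` sub-features and reweight each
    sub-feature by a softmax over its own globally pooled channel
    descriptor, keeping per-group information separate."""
    b, c, h, w = x.shape
    assert c % groups == 0, "channel count must divide evenly into groups"
    xg = x.reshape(b * groups, c // groups, h, w)       # treat each group independently
    desc = xg.mean(axis=(2, 3), keepdims=True)          # global average pool per channel
    e = np.exp(desc - desc.max(axis=1, keepdims=True))  # softmax across the group's channels
    weights = e / e.sum(axis=1, keepdims=True)
    return (xg * weights).reshape(b, c, h, w)           # reweight and restore layout

rng = np.random.default_rng(0)
x = rng.random((2, 8, 4, 4))                            # batch of 8-channel feature maps
y = grouped_channel_attention(x, groups=4)              # same shape, per-group reweighted
```

    With groups=4, each pair of channels gets its own softmax; a single softmax over all eight channels would instead let one dominant channel suppress the rest.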

    Experimental results demonstrate the superiority of the proposed method in terms of performance, parameter efficiency, and computational cost.

    Contents:
    Chinese Abstract
    Abstract
    Acknowledgements
    Contents
    List of Tables
    List of Figures
    Chapter 1 Introduction
    Chapter 2 Background and Related Works
      2.1 Semi-supervised Learning (SSL)
      2.2 Mean-Teacher
      2.3 Cross Pseudo Supervision (CPS)
      2.4 Duo-SegNet
      2.5 Min-Max Similarity
      2.6 U-Net and U-Net++
      2.7 TransUNet
      2.8 Attention Mechanisms
        2.8.1 Squeeze and Excitation Attention (SE)
        2.8.2 Convolutional Block Attention Module (CBAM)
        2.8.3 Coordinate Attention (CA)
        2.8.4 Efficient Multi-Scale Attention (EMA)
    Chapter 3 Proposed Method
      3.1 Data Preprocessing and Data Augmentation
        3.1.1 Weak and Strong Augmentation
        3.1.2 Labeled and Unlabeled Data
      3.2 Algorithm Flow
        3.2.1 Training Architecture
      3.3 Proposed Network Architecture
        3.3.1 Res2Net
        3.3.2 Encoder
        3.3.3 Decoder
        3.3.4 Pseudo Label
        3.3.5 Projector
      3.4 Loss Function
      3.5 Total Loss
    Chapter 4 Experiment Results
      4.1 Experiment Dataset
        4.1.1 Kvasir-Instrument
        4.1.2 EndoVis'17
        4.1.3 ART-NET
        4.1.4 RoboTool
      4.2 Parameter and Experiment Settings
      4.3 Evaluation Settings
      4.4 Experiment Results
      4.5 Visual Results
      4.6 Computational Results
      4.7 Ablation Experiment
    Chapter 5 Conclusion and Future Work
      5.1 Conclusion
      5.2 Future Work
    References

