
Author: Shen, Yen-Cheng (沈彥呈)
Title: The Glottis Semantic Segmentation Model Based on SE-UNet and Res-Net with Dual-Attention Mechanism (基於 SE-Unet 結合 Res-Net 與雙注意力機制模型之聲門語意分割)
Advisor: Yang, Chu-Sing (楊竹星)
Co-Advisor: Shieh, Ce-Kuen (謝錫堃)
Degree: Master
Department: Department of Electrical Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2023
Graduation Academic Year: 111 (ROC calendar, i.e., 2022–2023)
Language: Chinese
Number of Pages: 56
Keywords: Medical Image Recognition, Semantic Segmentation, Dual Attention Mechanism, Deep Learning
    Medical image recognition has been a widely discussed technology in recent years. Some difficult conditions still require substantial medical resources to analyze, which raises the problem of how to reduce wasted medical resources. Medical image recognition can also perform preliminary screening, which not only helps townships where medical resources are scarce but can also guide newly trained medical personnel in learning about the relevant conditions. With the rapid progress of machine learning and deep learning in recent years, many researchers have trained models on patient data and physicians' diagnoses, hoping to obtain powerful models for image recognition; this thesis likewise explores the direction of model architecture in depth.
    A review of prior work shows that the U-Net architecture is frequently applied in medical image recognition. Its encoder-decoder design gives U-Net excellent performance, and because the architecture is easy to modify, many mechanisms can be tried on top of it, such as adding an attention mechanism to strengthen feature extraction, or adding residual blocks to retain as much of the image's information as possible while curbing overfitting (see the sketches after this abstract). This thesis targets glottis data, a small-object detection problem, which calls for mechanisms that extract features more precisely. We therefore propose a model architecture that combines U-Net, residual network blocks, and a dual-attention mechanism, and the subsequent experiments test the accuracy under different mechanisms and compare against other architectures to demonstrate the soundness of the proposed design.
    The experimental results show that the dual-attention mechanism and the residual network blocks respectively give the network more accurate weights during training and the ability to retain the input image's information, making the training outcomes clearer.
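    The abstract above combines U-Net with residual network blocks to retain image information. Below is a minimal sketch (assuming PyTorch) of a ResNet-style bottleneck block of the kind described: 1x1 squeeze, 3x3 convolution, 1x1 restore, plus an identity shortcut. The class name, channel sizes, and layer choices here are illustrative assumptions, not details taken from the thesis.

```python
import torch
import torch.nn as nn

class Bottleneck(nn.Module):
    """ResNet-style bottleneck: 1x1 squeeze -> 3x3 conv -> 1x1 restore, with identity shortcut."""
    def __init__(self, in_ch: int, mid_ch: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, mid_ch, kernel_size=1, bias=False),  # squeeze channels (cuts parameters)
            nn.BatchNorm2d(mid_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(mid_ch, mid_ch, kernel_size=3, padding=1, bias=False),  # 3x3 conv on fewer channels
            nn.BatchNorm2d(mid_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(mid_ch, in_ch, kernel_size=1, bias=False),  # restore channel count
            nn.BatchNorm2d(in_ch),
        )
        self.act = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The identity shortcut carries the input forward unchanged,
        # so the block retains the original feature information.
        return self.act(x + self.body(x))

# Hypothetical usage on one 64-channel encoder feature map:
x = torch.randn(1, 64, 128, 128)
print(Bottleneck(64, 16)(x).shape)  # torch.Size([1, 64, 128, 128])
```

    Squeezing to a narrower middle channel count is what lets the bottleneck keep parameter count low while still mixing spatial context, which matches the efficiency argument the abstracts make.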

    This thesis proposes the URSD model to address the semantic segmentation challenge in glottis images. The model employs a U-Net architecture as its foundation, integrating the encoder-decoder structure with the bottleneck from the residual network. This fusion not only preserves feature information but also reduces the parameter count, improving efficiency. Furthermore, the skip connections were replaced with SENet, and the feature enhancement mechanism of SENet was substituted with a dual attention mechanism. This preserves the features carried by the skip connections while strengthening feature extraction, allowing the model to achieve robust feature extraction and retention through its skip connections. Ultimately, the proposed URSD model exhibits promising performance on open datasets such as BAGLS.
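    The abstract describes substituting the SE enhancement on the skip connections with a dual attention mechanism. Below is a minimal sketch (again assuming PyTorch) of a DANet-style dual attention block, a position (spatial) attention branch plus a channel attention branch whose outputs are summed, applied to a skip-connection feature map. The exact layer sizes, the fusion rule, and the placement inside the URSD model are assumptions for illustration, not the thesis's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PositionAttention(nn.Module):
    """Spatial attention: every pixel attends to every other pixel."""
    def __init__(self, ch: int):
        super().__init__()
        self.q = nn.Conv2d(ch, ch // 8, 1)
        self.k = nn.Conv2d(ch, ch // 8, 1)
        self.v = nn.Conv2d(ch, ch, 1)
        self.gamma = nn.Parameter(torch.zeros(1))  # learned residual weight

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.q(x).flatten(2).transpose(1, 2)         # (B, HW, C/8)
        k = self.k(x).flatten(2)                         # (B, C/8, HW)
        attn = F.softmax(q @ k, dim=-1)                  # (B, HW, HW) pixel-to-pixel weights
        v = self.v(x).flatten(2)                         # (B, C, HW)
        out = (v @ attn.transpose(1, 2)).view(b, c, h, w)
        return self.gamma * out + x

class ChannelAttention(nn.Module):
    """Channel attention: every channel attends to every other channel."""
    def __init__(self):
        super().__init__()
        self.gamma = nn.Parameter(torch.zeros(1))

    def forward(self, x):
        b, c, h, w = x.shape
        f = x.flatten(2)                                 # (B, C, HW)
        attn = F.softmax(f @ f.transpose(1, 2), dim=-1)  # (B, C, C) channel-to-channel weights
        out = (attn @ f).view(b, c, h, w)
        return self.gamma * out + x

class DualAttention(nn.Module):
    """Sum of the two attended maps, applied to a skip-connection feature map."""
    def __init__(self, ch: int):
        super().__init__()
        self.pam, self.cam = PositionAttention(ch), ChannelAttention()

    def forward(self, x):
        return self.pam(x) + self.cam(x)

skip = torch.randn(1, 64, 32, 32)        # a hypothetical encoder skip feature
print(DualAttention(64)(skip).shape)     # torch.Size([1, 64, 32, 32])
```

    Unlike an SE block, which reweights channels only with a globally pooled descriptor, the position branch here also models long-range pixel-to-pixel dependencies, which is one plausible reason such a substitution helps on small targets like the glottis.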

    Table of Contents
    Abstract (Chinese)
    Abstract (English)
    Acknowledgments
    Table of Contents
    List of Tables
    List of Figures
    Chapter 1: Introduction
      1-1. Research Background
      1-2. Research Motivation and Objectives
    Chapter 2: Related Work and Background
      2-1. Computer Vision in the Era of Artificial Intelligence
      2-2. Image Classification Model
      2-3. Semantic Segmentation Model
      2-4. Attention Mechanism
      2-5. Research on Glottis Segmentation
      2-6. U-Net Base with Residual Block or Attention Mechanism
      2-7. SE-Net
    Chapter 3: URSD Architecture Design
      3-1. URSD System Architecture
      3-2. Dual Attention Mechanism
        3-2.1 Position Attention Module
        3-2.2 Channel Attention Module
      3-3. Initialization
        3-3.1 Constant Initialization
        3-3.2 Random Initialization
        3-3.3 Xavier Initialization
        3-3.4 Kaiming Initialization
      3-4. Loss Function
      3-5. Architecture Configuration Details
        3-5.1 Bottleneck
        3-5.2 Up Residual Block
        3-5.3 Concatenated Layer
    Chapter 4: Experimental Setup and Result Analysis
      4-1. Experimental Environment and Datasets
      4-2. Comparison of Different Models
      4-3. Experiments on Different Datasets
      4-4. Ablation Experiments
        4-4.1 Experiments With and Without Kaiming Initialization
        4-4.2 Effectiveness of the Mechanisms
    Chapter 5: Conclusion and Future Work
      5-1. Conclusion
      5-2. Future Work
    References

