
Author: Mao, Liang-Kai (毛亮凱)
Title: Detecting Endotracheal Tube and Carina on Portable Supine Chest Radiographs using One-Stage Detector with a Coarse-To-Fine Attention
Advisor: Chen, Chi-Yeh (陳奇業)
Degree: Master
Department: Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2022
Academic Year of Graduation: 110 (2021-2022)
Language: English
Pages: 42
Keywords: Endotracheal intubation, Object detection, Coarse-to-Fine Attention, Deep learning
In intensive care units (ICUs), endotracheal intubation is a common medical procedure for patients who cannot breathe spontaneously. After intubation, the position of the endotracheal tube (ETT) must be verified to reduce the risk of complications. Malposition can be assessed from the distance between the ETT tip and the Carina (the ETT-Carina distance). Locating these two landmarks accurately is difficult for two reasons. First, life-supporting equipment and external monitoring devices frequently overlie ICU patients' chests and can lead a neural network to misjudge the positions of the ETT tip and the Carina. Second, because of the patients' condition, chest radiographs in the ICU are usually acquired in the supine anteroposterior (AP) view with a portable machine, and their image quality is poor compared with radiographs from a fixed machine. Previous studies have proposed methods for these problems, but most require manually designed templates and leave room for improvement in accuracy. The goal of this thesis is therefore to locate the ETT tip and the Carina more accurately, without manual intervention, in order to detect malposition. The proposed architecture consists of FCOS (Fully Convolutional One-Stage Object Detection), an attention mechanism named Coarse-to-Fine Attention (CTFA), and a mask branch. The CTFA first captures long-range relationships with a global-modelling attention (GA) module and then rescales feature values using local relationships captured by a scale attention (SA) module, while the mask branch further enhances the feature representations of the backbone and neck of FCOS. In addition, a post-processing algorithm selects the final locations of the ETT tip and the Carina, so that malposition can be detected by computing the ETT-Carina distance. On a dataset provided by National Cheng Kung University Hospital, the proposed architecture achieves 88.82% accuracy in malposition detection, with a mean ETT-Carina distance error of 5.333 ± 6.240 mm and mean localization errors of 4.304 ± 5.526 mm for the ETT tip and 4.118 ± 3.655 mm for the Carina. External validation further demonstrates the robustness of the proposed method.
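The coarse-to-fine idea described above can be sketched in a few dozen lines. The following is a minimal, illustrative PyTorch rendering, not the thesis's implementation: a non-local-style global-modelling attention supplies the coarse stage, and an SE-style channel reweighting with a local spatial gate supplies the fine stage. The module names, reduction ratios, and the sequential GA-then-SA fusion are all assumptions made for illustration.

```python
import torch
import torch.nn as nn


class GlobalModelingAttention(nn.Module):
    """Coarse stage: non-local-style attention over all spatial positions."""

    def __init__(self, channels: int, reduction: int = 2):
        super().__init__()
        inter = channels // reduction
        self.query = nn.Conv2d(channels, inter, 1)
        self.key = nn.Conv2d(channels, inter, 1)
        self.value = nn.Conv2d(channels, channels, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)           # (B, HW, C')
        k = self.key(x).flatten(2)                             # (B, C', HW)
        v = self.value(x).flatten(2).transpose(1, 2)           # (B, HW, C)
        attn = torch.softmax(q @ k / (k.shape[1] ** 0.5), -1)  # (B, HW, HW)
        out = (attn @ v).transpose(1, 2).reshape(b, c, h, w)
        return x + out  # residual connection keeps the original features


class ScaleAttention(nn.Module):
    """Fine stage: SE-style channel reweighting plus a local spatial gate."""

    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.channel_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )
        self.spatial_gate = nn.Conv2d(channels, 1, kernel_size=3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x * self.channel_gate(x)                     # global channel reweighting
        return x * torch.sigmoid(self.spatial_gate(x))   # local feature rescaling


class CoarseToFineAttention(nn.Module):
    """Global modelling first, then local rescaling, as the abstract describes."""

    def __init__(self, channels: int):
        super().__init__()
        self.ga = GlobalModelingAttention(channels)
        self.sa = ScaleAttention(channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.sa(self.ga(x))


# Usage: refine one FPN level (e.g. 256 channels) before the FCOS heads.
feature = torch.randn(1, 256, 64, 64)
print(CoarseToFineAttention(256)(feature).shape)  # torch.Size([1, 256, 64, 64])
```

Given the predicted landmark coordinates, the malposition check itself reduces to a distance threshold. The helper below assumes pixel coordinates plus a known pixel spacing; the 3-7 cm default range reflects commonly cited radiographic guidance (tip roughly 5 ± 2 cm above the Carina), not thresholds stated in this record, so treat the bounds as configurable assumptions.

```python
import math


def ett_carina_distance_mm(tip_xy, carina_xy, pixel_spacing_mm):
    """Euclidean ETT tip-to-Carina distance, converted from pixels to mm."""
    dx = (tip_xy[0] - carina_xy[0]) * pixel_spacing_mm
    dy = (tip_xy[1] - carina_xy[1]) * pixel_spacing_mm
    return math.hypot(dx, dy)


def is_malpositioned(distance_mm, low_mm=30.0, high_mm=70.0):
    """Flag malposition when the distance falls outside the accepted range.

    The defaults are illustrative, not the thesis's thresholds.
    """
    return not (low_mm <= distance_mm <= high_mm)


# Example with an assumed 0.14 mm/pixel portable radiograph:
d = ett_carina_distance_mm((1020, 980), (1035, 1330), pixel_spacing_mm=0.14)
print(f"{d:.1f} mm, malpositioned: {is_malpositioned(d)}")
```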

Table of Contents
Front matter: 摘要 (Chinese Abstract), Abstract, 誌謝 (Acknowledgements), Table of Contents, List of Tables, List of Figures
Chapter 1. Introduction
    1.1. Overview
    1.2. Motivation
    1.3. Contribution
    1.4. Organization
Chapter 2. Related Work
    2.1. Object Detection
        2.1.1. Anchor-based Approach
        2.1.2. Anchor-free Approach
    2.2. Attention Mechanism
        2.2.1. Global Modelling Attention
        2.2.2. Scale Attention
Chapter 3. Method
    3.1. Overview
    3.2. Coarse-to-Fine Attention (CTFA)
        3.2.1. Global-Modeling Attention (GA)
        3.2.2. Scale Attention (SA)
    3.3. Mask Branch
    3.4. Post-process Algorithm
    3.5. Loss Function
Chapter 4. Experiment and Result
    4.1. Dataset and Evaluation Metrics
    4.2. Implementation Details
    4.3. Results
    4.4. Compare with the SOTA
    4.5. Ablation Study
        4.5.1. Structure of GA
        4.5.2. Structure of SA
        4.5.3. Compare with Attention Modules
        4.5.4. Fusion Method
        4.5.5. Fusing Global Modelling Attention and Scale Attention
        4.5.6. Mask Branch
        4.5.7. The Post-process Algorithm
        4.5.8. Compare with Intensivists
        4.5.9. Visualization
Chapter 5. Conclusion
References

