
Author: Liao, Yi-Hong (廖翊宏)
Thesis Title: An Interactive Click-based Instance Segmentation System (基於點擊之互動式實例切割系統)
Advisor: Yang, Jar-Ferr (楊家輝)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Institute of Computer & Communication Engineering
Publication Year: 2022
Graduation Academic Year: 110
Language: English
Pages: 43
Chinese Keywords: deep learning, machine vision, semantic segmentation, human-computer interaction, interactive instance segmentation, convolutional neural networks
English Keywords: deep learning, computer vision, semantic segmentation, human-computer interaction, interactive instance segmentation, convolutional neural networks
    In recent years, with the vigorous development of deep learning, image segmentation has made breakthrough progress and can be further applied to fields such as image editing, medical image analysis, face segmentation, and self-driving systems. The goal of instance segmentation is to let a machine classify every pixel and segment objects out of an image. Adopting the concept of human-computer interaction, this thesis proposes a click-based interactive instance segmentation system: the auxiliary clicks given by the user help the model segment the mask of the target object, and for an unsatisfactory mask the user can add subsequent positive or negative clicks until an accurate object mask is obtained. First, the user clicks on the target object to be predicted and a corresponding heat map is generated, in which the influence of the first click is strengthened to capture more information. Features are then extracted by our proposed network. Our interaction fusion and forwarding modules (IFFMs) are inserted at every stage of the network so that the interactive information given by the user can still be exploited in the deep layers, and the final result is segmented through the user's iterative clicks. In addition, to make the training data more diverse and to let the network learn to correct unsatisfactory predicted masks, we adopt a random and iterative automatic click-sampling strategy to simulate clicks during training. According to the experimental results, the system obtains accurate object masks on most real-world and non-real-world images with only a few clicks.
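The iterative part of the click-sampling strategy described in the abstract can be sketched as follows. This is an illustrative reconstruction in the spirit of iteratively trained interactive segmentation (ITIS, reference [20]), not the thesis's own code: the helper name `sample_next_click` and the erosion-based choice of an interior click point are assumptions.

```python
import numpy as np

def _erode(mask):
    """One step of 4-neighborhood binary erosion via padded shifts."""
    p = np.pad(mask, 1)
    return mask & p[:-2, 1:-1] & p[2:, 1:-1] & p[1:-1, :-2] & p[1:-1, 2:]

def sample_next_click(gt_mask, pred_mask):
    """Simulate the next corrective click on the current prediction.

    A positive click is placed inside the false-negative area and a
    negative click inside the false-positive area, whichever error is
    bigger; returns (row, col, is_positive), or None when the
    prediction already matches the ground truth."""
    false_neg = gt_mask & ~pred_mask   # object pixels the model missed
    false_pos = ~gt_mask & pred_mask   # background pixels wrongly labeled
    if false_neg.sum() >= false_pos.sum():
        error, is_positive = false_neg, True
    else:
        error, is_positive = false_pos, False
    if not error.any():
        return None
    # Peel the error region layer by layer; the pixels that survive the
    # longest are the most interior ones, a natural spot for a click.
    core = error
    while True:
        nxt = _erode(core)
        if not nxt.any():
            break
        core = nxt
    r, c = np.argwhere(core)[0]
    return int(r), int(c), bool(is_positive)
```

During training, a loop would alternate between running the network and feeding the sampled click back in as an extra input channel, mixing these iterative clicks with randomly sampled ones.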

    In recent years, with the rapid development of deep learning, image segmentation has seen breakthrough growth and can be further applied to image editing, medical image analysis, self-driving systems, etc. The purpose of instance segmentation is to enable the computer to identify all pixels belonging to an object and extract the object from the image. In this thesis, we propose a click-based interactive instance segmentation system. The main purpose is to help the model segment accurate object masks by using the user-click information, with additional follow-up clicks refining the masks gradually. First, the user clicks on the target object, and a corresponding heat map is generated, in which the impact of the first click is enhanced to obtain more information. Feature extraction is performed by our proposed network, which applies the proposed interaction fusion and forwarding modules (IFFMs) stage by stage. The IFFMs allow the interactive information to be continuously utilized in the deep layers of the network. To make the training data more versatile and enable the network to learn to correct unsatisfactory masks, we use a random and iterative click-sampling strategy during training. According to the experimental results, our proposed network can reliably predict accurate object masks with very few clicks on most images.
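As one concrete illustration of how clicks become network inputs, a common encoding renders each click as a 2D Gaussian in a positive and a negative channel. The function below is a hypothetical sketch of this idea; the name `clicks_to_heatmaps`, the `sigma` value, and the `first_click_boost` factor are assumptions, not the thesis's exact heat-map generator.

```python
import numpy as np

def clicks_to_heatmaps(shape, pos_clicks, neg_clicks,
                       sigma=10.0, first_click_boost=2.0):
    """Render clicks as a 2-channel heat map (positive, negative).

    Each click becomes a 2D Gaussian; amplifying the first positive
    click before clipping enlarges its saturated footprint, loosely
    mimicking first-click-attention style emphasis."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]

    def render(clicks, boost_first):
        hmap = np.zeros((h, w), np.float32)
        for i, (r, c) in enumerate(clicks):
            g = np.exp(-((ys - r) ** 2 + (xs - c) ** 2) / (2 * sigma ** 2))
            if boost_first and i == 0:
                g *= first_click_boost  # widen the first click's peak area
            hmap = np.maximum(hmap, g)  # combine clicks by pointwise max
        return np.clip(hmap, 0.0, 1.0)

    return np.stack([render(pos_clicks, True), render(neg_clicks, False)], 0)
```

The two channels would typically be concatenated with the RGB image (and, in iterative schemes, the previous mask) before being fed to the backbone.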

    Abstract (Chinese) I
    Abstract (English) II
    Acknowledgements III
    Contents IV
    List of Tables VII
    List of Figures VIII
    Chapter 1 Introduction 1
    1.1 Research Background 1
    1.2 Motivations 3
    1.3 Thesis Organization 4
    Chapter 2 Related Work 5
    2.1 Interactive Instance Segmentation 5
    2.2 High-Resolution Network (HRNet) 6
    2.3 Object-Contextual Representations (OCR) 7
    2.4 Convolutional Block Attention Module (CBAM) 9
    2.5 Efficient Channel Attention (ECA) 10
    2.6 First Click Attention (FCA) 11
    2.7 Iteratively Trained Interactive Segmentation (ITIS) 12
    Chapter 3 The Proposed Interactive Instance Segmentation System 14
    3.1 Overview of the Proposed System 15
    3.2 Pre-processing Unit 16
    3.2.1 Click Simulator 17
    3.2.2 Randomly and Iteratively Click Sampling Scheme 18
    3.2.3 Heat Map Generator 19
    3.3 The Proposed Backbone Network 20
    3.4 Interaction Fusion and Forwarding Module (IFFM) 24
    3.5 Training Loss Functions 25
    3.5.1 Auxiliary Mask Loss 26
    3.5.2 Segmentation Loss 26
    Chapter 4 Experimental Results 28
    4.1 Environmental Settings and Dataset 28
    4.2 Ablation Study and Comparing Results 30
    4.3 System Implementation 34
    Chapter 5 Conclusions 37
    Chapter 6 Future Work 38
    Reference 39

    [1] C. Rother, V. Kolmogorov and A. Blake, “‘GrabCut’: interactive foreground extraction using iterated graph cuts,” ACM Transactions on Graphics (TOG), vol. 23, no. 3, pp. 309-314, August 2004.
    [2] V. Kwatra, A. Schödl, I. Essa, G. Turk and A. Bobick, “Graphcut textures: Image and video synthesis using graph cuts,” ACM Transactions on Graphics (TOG), vol. 22, no. 3, pp. 277-286, July 2003.
    [3] B. Hariharan, P. Arbeláez, L. Bourdev, S. Maji and J. Malik, “Semantic contours from inverse detectors,” Proceedings of IEEE International Conference on Computer Vision (ICCV), pp. 991-998, November 2011.
    [4] T. Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan and C. L. Zitnick, “Microsoft COCO: Common objects in context,” Proceedings of the European Conference on Computer Vision (ECCV), pp. 740-755, September 2014.
    [5] D. Martin, C. Fowlkes, D. Tal and J. Malik, “A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics,” Proceedings of IEEE International Conference on Computer Vision (ICCV), vol. 2, pp. 416-423, July 2001.
    [6] F. Perazzi, J. Pont-Tuset, B. McWilliams, L. V. Gool, M. Gross and A. Sorkine-Hornung, “A benchmark dataset and evaluation methodology for video object segmentation,” Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 724-732, June 2016.
    [7] M. Everingham, L. V. Gool, C. K. I. Williams, J. Winn and A. Zisserman, “The PASCAL visual object classes (VOC) challenge,” International Journal of Computer Vision, pp. 303-338, 2009.
    [8] N. Xu, B. Price, S. Cohen, J. Yang and T. Huang, “Deep interactive object selection,” Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 373-381, June 2016.
    [9] K. Sofiiuk, I. Petrov, O. Barinova and A. Konushin, “F-BRS: Rethinking backpropagating refinement for interactive segmentation,” Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 8623-8632, June 2020.
    [10] K. Sofiiuk, I. A. Petrov and A. Konushin, “Reviving iterative training with mask guidance for interactive segmentation,” arXiv preprint arXiv:2102.06583, 2021.
    [11] V. Gulshan, C. Rother, A. Criminisi, A. Blake and A. Zisserman, “Geodesic star convexity for interactive image segmentation,” Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 3129-3136, June 2010.
    [12] K. K. Maninis, S. Caelles, J. Pont-Tuset and L. V. Gool, “Deep extreme cut: From extreme points to object segmentation,” Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 616-625, June 2018.
    [13] K. He, G. Gkioxari, P. Dollár and R. Girshick, “Mask R-CNN,” Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp. 2961-2969, 2017.
    [14] L. C. Chen, Y. Zhu, G. Papandreou, F. Schroff and H. Adam, “Encoder-decoder with atrous separable convolution for semantic image segmentation,” Proceedings of the European Conference on Computer Vision (ECCV), pp. 801-818, 2018.
    [15] J. Wang, K. Sun, T. Cheng, B. Jiang, C. Deng, Y. Zhao, D. Liu, Y. Mu, M. Tan, X. Wang, W. Liu and B. Xiao, “Deep high-resolution representation learning for visual recognition,” IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 3349-3364, April 2020.
    [16] Y. Yuan, X. Chen and J. Wang, “Object-contextual representations for semantic segmentation,” Proceedings of the European Conference on Computer Vision (ECCV), pp. 173-190, 2020.
    [17] S. Woo, J. Park, J. Y. Lee and I. S. Kweon, “CBAM: Convolutional block attention module,” Proceedings of the European Conference on Computer Vision (ECCV), pp. 3-19, 2018.
    [18] Q. Wang, B. Wu, P. Zhu, P. Li, W. Zuo and Q. Hu, “ECA-Net: Efficient channel attention for deep convolutional neural networks,” Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11531-11539, June 2020.
    [19] Z. Lin, Z. Zhang, L. Z. Chen, M. M. Cheng and S. P. Lu, “Interactive image segmentation with first click attention,” Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 13336-13345, June 2020.
    [20] S. Mahadevan, P. Voigtlaender and B. Leibe, “Iteratively trained interactive segmentation,” arXiv preprint arXiv:1805.04398, 2018.
    [21] K. He, X. Zhang, S. Ren and J. Sun, “Deep residual learning for image recognition,” Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770-778, June 2016.
    [22] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” arXiv preprint arXiv:1409.1556, 2014.
    [23] L. C. Chen, G. Papandreou, I. Kokkinos, K. Murphy and A. L. Yuille, “DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 40, no. 4, pp. 834-848, April 2018.
    [24] K. Sofiiuk, O. Barinova and A. Konushin, “AdaptIS: Adaptive instance selection network,” Proceedings of IEEE/CVF International Conference on Computer Vision (ICCV), pp. 7355-7363, 2019.
    [25] T.-Y. Lin, P. Goyal, R. Girshick, K. He and P. Dollár, “Focal loss for dense object detection,” Proceedings of IEEE International Conference on Computer Vision (ICCV), pp. 2999-3007, October 2017.
    [26] W.-D. Jang and C.-S. Kim, “Interactive image segmentation via backpropagating refinement scheme,” Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5292-5301, June 2019.
    [27] Y. Hao et al., “EdgeFlow: Achieving practical interactive segmentation with edge-guided flow,” Proceedings of IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), pp. 1551-1560, October 2021.
    [28] G. Song and K. M. Lee, “Bi-directional seed attention network for interactive image segmentation,” IEEE Signal Processing Letters, vol. 27, pp. 1540-1544, August 2020.

    Full-text availability: on campus from 2024-08-01; off campus from 2024-08-01.