
Graduate student: Sung, Po-Wei (宋博瑋)
Thesis title: An Interactive Instance Segmentation System Based on Multi-resolution Networks
Advisor: Yang, Jar-Ferr (楊家輝)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Institute of Computer & Communication Engineering
Year of publication: 2020
Academic year of graduation: 108 (ROC calendar)
Language: English
Number of pages: 48
Keywords: Deep Learning, Machine Vision, Human-computer Interaction, Semantic Segmentation, Interactive Instance Segmentation
    Semantic segmentation aims to let a machine classify every pixel and segment objects from an image. Earlier methods relied mainly on traditional algorithms and still struggled to produce satisfactory results on complex images. In recent years, with the rapid development of deep learning, semantic segmentation has made breakthrough progress and has been applied to machine vision, facial segmentation, medical diagnosis, and other fields. In this thesis, adopting the concept of human-computer interaction, we propose an interactive instance segmentation system based on multi-resolution networks, in which auxiliary information supplied by the user helps the model segment an accurate object mask. The pre-processing stage first lets the user click on the single object to be predicted. The proposed multi-resolution network then extracts features at different scales and, after obtaining global and low-level features, outputs an initial mask. Finally, the preliminary result is passed to the post-processing stage, which refines it into the final predicted mask. In our experiments, the proposed network predicts complete, sharp-edged object masks and achieves an mIoU of 89.1% on the PASCAL VOC 2012 validation set. Compared with other methods, the proposed system predicts masks for objects of various categories stably and accurately.
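    The mIoU figure quoted in the abstract is a mean intersection-over-union score. The following is a minimal sketch of how such a score can be computed for binary object masks. It is not the thesis's evaluation code: the function names `iou` and `mean_iou`, the nested-list mask format, and the choice to average over (prediction, ground truth) pairs are all assumptions made for illustration, since this record does not state the exact averaging protocol.

```python
def iou(pred, target):
    """Intersection over union of two binary masks given as nested lists of 0/1."""
    inter = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                inter += 1
            if p or t:
                union += 1
    # Assumed convention: two empty masks count as a perfect match.
    return inter / union if union else 1.0


def mean_iou(pairs):
    """Average IoU over a list of (pred, target) mask pairs (assumed protocol)."""
    scores = [iou(p, t) for p, t in pairs]
    return sum(scores) / len(scores)


# Toy example: 2 pixels overlap, 4 pixels are covered by either mask.
pred = [[1, 1, 0],
        [0, 1, 0]]
target = [[1, 0, 0],
          [0, 1, 1]]
score = iou(pred, target)
```

    In the toy example the two masks share 2 pixels out of 4 covered in total, so `score` is 0.5. A reported value such as 89.1% would be this kind of ratio averaged over the evaluation set.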

    Abstract (Chinese)
    Abstract
    Contents
    List of Tables
    List of Figures
    Chapter 1 Introduction
        1.1 Research Background
        1.2 Motivations
        1.3 Literature Review
        1.4 Thesis Organization
    Chapter 2 Related Work
        2.1 Residual Dense Block
        2.2 Depthwise Separable Convolution
        2.3 Atrous (Dilated) Convolution
        2.4 Deep Extreme Cut (DEXTR)
        2.5 DeepLab v3+
        2.6 High-resolution Network (HRNet)
        2.7 OTSU Thresholding
    Chapter 3 The Proposed Interactive Instance Segmentation System
        3.1 Overview of the Proposed IIS System
        3.2 Pre-processing Unit
        3.3 The Proposed Multi-resolution Network
            3.3.1 Loss Function
        3.4 Post-processing Unit
    Chapter 4 Experimental Results
        4.1 Environmental Settings and Dataset
        4.2 Ablation Study and Comparing Results
        4.3 System Implementation
    Chapter 5 Conclusions
    Chapter 6 Future Work
    Reference
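
    Section 2.7 of the outline lists OTSU thresholding. As background, here is a minimal sketch of Otsu's method, which picks the gray level that maximizes the between-class variance of a histogram. This record does not say how, or whether, the thesis applies it in the post-processing unit, so the function name `otsu_threshold`, the flat list of 8-bit values as input, and the binarization step are assumptions for illustration only.

```python
def otsu_threshold(pixels, levels=256):
    """Return the threshold t that maximizes between-class variance.

    Pixels with value <= t form class 0; the rest form class 1.
    """
    hist = [0] * levels
    for v in pixels:
        hist[v] += 1

    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels in class 0 so far
    sum0 = 0  # sum of intensities in class 0 so far
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue  # class 0 is still empty
        w1 = total - w0
        if w1 == 0:
            break  # class 1 is empty from here on
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        # Between-class variance, up to a constant factor of 1 / total**2.
        var_between = w0 * w1 * (mean0 - mean1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


# Toy example: a bimodal set of values, e.g. a soft mask scaled to 0..255.
pixels = [10] * 5 + [200] * 5
t = otsu_threshold(pixels)
mask = [1 if v > t else 0 for v in pixels]
```

    On the toy input the two modes are separated cleanly, so the five values of 200 end up as foreground in `mask`. Libraries such as OpenCV and scikit-image provide tested implementations that would normally be preferred over a hand-written loop.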


    Full text available on campus from 2025-07-20
    Full text available off campus from 2025-07-20