
Author: 曾于和 (Tseng, Yu-Ho)
Thesis title: Combination of Segmentation and Detection Algorithms for Computer Vision (結合影像分割及物體辨識演算法的機器視覺研究)
Advisor: 詹劭勳 (Jan, Shau-Shiun)
Degree: Master
Department: College of Engineering - Department of Aeronautics & Astronautics
Year of publication: 2018
Academic year of graduation: 106 (ROC calendar)
Language: English
Number of pages: 59
Keywords: computer vision, object detection, semantic segmentation
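  • Computer vision tasks fall mainly into semantic segmentation and object detection. Current work typically aims to improve the performance of one of the two. We argue, however, that different targets suit different descriptions, so the main goal of this thesis is to build a system that applies the appropriate description to each target. To test this idea, we built a simulated environment containing five kinds of objects: people, cars, road, grass, and sky. In this environment, people and cars are treated as objects to avoid colliding with, and the system should stay within the road as far as possible. Because object detection can reserve a safety margin around detected objects, while semantic segmentation can describe an object's shape, this thesis uses object detection for people and cars and semantic segmentation for the road. Four networks are constructed: an object detection network alone, a semantic segmentation network alone, and a third and fourth, which are our proposed networks integrating object detection and semantic segmentation. The third and fourth networks differ in training order, and the training order affects the final accuracy. A good training order preserves the accuracy of the standalone object detection and semantic segmentation networks, whereas a poor one preserves only about 90% of it. According to the experimental results with the good training order, the proposed network describes the corresponding targets more effectively than either standalone network. In terms of accuracy, its segmentation part matches the standalone segmentation network, and its detection part retains 99.9% of the standalone detection network's accuracy.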

    Current computer vision research aims to improve the performance of either semantic segmentation or object detection. However, we believe that different targets require different descriptions. This work therefore constructs a system that applies the appropriate description to each target. Object detection can reserve a safety margin around detected objects, while semantic segmentation can describe the shape of objects. Four networks are constructed in this study. The first is an object detection network alone, the second is a semantic segmentation network alone, and the third and fourth are the proposed neural networks that integrate object detection and semantic segmentation. The third and fourth networks differ in the order of network training. The training sequence is a dominant factor influencing the final accuracy: a good training sequence preserves the accuracy of both the object detection network and the semantic segmentation network. According to the experimental results, the proposed network describes the corresponding targets more effectively than either the object detection network alone or the semantic segmentation network alone. In terms of accuracy, the segmentation part of the proposed network maintains the same accuracy as the semantic segmentation network alone, and the detection part retains 99.9% of the accuracy of the object detection network alone.
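    The division of labor described in the abstract (boxes with a safety margin for obstacles, a pixel mask for the road) can be illustrated with a small sketch. This is not the thesis's implementation: the function name free_road_mask, the box format, and the margin parameter are all hypothetical, and the inputs stand in for whatever the detection and segmentation networks would output.

```python
from typing import List, Tuple

# Hypothetical box format: (x_min, y_min, x_max, y_max), inclusive pixel coordinates.
Box = Tuple[int, int, int, int]


def free_road_mask(road_mask: List[List[bool]],
                   boxes: List[Box],
                   margin: int = 1) -> List[List[bool]]:
    """Return road pixels that lie outside every detection box grown by `margin` pixels."""
    height = len(road_mask)
    width = len(road_mask[0]) if height else 0
    free = [row[:] for row in road_mask]  # copy so the input mask is not modified
    for x_min, y_min, x_max, y_max in boxes:
        # Grow the box by the safety margin and clip it to the image bounds.
        x0, y0 = max(x_min - margin, 0), max(y_min - margin, 0)
        x1, y1 = min(x_max + margin, width - 1), min(y_max + margin, height - 1)
        for y in range(y0, y1 + 1):
            for x in range(x0, x1 + 1):
                free[y][x] = False
    return free


# Toy 4x6 frame: the bottom two rows are road; one "car" box covers x=2..3 on row 2.
road = [[False] * 6, [False] * 6, [True] * 6, [True] * 6]
free = free_road_mask(road, [(2, 2, 3, 2)], margin=1)
```

    In this toy frame only the leftmost and rightmost road pixels of each road row remain free, since the grown box blocks the columns in between. The box supplies a conservative exclusion zone, while the mask keeps the road's actual shape.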

    Abstract (Chinese)
    Abstract
    Acknowledgements
    Table of Contents
    List of Tables
    List of Figures
    Nomenclature
    Chapter 1 Introduction
        1.1 Computer Vision
        1.2 Literature Review
        1.3 Motivation and Objectives
        1.4 Benchmark Datasets
        1.5 Research Outline
    Chapter 2 Foundation of CNNs and Related Work
        2.1 Convolutional Neural Networks
            2.1.1 Convolution Function
            2.1.2 Activation Function
            2.1.3 Max Pooling
            2.1.4 Loss Function
        2.2 Semantic Segmentation
        2.3 Object Detection
        2.4 Interim Summary
    Chapter 3 Integration of Segmentation with Detection
        3.1 Introduction to the Proposed Methods
        3.2 Base Network
        3.3 Semantic Segmentation
        3.4 Object Detection
        3.5 Combination Methods
        3.6 Interim Summary
    Chapter 4 Experiment Results and Analysis
        4.1 Experiment Development Environment
        4.2 Experimental Datasets
        4.3 Comparison Criteria
        4.4 Results and Discussion
    Chapter 5 Conclusions and Future Work
    References


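    Full-text availability: on campus, public from 2023-08-30; off campus, not public.
    The electronic thesis has not been authorized for public release; for the print copy, please consult the library catalog.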