| Field | Value |
|---|---|
| Graduate student | Tseng, Yu-Ho (曾于和) |
| Thesis title | Combination of Segmentation and Detection Algorithms for Computer Vision |
| Advisor | Jan, Shau-Shiun (詹劭勳) |
| Degree | Master |
| Department | Department of Aeronautics & Astronautics, College of Engineering |
| Year of publication | 2018 |
| Academic year of graduation | 106 (ROC calendar, i.e., 2017-2018) |
| Language | English |
| Number of pages | 59 |
| Keywords (Chinese, translated) | computer vision, object detection, image segmentation |
| Keywords (English) | computer vision, object detection, semantic segmentation |
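Computer vision can be divided mainly into image segmentation and object detection, and current work aims to improve the performance of one of these two approaches. However, we believe that different targets are suited to different descriptions, so the main goal of this thesis is to build a system that applies the appropriate description to each corresponding target. To verify this idea, we built a simulated environment containing five kinds of objects: people, cars, roads, grass, and sky. In this environment, people and cars are treated as objects whose collision must be avoided, while the vehicle should stay within the road as much as possible. Because object detection can reserve a safety margin around the detected objects and image segmentation can describe the shape of objects, this thesis uses object detection to detect people and cars and image segmentation to identify the road.

This thesis constructs four networks: the first is a pure object detection network, the second is a pure image segmentation network, and the third and fourth are the networks we propose, which integrate object detection and image segmentation in a single neural network. The third and fourth networks differ in the order in which they are trained, and the training order affects the final accuracy: a good order preserves the accuracy of the pure object detection and pure image segmentation networks, whereas a poor order retains only about 90% of it. According to the experimental results for the network trained in the good order, the proposed network describes the corresponding targets more effectively than either the pure object detection network or the pure image segmentation network. In terms of accuracy, the segmentation part of the proposed network matches the accuracy of the pure segmentation network, and the detection part retains 99.9% of the accuracy of the pure object detection network.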
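The reason given for assigning people and cars to detection and the road to segmentation is geometric: a bounding box encloses an object plus any margin around it, while a mask covers only the object's own pixels. The following stdlib-Python sketch contrasts the two on a toy label map. It is an illustration under stated assumptions, not the thesis's pipeline: the label ids, the `margin` parameter, and the choice to treat grass and sky as mask classes are hypothetical, and the box is derived from a mask only to compare areas (in the proposed system boxes would come from the detection network).

```python
from typing import List, Optional, Tuple

# Hypothetical label ids for the five classes of the simulated environment.
PERSON, CAR, ROAD, GRASS, SKY = range(5)

# People and cars get boxes, the road gets a mask (per the abstract).
# Treating grass and sky as mask classes too is an assumption.
BOX_CLASSES = {PERSON, CAR}
MASK_CLASSES = {ROAD, GRASS, SKY}

Mask = List[List[int]]           # label map: one class id per pixel
Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), inclusive


def representation_for(label: int) -> str:
    """Return which description is used for a class: 'box' or 'mask'."""
    if label in BOX_CLASSES:
        return "box"
    if label in MASK_CLASSES:
        return "mask"
    raise ValueError(f"unknown label {label}")


def box_with_margin(mask: Mask, label: int, margin: int) -> Optional[Box]:
    """Tightest box around every pixel of `label`, grown by `margin` pixels on
    each side and clipped to the image. Returns None if the class is absent."""
    coords = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v == label]
    if not coords:
        return None
    xs, ys = zip(*coords)
    height, width = len(mask), len(mask[0])
    return (
        max(min(xs) - margin, 0),
        max(min(ys) - margin, 0),
        min(max(xs) + margin, width - 1),
        min(max(ys) + margin, height - 1),
    )


def box_area(box: Box) -> int:
    """Number of pixels covered by an inclusive box."""
    x0, y0, x1, y1 = box
    return (x1 - x0 + 1) * (y1 - y0 + 1)


def pixel_count(mask: Mask, label: int) -> int:
    """Number of pixels a mask assigns to `label`."""
    return sum(v == label for row in mask for v in row)


# Toy 6x6 scene: sky on top, a person standing on grass and road.
EXAMPLE_MASK: Mask = [
    [SKY, SKY, SKY, SKY, SKY, SKY],
    [SKY, SKY, PERSON, SKY, SKY, SKY],
    [GRASS, GRASS, PERSON, PERSON, GRASS, GRASS],
    [ROAD, ROAD, PERSON, ROAD, ROAD, ROAD],
    [ROAD, ROAD, ROAD, ROAD, ROAD, ROAD],
    [ROAD, ROAD, ROAD, ROAD, ROAD, ROAD],
]

if __name__ == "__main__":
    tight = box_with_margin(EXAMPLE_MASK, PERSON, margin=0)
    padded = box_with_margin(EXAMPLE_MASK, PERSON, margin=1)
    print(pixel_count(EXAMPLE_MASK, PERSON), box_area(tight), box_area(padded))  # 4 6 20
```

The person occupies 4 pixels, its tight box 6, and a box with a one-pixel margin 20. That extra area is the reserved buffer that motivates detection for obstacles, whereas the road mask keeps the road's exact extent.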
Current computer vision research generally aims to improve the performance of one of two approaches, semantic segmentation or object detection. However, we believe that different targets require different descriptions. Therefore, this work aims to construct a system that applies the appropriate description to each corresponding target. Object detection can reserve a safety margin around detected objects, whereas semantic segmentation can describe the shape of objects; accordingly, object detection is used for people and cars and semantic segmentation for the road. Four networks are constructed in this study. The first is an object detection network alone, the second is a semantic segmentation network alone, and the third and fourth, which we propose, are neural networks that integrate object detection and semantic segmentation. The third and fourth networks differ in the order in which they are trained. The training sequence is a dominant factor in the final accuracy: a good sequence preserves the accuracy of both the object detection network and the semantic segmentation network, whereas a poor one retains only about 90% of it. According to the experimental results, the proposed network describes the corresponding targets more effectively than either the object detection network alone or the semantic segmentation network alone. In terms of accuracy, its segmentation part matches the accuracy of the semantic segmentation network alone, and its object detection part retains 99.9% of the accuracy of the object detection network alone.
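The abstract attributes the accuracy gap between the third and fourth networks to training order but does not describe either schedule. One common mechanism behind such a gap, offered here as an assumption rather than as the thesis's method, is that a later training stage that keeps updating a shared feature extractor overwrites what an earlier stage learned. The following stdlib-Python toy reduces a two-head network to a single shared weight `w` feeding a detection head `a` and a segmentation head `b`, trained by plain gradient descent on arbitrary regression targets; every name, target, and hyperparameter is hypothetical. Stage 1 fits the encoder and the detection head (an arbitrary choice of which task goes first); stage 2 fits the segmentation head with the shared weight either frozen or still trainable.

```python
from dataclasses import dataclass


@dataclass
class ToyNet:
    """Scalar stand-in for a two-head network (hypothetical)."""
    w: float = 1.0  # shared encoder weight
    a: float = 0.5  # detection head weight
    b: float = 0.5  # segmentation head weight


X = 1.0                # the single toy input
DET_TARGET = 2.0       # arbitrary target standing in for the detection task
SEG_TARGET = 3.0       # arbitrary target standing in for the segmentation task
LR, STEPS = 0.05, 200  # plain gradient-descent settings


def task_loss(net: ToyNet, head: str, target: float) -> float:
    """Squared error of one head: (head * w * x - target) ** 2."""
    return (getattr(net, head) * net.w * X - target) ** 2


def det_loss(net: ToyNet) -> float:
    return task_loss(net, "a", DET_TARGET)


def seg_loss(net: ToyNet) -> float:
    return task_loss(net, "b", SEG_TARGET)


def train_stage(net: ToyNet, head: str, target: float, update_shared: bool) -> None:
    """Gradient descent on one task's loss; the other head is never touched."""
    for _ in range(STEPS):
        h = getattr(net, head)
        err = h * net.w * X - target
        grad_w = 2 * err * h * X      # d(loss)/dw, computed before any update
        grad_h = 2 * err * net.w * X  # d(loss)/d(head)
        if update_shared:
            net.w -= LR * grad_w
        setattr(net, head, h - LR * grad_h)


def run_schedule(freeze_shared_in_stage_2: bool) -> ToyNet:
    net = ToyNet()
    train_stage(net, "a", DET_TARGET, update_shared=True)  # stage 1: encoder + detection head
    train_stage(net, "b", SEG_TARGET, update_shared=not freeze_shared_in_stage_2)  # stage 2
    return net


if __name__ == "__main__":
    for frozen in (True, False):
        net = run_schedule(freeze_shared_in_stage_2=frozen)
        print(f"frozen={frozen}: det_loss={det_loss(net):.4f} seg_loss={seg_loss(net):.4f}")
```

With the encoder frozen in stage 2, both losses end near zero. When stage 2 also updates `w`, the segmentation loss still reaches zero, but the detection loss rises because `a` was fitted to the old value of `w`. The toy only illustrates this forgetting mechanism; it does not reproduce the thesis's experiments.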
On-campus full text: publicly available from 2023-08-30.