| Graduate Student: | 江仲馨 (Chiang, Chung-Hsin) |
|---|---|
| Thesis Title: | 應用密集殘差融合網路於駕駛輔助系統之點雲分類 (Point Cloud Classification for Driving Assistance System Based on Dense Residual Fusion Network) |
| Advisor: | 郭致宏 (Kuo, Chih-Hung) |
| Degree: | Master |
| Department: | Department of Electrical Engineering, College of Electrical Engineering and Computer Science |
| Year of Publication: | 2020 |
| Graduation Academic Year: | 108 (ROC calendar; 2019-2020) |
| Language: | Chinese |
| Number of Pages: | 72 |
| Keywords (Chinese): | 3D point cloud, convolutional neural network, dense residual fusion network, depth image, bearing angle image |
| Keywords (English): | Object classification, 3D point cloud, Convolutional neural network |
With the introduction of LiDAR into driving assistance systems, point cloud recognition architectures continue to evolve. Existing architectures often feed the raw, unpreprocessed 3D point cloud into a 2D convolutional neural network, which limits the network's ability to express point cloud features. Moreover, as the safety requirements for autonomous vehicles rise, relying only on camera data makes the system susceptible to lighting conditions and prone to misclassification. Our method additionally uses the point cloud acquired by LiDAR and converts it into 2D bearing angle (BA) images and depth images; the segmented point cloud is further projected onto the RGB image to crop the corresponding region of interest (ROI), increasing the diversity of the input information. Based on the above, we propose a high-accuracy classification network that combines 2D bearing angle (BA) images, depth images, and RGB images. However, merely adding input modalities is not sufficient to improve a network's classification ability. In this thesis, we use a Dense Residual Fusion Network (DRF-Net) to integrate the different inputs. The network is composed of Dense Residual Blocks (DRBs), which strengthen feature extraction and encourage feature reuse. In addition, the proposed hierarchical auxiliary learning mechanism helps the intermediate layers learn better features. After 3D point cloud preprocessing of the KITTI raw dataset, the proposed DRF-Net achieves 98.33% accuracy.
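To make the projection step concrete, here is a minimal NumPy sketch of the kind of conversion the abstract describes: the point cloud is binned into a 2D depth (range) image by spherical projection, and a bearing angle (BA) image is derived from horizontally adjacent range values as the angle between consecutive laser returns along a scan line. The image resolution, field of view, and exact BA formulation are assumptions, not the thesis' published parameters.

```python
# A minimal sketch (not the thesis' exact pipeline): bin a LiDAR point cloud
# into a 2D depth image by spherical projection, then derive a bearing-angle
# (BA) image from horizontally adjacent ranges. H, W, and the FOV are assumed.
import numpy as np

H, W = 64, 512                        # assumed image resolution
FOV_UP, FOV_DOWN = 3.0, -25.0         # assumed vertical FOV in degrees

def depth_image(points):
    """points: (N, 3) LiDAR returns (x, y, z) in the sensor frame."""
    r = np.linalg.norm(points, axis=1)
    points, r = points[r > 0], r[r > 0]           # drop degenerate returns
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    yaw = np.arctan2(y, x)                        # horizontal angle
    pitch = np.arcsin(np.clip(z / r, -1.0, 1.0))  # vertical angle
    u = ((yaw + np.pi) / (2 * np.pi) * W).astype(int) % W
    fov = np.radians(FOV_UP - FOV_DOWN)
    v = np.clip(((np.radians(FOV_UP) - pitch) / fov * H).astype(int), 0, H - 1)
    img = np.zeros((H, W), dtype=np.float32)
    img[v, u] = r                                 # keep last hit per cell
    return img

def bearing_angle_image(depth):
    """BA between horizontally consecutive ranges, angular step d_phi."""
    d_phi = 2 * np.pi / W
    r_prev = np.roll(depth, 1, axis=1)
    ba = np.arctan2(depth * np.sin(d_phi),
                    r_prev - depth * np.cos(d_phi))
    ba[(depth == 0) | (r_prev == 0)] = 0.0        # empty cells stay empty
    return ba
```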
In state-of-the-art architectures, feeding the 3D point cloud into a 2D convolutional neural network without preprocessing restricts the feature expression of the network. To address this issue, we propose a high-precision classification network that uses bearing angle (BA) images, depth images, and RGB images. With the development of unmanned vehicles, recognizing objects from the information collected by sensors has become essential. Our approach takes data from a LiDAR and a camera and projects the 3D point cloud into 2D BA images and depth images. The RGB image captured by the camera is used to select the region of interest (ROI) corresponding to the point cloud.
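The ROI step can be illustrated by the following hedged sketch, which assumes a KITTI-style 3x4 camera projection matrix `P` and a 4x4 homogeneous LiDAR-to-camera transform `Tr` (both names are ours) and simply crops the bounding box of the projected object points; the thesis' exact cropping rule may differ.

```python
# Hedged sketch: project one segmented object's LiDAR points onto the RGB
# image and crop the bounding box of the projected pixels as the ROI.
import numpy as np

def crop_roi(rgb, obj_points, P, Tr):
    """rgb: (H, W, 3) image; obj_points: (N, 3) LiDAR points of one object.
    P: (3, 4) camera projection; Tr: (4, 4) LiDAR-to-camera transform."""
    pts_h = np.hstack([obj_points, np.ones((len(obj_points), 1))]).T
    cam = Tr @ pts_h                       # LiDAR frame -> camera frame
    cam = cam[:, cam[2] > 0]               # keep points in front of camera
    if cam.shape[1] == 0:
        return None                        # object fully behind the camera
    uvw = P @ cam
    u, v = uvw[0] / uvw[2], uvw[1] / uvw[2]
    h, w = rgb.shape[:2]
    u = np.clip(u, 0, w - 1).astype(int)
    v = np.clip(v, 0, h - 1).astype(int)
    return rgb[v.min():v.max() + 1, u.min():u.max() + 1]
```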
However, adding input modalities alone is not enough to improve the classification ability of a general convolutional neural network. In our approach, we use a Dense-Residual Fusion Network (DRF-Net), which consists of three Dense-Residual Blocks (DRBs). Each Dense-Residual Block adopts the structure of a residual block with dense connections. With these three inputs, the DRF-Net achieves 98.33% accuracy on the KITTI raw dataset.
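As an illustration of the idea rather than the thesis' exact architecture, the following PyTorch sketch combines dense connections (feature reuse, as in DenseNet) with a residual shortcut (as in ResNet) in a `DenseResidualBlock`, and fuses the BA, depth, and RGB streams by concatenation. The channel counts, growth rate, layer depth, fusion head, and class count are illustrative assumptions, and the hierarchical auxiliary learning heads described in the abstract are omitted.

```python
# Minimal sketch of a dense-residual block and a three-stream fusion network.
# All hyperparameters below are assumptions for illustration only.
import torch
import torch.nn as nn

class DenseResidualBlock(nn.Module):
    def __init__(self, channels, growth=32, n_layers=3):
        super().__init__()
        self.layers = nn.ModuleList()
        c = channels
        for _ in range(n_layers):
            self.layers.append(nn.Sequential(
                nn.BatchNorm2d(c), nn.ReLU(inplace=True),
                nn.Conv2d(c, growth, kernel_size=3, padding=1)))
            c += growth                                  # dense: inputs accumulate
        self.fuse = nn.Conv2d(c, channels, kernel_size=1)  # back to input width

    def forward(self, x):
        feats = [x]
        for layer in self.layers:
            feats.append(layer(torch.cat(feats, dim=1)))   # dense connectivity
        return x + self.fuse(torch.cat(feats, dim=1))      # residual shortcut

class DRFNetSketch(nn.Module):
    """Fuses BA, depth, and RGB-ROI streams, one DRB per stream."""
    def __init__(self, n_classes=4):                       # class count assumed
        super().__init__()
        def stem(in_ch):
            return nn.Sequential(nn.Conv2d(in_ch, 64, 3, padding=1),
                                 DenseResidualBlock(64))
        self.ba, self.depth, self.rgb = stem(1), stem(1), stem(3)
        self.head = nn.Sequential(nn.Conv2d(192, 128, 1), nn.ReLU(inplace=True),
                                  nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                  nn.Linear(128, n_classes))

    def forward(self, ba, depth, rgb):
        fused = torch.cat([self.ba(ba), self.depth(depth), self.rgb(rgb)], dim=1)
        return self.head(fused)
```

Concatenation is one simple fusion choice; the thesis' DRF-Net may combine the streams differently and additionally supervises intermediate layers through its hierarchical auxiliary learning mechanism, which this sketch leaves out.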