| Graduate student: | 李至傑 Li, Chih-Chieh |
|---|---|
| Thesis title: | 基於點雲與影像體素特徵深度學習之物件辨識 Development of Object Detection Method Based on Deep Learning of Voxel Features of Point Clouds and Images |
| Advisor: | 江佩如 Ching, Pei-Ju |
| Degree: | Master |
| Department: | College of Engineering - Department of Systems and Naval Mechatronic Engineering |
| Year of publication: | 2022 |
| Academic year of graduation: | 110 |
| Language: | Chinese |
| Number of pages: | 66 |
| Keywords (Chinese): | 深度學習、三維物件辨識、點雲與影像融合、點雲體素 |
| Keywords (English): | Deep learning, 3D object detection, point cloud and image fusion, point cloud voxel |
Deep learning architectures that combine point clouds and images continue to evolve, mainly for three-dimensional object detection on roads. In previous studies, because two-dimensional image features and point cloud features differ in dimensionality, the image features and the point cloud features had to be trained separately, which made the training process cumbersome. To improve detection performance, this study draws on architectures that take point cloud feature voxels as input and constructs image feature voxels during the point cloud voxelization stage. Using the projection relationship between the lidar and the camera, RGB features are assigned to the points, which are then voxelized to obtain a voxel grid rich in image features. These image feature voxels are added to Voxel R-CNN, an existing high-accuracy 3D object detection model, to improve it.
In the experiments, in addition to validating the main improved model, the image feature voxels were also added to PV-RCNN to further verify their effect on architectures that take point cloud feature voxels as input. The results show that adding image feature voxels improves accuracy, especially for small objects such as pedestrians and cyclists, for which accuracy increases by about 2%.
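The projection and voxelization step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the thesis implementation: the function names `paint_points` and `voxelize_mean`, the single 3x4 `lidar_to_image` matrix (assumed to already combine camera intrinsics and lidar-to-camera extrinsics), the scaling of RGB to [0, 1], and the mean pooling per voxel are all assumptions made for illustration.

```python
import numpy as np


def paint_points(points, image, lidar_to_image):
    """Append RGB to every lidar point that projects inside the image.

    points:         (N, 3) xyz coordinates in the lidar frame
    image:          (H, W, 3) uint8 RGB image
    lidar_to_image: (3, 4) projection matrix (assumed: intrinsics @ extrinsics)
    Returns an (M, 6) array [x, y, z, r, g, b] for the M visible points.
    """
    homo = np.hstack([points, np.ones((len(points), 1))])  # (N, 4)
    uvw = homo @ lidar_to_image.T                          # (N, 3)

    # Discard points behind (or on) the camera plane before dividing by depth.
    in_front = uvw[:, 2] > 1e-6
    kept, uvw = points[in_front], uvw[in_front]

    u = np.floor(uvw[:, 0] / uvw[:, 2]).astype(int)  # pixel column
    v = np.floor(uvw[:, 1] / uvw[:, 2]).astype(int)  # pixel row

    h, w = image.shape[:2]
    inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    rgb = image[v[inside], u[inside]].astype(np.float64) / 255.0
    return np.hstack([kept[inside], rgb])


def voxelize_mean(painted, voxel_size, origin):
    """Group painted points into voxels and average their features.

    Returns (coords, feats): integer voxel indices of shape (V, 3) and the
    mean [x, y, z, r, g, b] of the points in each voxel, shape (V, 6).
    """
    idx = np.floor((painted[:, :3] - origin) / voxel_size).astype(int)
    coords, inverse = np.unique(idx, axis=0, return_inverse=True)
    inverse = inverse.reshape(-1)  # shape differs across NumPy versions

    sums = np.zeros((len(coords), painted.shape[1]))
    np.add.at(sums, inverse, painted)  # unbuffered scatter-add per voxel
    counts = np.bincount(inverse, minlength=len(coords))[:, None]
    return coords, sums / counts
```

In this sketch the resulting per-voxel features carry both geometry and colour, so they could be fed to a voxel-based backbone in place of geometry-only voxel features. How the thesis actually encodes and fuses the image feature voxels inside Voxel R-CNN is not specified in the abstract.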
On-campus access: available from 2027-08-28