Author: Li, Chih-Chieh (李至傑)
Thesis title: Development of Object Detection Method Based on Deep Learning of Voxel Features of Point Clouds and Images (基於點雲與影像體素特徵深度學習之物件辨識)
Advisor: Ching, Pei-Ju (江佩如)
Degree: Master
Department: College of Engineering - Department of Systems and Naval Mechatronic Engineering
Year of publication: 2022
Academic year of graduation: 110 (ROC calendar)
Language: Chinese
Pages: 66
Keywords: Deep learning, 3D object detection, point cloud and image fusion, point cloud voxel
    Deep learning architectures that combine point clouds with images continue to evolve and are used mainly for 3D object detection in road scenes. In previous studies, because 2D image features and point cloud features differ in dimensionality, the two had to be trained separately, which made the training process cumbersome. To improve detection performance, this study builds on architectures that take point cloud feature voxels as input and constructs image feature voxels during the point cloud voxelization stage. The projection relationship between the lidar and the camera is used to assign RGB features to the points, which are then voxelized to produce a voxel grid rich in image features. These voxels are added to Voxel R-CNN, an existing high-accuracy 3D object detection model, to improve it.

    In the experiments, besides validating the main improved model, image feature voxels are also added to PV-RCNN to further examine their effect on architectures that take point cloud feature voxels as input. The results show that adding image feature voxels improves accuracy, particularly for small objects such as pedestrians and cyclists, where accuracy increases by about 2%.
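    The abstract's core step, assigning RGB features to lidar points through the lidar-to-camera projection and then voxelizing them, can be illustrated with a minimal NumPy sketch. This is not the thesis implementation. The function names, the single 4x4 extrinsic matrix `T_cam_from_lidar`, the 3x3 intrinsic matrix `K`, and the choice of averaging features per voxel are all assumptions made for illustration.

```python
import numpy as np


def project_lidar_to_image(points_xyz, T_cam_from_lidar, K):
    """Project N x 3 lidar points into the image (hypothetical helper).

    T_cam_from_lidar: assumed 4x4 rigid transform, lidar frame -> camera frame.
    K: assumed 3x3 pinhole intrinsic matrix.
    Returns pixel coordinates (N x 2) and camera-frame depth (N,).
    """
    n = points_xyz.shape[0]
    homo = np.hstack([points_xyz, np.ones((n, 1))])
    cam = (T_cam_from_lidar @ homo.T).T[:, :3]
    depth = cam[:, 2]
    # Avoid dividing by zero; points with non-positive depth are masked later.
    safe_depth = np.where(depth > 1e-6, depth, 1.0)
    pix = (K @ cam.T).T
    uv = pix[:, :2] / safe_depth[:, None]
    return uv, depth


def paint_points(points_xyz, image, T_cam_from_lidar, K):
    """Attach normalized RGB to every point that lands inside the image.

    Returns an M x 6 array [x, y, z, r, g, b] with M <= N.
    """
    h, w = image.shape[:2]
    uv, depth = project_lidar_to_image(points_xyz, T_cam_from_lidar, K)
    u = np.floor(uv[:, 0]).astype(np.int64)
    v = np.floor(uv[:, 1]).astype(np.int64)
    valid = (depth > 1e-6) & (u >= 0) & (u < w) & (v >= 0) & (v < h)
    rgb = image[v[valid], u[valid]].astype(np.float32) / 255.0
    return np.hstack([points_xyz[valid], rgb])


def voxelize_mean(painted, voxel_size, range_min, range_max):
    """Group painted points into voxels and average their features.

    Mean pooling is an assumption of this sketch, not a detail from the thesis.
    Returns integer voxel coordinates (V x 3) and mean features (V x C).
    """
    voxel_size = np.asarray(voxel_size, dtype=np.float64)
    range_min = np.asarray(range_min, dtype=np.float64)
    range_max = np.asarray(range_max, dtype=np.float64)
    xyz = painted[:, :3]
    inside = np.all((xyz >= range_min) & (xyz < range_max), axis=1)
    painted = painted[inside]
    idx = np.floor((painted[:, :3] - range_min) / voxel_size).astype(np.int64)
    coords, inverse = np.unique(idx, axis=0, return_inverse=True)
    inverse = inverse.reshape(-1)  # keep a 1-D index across NumPy versions
    sums = np.zeros((coords.shape[0], painted.shape[1]), dtype=np.float64)
    np.add.at(sums, inverse, painted)  # unbuffered scatter-add per voxel
    counts = np.bincount(inverse, minlength=coords.shape[0])
    return coords, sums / counts[:, None]
```

    Calling `paint_points` followed by `voxelize_mean` yields one six-channel feature vector (mean position plus mean RGB) per occupied voxel, which is the kind of image-enriched voxel grid the abstract describes as input to a voxel-based detector. Real calibration files, such as those shipped with KITTI, typically split the projection into several matrices, so the single `T_cam_from_lidar` and `K` used here are a simplification.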

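    Table of Contents

    Abstract
    Acknowledgements
    Table of Contents
    List of Tables
    List of Figures
    Chapter 1 Introduction
      1.1 Research Motivation and Objectives
      1.2 Literature Review
        1.2.1 Point Cloud-Based Object Detection Methods
        1.2.2 Object Detection Based on Point Cloud and Image Fusion
        1.2.3 Summary of the Literature Review
      1.3 Main Contributions
      1.4 System Overview and Thesis Organization
    Chapter 2 Deep Learning Network Techniques
      2.1 Voxel Features
        2.1.1 Point Cloud Voxelization
      2.2 Convolutional Neural Networks
        2.2.1 3D Convolutional Neural Networks for Voxel Features
        2.2.2 3D Sparse Convolution
      2.3 R-CNN Networks
        2.3.1 RPN
        2.3.2 RPN for Voxel Features
        2.3.3 Voxel Region-of-Interest Pooling Layer (Voxel RoI-Pooling)
      2.4 Coordinate Transformation
        2.4.1 Projecting 3D Point Clouds onto 2D Images
    Chapter 3 Network Architecture and Parameter Settings
      3.1 Construction of Image Feature Voxels and Point Cloud Feature Voxels
        3.1.1 Constructing Feature Voxels from Images and Point Clouds
        3.1.2 Data Augmentation
      3.2 Feature Extraction and RPN Layer
        3.2.1 3D Voxel Feature Extraction
        3.2.2 2D Bird's-Eye-View Feature Extraction
        3.2.3 Proposal Selection
      3.3 Region-of-Interest Pooling and Regression Layers
        3.3.1 Region-of-Interest Pooling Layer
        3.3.2 Region-of-Interest Regression Layer
    Chapter 4 Experimental Results and Discussion
      4.1 Training Parameter Settings
        4.1.1 Bounding Box Determination
        4.1.2 Loss Function Settings
        4.1.3 Training Parameters
      4.2 Development Environment and Dataset
        4.2.1 Experimental Environment
        4.2.2 OpenPCDet
        4.2.3 KITTI Dataset
      4.3 Evaluation Metrics and Items
        4.3.1 Evaluation Metrics
        4.3.2 Evaluation Items
      4.4 Experimental Results and Analysis
        4.4.1 Comparison of Accuracy and Speed Across Architectures
        4.4.2 Comparison of Accuracy Across Classes
    Chapter 5 Conclusions and Future Work
      5.1 Conclusions
      5.2 Future Work
    References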

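    Full-text availability: on campus from 2027-08-28; off campus from 2027-08-28.
    The electronic thesis has not yet been authorized for public release; for the print copy, please consult the library catalog.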