
Graduate Student: Lin, Yu-Hsiang (林育詳)
Thesis Title: Locating Objects with Probability Distributions in Images and Point Clouds (運用機率分佈於圖像與點雲物件偵測之定位物體方法)
Advisor: Kuo, Chih-Hung (郭致宏)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Electrical Engineering
Year of Publication: 2022
Academic Year of Graduation: 110 (2021-2022)
Language: Chinese
Number of Pages: 86
Keywords (Chinese): 卷積神經網路、物件偵測、機率分佈
Keywords (English): convolutional neural network, object detection, probability distribution
Hits: 39; Downloads: 0
  • This thesis describes the position and size of an object in an image with a probability distribution and trains object detection networks using the Kullback-Leibler divergence as the regression loss function. In contrast to most previous methods, which represent an object's position and size with a rectangular bounding box, we propose the Nearest Distribution Converter (NDC), which transforms each predicted probability distribution into a close uniform distribution and treats that uniform distribution as a rectangular bounding box. Applying the proposed method to the image object detection models YOLOv3, YOLOv4-tiny, and YOLOv4, evaluation on the PASCAL VOC dataset shows detection performance gains of 0.92%, 0.75%, and 0.48%, respectively. We also apply the proposed method to the 3D point cloud object detection model SECOND; evaluation on the KITTI dataset shows improved detection performance for pedestrians and cyclists.
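
    The record does not give the thesis's exact parameterization. As a minimal sketch, assuming each bounding box coordinate is predicted as a Gaussian N(mu, sigma^2) and the ground-truth coordinate is smoothed into a narrow Gaussian, a KL-divergence regression loss has a closed form; the function name and the gt_sigma label-noise scale below are illustrative assumptions, not the thesis's definitions.

        # Sketch of a KL-divergence regression loss between a predicted Gaussian
        # and a Gaussian-smoothed ground-truth coordinate (assumed setup, not
        # necessarily the thesis's exact formulation).
        import torch

        def gaussian_kl_loss(pred_mu, pred_log_var, gt_mu, gt_sigma=0.1):
            """Closed-form KL( N(gt_mu, gt_sigma^2) || N(pred_mu, pred_var) )."""
            pred_var = pred_log_var.exp()
            gt_var = torch.full_like(pred_mu, gt_sigma ** 2)
            kl = 0.5 * (pred_log_var - gt_var.log()
                        + (gt_var + (gt_mu - pred_mu) ** 2) / pred_var
                        - 1.0)
            return kl.mean()  # e.g. loss = gaussian_kl_loss(mu, log_var, targets)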

    In this paper, we predict object locations as probability distributions for object detection. We adopt the Kullback-Leibler divergence as the regression loss to train the deep neural networks. Since most existing evaluations label objects with rectangular bounding boxes, we propose the Nearest Distribution Converter to find the uniform distributions closest to the predicted ones. Our proposed method improves the detection accuracy, measured in mAP, by 0.57%, 0.75%, and 0.48% on the image object detection models YOLOv3, YOLOv4-tiny, and YOLOv4, respectively. We also apply our proposed method to the point cloud object detection model SECOND, and the evaluation results show that it outperforms the original method, particularly for the detection of pedestrians and cyclists.
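
    The record does not spell out how the Nearest Distribution Converter picks the closest uniform distribution. One natural reading, offered here only as an assumption, is moment matching: the uniform distribution on [mu - sqrt(3)*sigma, mu + sqrt(3)*sigma] has the same mean and variance as N(mu, sigma^2), so its support can be read off directly as a box extent.

        # Hypothetical moment-matching sketch of an NDC-style conversion from a
        # predicted Gaussian coordinate to a uniform interval (box extent).
        import math

        def gaussian_to_uniform_interval(mu, sigma):
            """Support (low, high) of the uniform with the same mean/variance."""
            half_width = math.sqrt(3.0) * sigma  # Var(U(a, b)) = (b - a)^2 / 12
            return mu - half_width, mu + half_width

        # Example: a box center predicted as x ~ N(50.0, 2.0^2) maps to the
        # horizontal extent [46.54, 53.46].
        left, right = gaussian_to_uniform_interval(50.0, 2.0)
        print(f"x-extent: [{left:.2f}, {right:.2f}]")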

    Chinese Abstract I
    Table of Contents IX
    List of Figures XII
    List of Tables XIV
    Chapter 1 Introduction 1
      1-1 Preface 1
      1-2 Research Motivation 1
      1-3 Research Contributions 3
      1-4 Thesis Organization 4
    Chapter 2 Background 5
      2-1 Deep Learning 5
      2-2 Convolutional Neural Networks 6
      2-3 Training Neural Networks 8
      2-4 Object Detection 9
        2-4-1 Intersection over Union 10
        2-4-2 Non-Maximum Suppression 11
    Chapter 3 Literature Review 12
      3-1 CNN-Based Image Object Detection Algorithms 12
        3-1-1 Two-Stage R-CNN Family 12
        3-1-2 One-Stage YOLO Family 16
        3-1-3 Performance Comparison of Image Object Detection Models in the Literature 21
      3-2 CNN-Based Point Cloud Object Detection Algorithms 22
        3-2-1 Multi-View Point Cloud Detection Models 22
        3-2-2 Frustum-Based Point Cloud Detection Models 23
        3-2-3 Voxel-Based Point Cloud Detection Models 24
        3-2-4 Pillar-Based Point Cloud Detection Models 26
        3-2-5 Performance Comparison of Point Cloud Detection Models in the Literature 27
      3-3 Regression Loss Functions 28
        3-3-1 Mean Square Error Loss 31
        3-3-2 Smooth L1 Loss 32
        3-3-3 IoU Loss 32
        3-3-4 GIoU Loss 33
        3-3-5 CIoU Loss 33
        3-3-6 Summary of the Literature on Different Regression Loss Functions 35
    Chapter 4 Probabilistic Localization for Object Detection 37
      4-1 Probabilistic Localization for Image Object Detection 37
        4-1-1 KL-Loss Regression Loss for Image Object Detection 39
        4-1-2 Nearest Distribution Conversion for Image Object Detection 40
      4-2 Probabilistic Localization for Point Cloud Object Detection 41
        4-2-1 KL-Loss Regression Loss for Point Cloud Object Detection 41
        4-2-2 Nearest Distribution Conversion for Point Cloud Object Detection 44
      4-3 Anchor Matching Strategies and Overall Loss Functions of Each Model 46
        4-3-1 Anchor Matching Strategy of Each Model 46
        4-3-2 Overall Loss Function of Each Object Detection Model 50
    Chapter 5 Experimental Results 55
      5-1 Datasets 55
      5-2 Object Detection Evaluation Metrics 57
      5-3 Training Details 59
      5-4 Image Object Detection Results 60
        5-4-1 Results of Different Regression Loss Functions on Different Models 60
        5-4-2 Per-Class Results of Different Loss Functions on Different Models 61
        5-4-3 Visualization Results 63
      5-5 Point Cloud Object Detection Results 65
        5-5-1 Results of Different Regression Loss Functions on PV-RCNN, PointPillars, and SECOND 65
        5-5-2 Results with Different Positive-Sample Thresholds 67
        5-5-3 Statistical Chart Analysis 69
        5-5-4 Comparison of Visualization Results 71
    Chapter 6 Conclusion and Future Work 74
      6-1 Conclusion 74
      6-2 Future Work 74
    References 75
    Appendix A 79
    Appendix B 82

    [1] R. Girshick, J. Donahue, T. Darrell, and J. Malik, "Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014, pp. 580-587.
    [2] R. Girshick, "Fast R-CNN," in Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2015, pp. 1440-1448.
    [3] S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 6, pp. 1137-1149, June 2017.
    [4] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, and A. C. Berg, "SSD: Single Shot MultiBox Detector," in European Conference on Computer Vision (ECCV), Springer, Cham, 2016, pp. 21-37.
    [5] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, "You Only Look Once: Unified, Real-Time Object Detection," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 779-788.
    [6] J. Redmon and A. Farhadi, "YOLO9000: Better, Faster, Stronger," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 6517-6525.
    [7] J. Redmon and A. Farhadi, "YOLOv3: An Incremental Improvement," arXiv preprint arXiv:1804.02767, 2018.
    [8] A. Bochkovskiy, C.-Y. Wang, and H.-Y. M. Liao, "YOLOv4: Optimal Speed and Accuracy of Object Detection," arXiv preprint arXiv:2004.10934, 2020.
    [9] X. Chen, H. Ma, J. Wan, B. Li, and T. Xia, "Multi-View 3D Object Detection Network for Autonomous Driving," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 1907-1915.
    [10] C. R. Qi, W. Liu, C. Wu, H. Su, and L. J. Guibas, "Frustum PointNets for 3D Object Detection From RGB-D Data," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018, pp. 918-927.
    [11] Y. Zhou and O. Tuzel, "VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018, pp. 4490-4499.
    [12] Y. Yan, Y. Mao, and B. Li, "SECOND: Sparsely Embedded Convolutional Detection," Sensors, vol. 18, no. 10, p. 3337, 2018.
    [13] J. Yu, Y. Jiang, Z. Wang, Z. Cao, and T. Huang, "UnitBox: An Advanced Object Detection Network," in Proceedings of the 24th ACM International Conference on Multimedia, 2016, pp. 516-520.
    [14] H. Rezatofighi, N. Tsoi, J. Gwak, A. Sadeghian, I. Reid, and S. Savarese, "Generalized Intersection Over Union: A Metric and a Loss for Bounding Box Regression," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 658-666.
    [15] Z. Zheng, P. Wang, W. Liu, J. Li, R. Ye, and D. Ren, "Distance-IoU Loss: Faster and Better Learning for Bounding Box Regression," in Proceedings of the AAAI Conference on Artificial Intelligence, 2020, pp. 12993-13000.
    [16] S. Kullback, "Information Theory and Statistics," Courier Corporation, 1997.
    [17] M. Menéndez, J. Pardo, L. Pardo, and M. Pardo, "The Jensen-Shannon divergence," Journal of the Franklin Institute, vol. 334, no. 2, pp. 307-318, 1997.
    [18] O. Calin and C. Udrişte, "Geometric Modeling in Probability and Statistics," Berlin, Germany: Springer, 2014.
    [19] Y. LeCun, D. Touretzky, G. Hinton, and T. Sejnowski, "A Theoretical Framework for Back-Propagation," in Proceedings of the 1988 Connectionist Models Summer School, 1988, pp. 21-28.
    [20] M. Everingham, L. Van Gool, C. K. Williams, J. Winn, and A. Zisserman, "The PASCAL Visual Object Classes (VOC) Challenge," International Journal of Computer Vision, vol. 88, no. 2, pp. 303-338, 2010.
    [21] A. Geiger, P. Lenz, and R. Urtasun, "Are we ready for autonomous driving? The KITTI vision benchmark suite," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2012, pp. 3354-3361.
    [22] T.-Y. Lin, P. Goyal, R. Girshick, K. He, and P. Dollár, "Focal Loss for Dense Object Detection," in Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2017, pp. 2999-3007.
    [23] J. R. Uijlings, K. E. Van De Sande, T. Gevers, and A. W. Smeulders, "Selective Search for Object Recognition," International Journal of Computer Vision, vol. 104, no. 2, pp. 154-171, 2013.
    [24] K. He, X. Zhang, S. Ren, and J. Sun, "Deep Residual Learning for Image Recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 770-778.
    [25] T.-Y. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan, and S. Belongie, "Feature Pyramid Networks for Object Detection," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 2117-2125.
    [26] B. Graham, "Sparse 3D Convolutional Neural Networks," arXiv preprint arXiv:1505.02890, 2015.
    [27] B. Graham and L. van der Maaten, "Submanifold Sparse Convolutional Networks," arXiv preprint arXiv:1706.01307, 2017.
    [28] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proceedings of the IEEE, vol. 86, no. 11, pp. 2278-2324, Nov. 1998, doi: 10.1109/5.726791.
    [29] F. Rosenblatt, "The perceptron: A probabilistic model for information storage and organization in the brain," Psychological Review, vol. 65, no. 6, p. 386, 1958.
    [30] J. Zhang, S. Sclaroff, Z. Lin, X. Shen, B. Price, and R. Mech, "Unconstrained Salient Object Detection via Proposal Subset Optimization," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 5733-5742, doi: 10.1109/CVPR.2016.618.
    [31] N. Bodla, B. Singh, R. Chellappa, and L. S. Davis, "Improving Object Detection With One Line of Code," in Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2017, pp. 5561-5569.
    [32] A. H. Lang, S. Vora, H. Caesar, L. Zhou, J. Yang, and O. Beijbom, "PointPillars: Fast Encoders for Object Detection from Point Clouds," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
    [33] S. Shi et al., "PV-RCNN: Point-Voxel Feature Set Abstraction for 3D Object Detection," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020.
    [34] C. Szegedy et al., "Going deeper with convolutions," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.

    Full text available on campus: 2025-09-08
    Full text available off campus: 2025-09-08