| Student: | 林育詳 Lin, Yu-Hsiang |
|---|---|
| Thesis Title: | 運用機率分佈於圖像與點雲物件偵測之定位物體方法 (Locating Objects with Probability Distributions in Images and Point Clouds) |
| Advisor: | 郭致宏 Kuo, Chih-Hung |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science, Department of Electrical Engineering |
| Year of Publication: | 2022 |
| Graduation Academic Year: | 110 (ROC calendar) |
| Language: | Chinese |
| Number of Pages: | 86 |
| Keywords (Chinese): | 卷積神經網路, 物件偵測, 機率分佈 |
| Keywords (English): | convolutional neural network, object detection, probability distribution |
This thesis describes the location and size of objects in images with probability distributions, and trains object detection networks using the Kullback-Leibler divergence as the regression loss. Unlike most previous methods, which represent an object's location and size with a rectangular bounding box, we propose the Nearest Distribution Converter (NDC), which converts each predicted probability distribution into the closest uniform distribution and treats that uniform distribution as a rectangular bounding box. Applied to the image object detection models YOLOv3, YOLOv4-tiny, and YOLOv4, the proposed method improves detection performance on the PASCAL VOC dataset by 0.92%, 0.75%, and 0.48%, respectively. We also apply the method to the 3D point cloud object detection model SECOND; evaluation on the KITTI dataset shows improved detection of pedestrians and cyclists.
In this paper, we predict object locations as probability distributions for object detection. We adopt the Kullback-Leibler divergence as the regression loss to train the deep neural networks. Since most existing benchmarks label objects with rectangular bounding boxes, we propose the Nearest Distribution Converter to find the closest uniform distribution to each predicted one. Our proposed method improves detection accuracy measured in mAP by 0.57%, 0.75%, and 0.48% on the image object detection models YOLOv3, YOLOv4-tiny, and YOLOv4, respectively. We also apply our proposed method to the point cloud object detection model SECOND, and the evaluation results show that our method outperforms the original method, particularly for the detection of pedestrians and cyclists.
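The abstract does not spell out how a predicted distribution is scored against a box or converted back to one. As a rough illustrative sketch only (the per-coordinate Gaussian model and both function names below are assumptions, not the thesis's actual NDC formulation), one way to realize "KL divergence as a regression loss" and "closest uniform distribution" for a single box coordinate axis is: model the ground-truth box extent as a uniform distribution U[a, b], model the prediction as a Gaussian N(mu, sigma^2), use the closed-form KL(U‖N) as the loss, and recover a box edge from the uniform that minimizes this KL, which works out to have half-width sqrt(3)*sigma around mu.

```python
import math

def kl_uniform_gaussian(a, b, mu, sigma):
    """Closed-form KL( U[a,b] || N(mu, sigma^2) ).

    Illustrative sketch: one axis of a bounding box is treated as a 1-D
    distribution. KL = -ln(b-a) + ln(sigma*sqrt(2*pi)) + E[(x-mu)^2]/(2*sigma^2),
    where the expectation is taken over the uniform distribution.
    """
    width = b - a
    # E_{x~U[a,b]}[(x - mu)^2] = ((b-mu)^3 - (a-mu)^3) / (3*(b-a))
    second_moment = ((b - mu) ** 3 - (a - mu) ** 3) / (3.0 * width)
    return (-math.log(width)
            + math.log(sigma * math.sqrt(2.0 * math.pi))
            + second_moment / (2.0 * sigma ** 2))

def nearest_uniform(mu, sigma):
    """Uniform interval minimizing KL(U || N(mu, sigma^2)).

    Setting d/dw [-ln(2w) + w^2/(6*sigma^2)] = 0 gives half-width sqrt(3)*sigma,
    i.e. the KL-closest uniform also matches the Gaussian's mean and variance.
    The interval endpoints can then be read off as the box edges.
    """
    half = math.sqrt(3.0) * sigma
    return mu - half, mu + half
```

Under this toy model, a sharper prediction (smaller sigma) yields a narrower recovered box, which mirrors the idea of treating the converted uniform distribution as the rectangular bounding box.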