| Author: | 葉明諺 Yeh, Ming-Yen |
|---|---|
| Thesis title: | 應用於自駕技術之多重路徑偵測器 Multiple Paths Detector for Autonomous Driving |
| Advisor: | 楊家輝 Yang, Jar-Ferr |
| Degree: | Master |
| Department: | Institute of Computer & Communication Engineering, College of Electrical Engineering and Computer Science |
| Year of publication: | 2020 |
| Academic year: | 108 (ROC calendar) |
| Language: | English |
| Pages: | 59 |
| Keywords: | deep learning, object detection, autonomous driving, single shot detection, multiple paths detection |
With the rapid development of deep learning, autonomous driving has become an achievable and popular research topic around the world, and accurate object detection is a key component of autonomous driving and other smart services. In recent years, many approaches have been proposed to improve detection accuracy, but at the expense of high resource consumption: an oversized network slows down the back-end system that integrates the various self-driving components and fails to meet practical requirements. Speed, accuracy, and resource consumption are therefore equally important in practice. In this thesis, we propose a low-cost detection system based on convolutional neural networks for autonomous driving. The proposed system is built on lightweight design concepts. After organizing and analyzing the object detection problem, we design a multiple paths detector (MPD), composed of a backbone module, a detection module, and a bounding box segmentation module, to improve detection performance and alleviate the shortcomings above. In addition, we revise the loss function to take feature correlation into account, which makes the detector more robust. The detection goal of this thesis is to identify objects commonly seen on the roads of Taiwan, including vehicles, scooters, and pedestrians. To adapt the proposed MPD system to general roads in Taiwan, we not only use public datasets but also collect and build a dataset of Taiwanese road scenes. The experimental results show that the proposed MPD system achieves a balance between accuracy and speed and obtains good mAP (mean average precision) results.
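The abstract evaluates the detector by mAP (mean average precision), which is the per-class average precision (AP) averaged over all classes; AP in turn depends on matching predicted boxes to ground truth by intersection-over-union (IoU). The sketch below illustrates the standard computation in pure Python; it is an illustrative reconstruction of the common evaluation protocol, not code from the thesis, and the function names are assumptions.

```python
def iou(box_a, box_b):
    """Intersection-over-Union of two boxes given as (x1, y1, x2, y2)."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def average_precision(detections, ground_truths, iou_thresh=0.5):
    """AP for a single class.

    detections: list of (confidence, box) pairs; ground_truths: list of
    boxes. A detection is a true positive if it overlaps an unmatched
    ground-truth box by at least iou_thresh; each ground truth can be
    matched only once.
    """
    detections = sorted(detections, key=lambda d: d[0], reverse=True)
    matched = set()
    hits = []
    for _, box in detections:
        best_iou, best_gt = 0.0, None
        for i, gt in enumerate(ground_truths):
            if i in matched:
                continue
            overlap = iou(box, gt)
            if overlap > best_iou:
                best_iou, best_gt = overlap, i
        if best_iou >= iou_thresh:
            matched.add(best_gt)
            hits.append(1)
        else:
            hits.append(0)
    # Sweep the ranked list, accumulating precision over recall steps
    # (area under the precision-recall curve by the rectangle rule).
    ap, cum_tp, prev_recall = 0.0, 0, 0.0
    n_gt = len(ground_truths)
    for rank, hit in enumerate(hits, start=1):
        cum_tp += hit
        recall = cum_tp / n_gt
        precision = cum_tp / rank
        ap += precision * (recall - prev_recall)
        prev_recall = recall
    return ap
```

mAP is then simply the mean of `average_precision` over all object classes (e.g. vehicle, scooter, pedestrian). Benchmarks differ in details such as interpolation of the precision-recall curve and the IoU threshold, so exact numbers depend on the chosen protocol.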
On-campus access: available from 2025-07-20.