
Graduate Student: Chen, Shao-Fu
Thesis Title: A Robust Vehicle Detection System by Using Tracked YOLO Networks
Advisor: Yang, Jar-Ferr
Degree: Master
Department: College of Electrical Engineering & Computer Science - Institute of Computer & Communication Engineering
Year of Publication: 2019
Graduation Academic Year: 107 (2018-2019)
Language: English
Number of Pages: 50
Keywords: Deep Learning, Self-driving Car, YOLO (You Only Look Once), Long Short Term Memory (LSTM), Object Detection
  • Autonomous driving has become a popular topic and an important research direction in recent years, and with the rapid development of deep-learning-based detection, practical self-driving cars are within reach. Whether for future self-driving cars or for current advanced driver assistance systems (ADAS), safe driving control is always the first priority. Robust detection and tracking of the vehicles ahead is therefore especially important. In this thesis, we use a long short-term memory (LSTM) system to build a vehicle detection system based on the YOLO neural network, with the goal that the LSTM can compensate for objects that YOLOv2 fails to detect at certain time instants. We first design a vehicle-oriented YOLO network to detect cars, buses, and trucks on the road, and organize the resulting object location information. We then propose a vehicle status decision system to determine the status of each object detected by YOLO, labeling every object as one of three states: object disappeared, new object appeared, or status maintained. Once the status is determined, the data are fed into the LSTM network to stably track new and existing objects. Experimental results show that the proposed system recovers objects missed by the original YOLO network and provides stable and accurate object detection. Compared with the original YOLOv2 model, YOLO-vehicle improves the detection rate by 32.68% while reducing the false alarm rate by 1.77%.
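The three-way status decision described above can be sketched as a simple IoU-based matcher between tracked boxes and the current frame's YOLO detections. The greedy matching and the 0.5 threshold below are illustrative assumptions, not the thesis's exact rule:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def classify_detections(tracked, detections, iou_thresh=0.5):
    """Greedily match detections to tracked boxes by IoU.

    Returns (same, new, disappeared):
      same        - list of (track_index, detection_index) pairs
      new         - detection indices with no overlapping track
      disappeared - track indices no detection matched
    """
    unmatched_tracks = set(range(len(tracked)))
    same, new = [], []
    for d, det in enumerate(detections):
        best_t, best_iou = None, iou_thresh
        for t in unmatched_tracks:
            score = iou(tracked[t], det)
            if score > best_iou:
                best_t, best_iou = t, score
        if best_t is None:
            new.append(d)                # new object appeared
        else:
            unmatched_tracks.discard(best_t)
            same.append((best_t, d))     # status maintained
    return same, new, sorted(unmatched_tracks)  # leftovers disappeared
```

A track flagged as disappeared here is exactly the case the LSTM module is meant to cover: the object may still be present even though the detector lost it for a few frames.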

    Autonomous driving has been a popular subject and an important research topic worldwide in recent years. With the extraordinary progress of deep-learning detection techniques, fully automatic self-driving cars will become available in the near future. Safe driving control is always the first priority, whether in future self-driving cars or in current advanced driving assistant systems (ADAS). Robust vehicle detection and tracking is therefore one of the most important issues today. In this thesis, we propose a you-only-look-once (YOLO) vehicle detection neural network combined with a long short-term memory (LSTM) system that recovers the bounding boxes of objects the detector misses. First, the YOLO-vehicle network is trained to detect cars, buses, and trucks on the road. We then analyze the class and location information of the identified vehicles and design a vehicle status decision system to classify the detections obtained from YOLO-vehicle. For multiple vehicles, the decision system classifies each object into one of three statuses: disappeared, newly appeared, or same object. After the decision, the detection information is forwarded, according to its status, to the LSTM module, which releases disappeared objects and tracks both new and existing objects. Experimental results show that the proposed system compensates for objects missed at some time instants through stable object tracking.
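As an illustration of the LSTM tracking step, the following minimal sketch runs an LSTM cell over a history of bounding boxes and reads out a predicted next box. The weights are random and untrained, and the 4-d (cx, cy, w, h) box encoding, hidden size, and linear readout are assumptions for illustration, not the thesis's exact architecture:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class LSTMCell:
    """Minimal LSTM cell; here it consumes 4-d box vectors (cx, cy, w, h)."""
    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = np.random.default_rng(seed)
        k = input_dim + hidden_dim
        # one weight matrix and bias per gate: input, forget, candidate, output
        self.W = {g: rng.normal(0.0, 0.1, (hidden_dim, k)) for g in "ifco"}
        self.b = {g: np.zeros(hidden_dim) for g in "ifco"}
        self.hidden_dim = hidden_dim

    def step(self, x, h, c):
        z = np.concatenate([x, h])
        i = sigmoid(self.W["i"] @ z + self.b["i"])   # input gate
        f = sigmoid(self.W["f"] @ z + self.b["f"])   # forget gate
        g = np.tanh(self.W["c"] @ z + self.b["c"])   # candidate cell state
        o = sigmoid(self.W["o"] @ z + self.b["o"])   # output gate
        c = f * c + i * g
        h = o * np.tanh(c)
        return h, c

def predict_next_box(boxes, cell, readout):
    """Run the cell over a box history and map the final hidden state
    to a predicted next box via a linear readout."""
    h = np.zeros(cell.hidden_dim)
    c = np.zeros(cell.hidden_dim)
    for x in boxes:
        h, c = cell.step(np.asarray(x, dtype=float), h, c)
    return readout @ h
```

When the detector drops an object for a frame, a prediction of this kind can stand in for the missing bounding box until the detector recovers it; in a real system the cell and readout would of course be trained on box trajectories.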

    Abstract (Chinese)
    Abstract
    Contents
    List of Tables
    List of Figures
    Chapter 1 Introduction
      1.1 Research Background
      1.2 Motivations
      1.3 Literature Review
      1.4 Thesis Organization
    Chapter 2 Related Work
      2.1 Convolutional Neural Network
        2.1.1 Convolutional Layers
        2.1.2 Pooling Layers
        2.1.3 Activation Function
        2.1.4 Fully Connected Layers (Dense Layer)
      2.2 Non-maximum Suppression (NMS)
      2.3 You Only Look Once (YOLO)
        2.3.1 YOLO
        2.3.2 YOLOv2
      2.4 Long Short Term Memory (LSTM)
    Chapter 3 The Proposed Object Tracking by Using LSTM Based on YOLO-vehicle
      3.1 Overview of the Proposed System
      3.2 YOLO-vehicle Detector
        3.2.1 Data Representation of YOLO Predictor
        3.2.2 Proposed Network Structure
        3.2.3 Loss Functions
      3.3 Vehicle LSTM Tracker
    Chapter 4 Experimental Results
      4.1 Environmental Settings and Datasets
      4.2 Comparisons of Different Vehicle Detection Methods
      4.3 Vehicle Detection with LSTM Tracker
    Chapter 5 Conclusions
    Chapter 6 Future Work
    References


    Full text available on campus: 2024-09-01
    Full text available off campus: 2024-09-01