| Graduate Student: | 蔣允中 Jiang, Yun-Zhong |
|---|---|
| Thesis Title: | 多任務學習:整合不同道路資料集特徵並基於單一骨幹網絡產出多項任務結果 Multi-Task Learning: Combining Features from Different Road Datasets and Producing Multiple Task Outputs Based on a Single Backbone Network |
| Advisor: | 許志仲 Hsu, Chih-Chung |
| Degree: | 碩士 Master |
| Department: | 管理學院 - 數據科學研究所 Institute of Data Science, College of Management |
| Publication Year: | 2024 |
| Graduation Academic Year: | 112 (ROC calendar) |
| Language: | Chinese |
| Number of Pages: | 48 |
| Keywords (Chinese): | 多任務學習、自動駕駛、電腦視覺、深度學習 |
| Keywords (English): | Multi-task learning, Autonomous driving, Computer vision, Deep learning |
This study aims to design and train a multi-task learning model with strong generalization capabilities, capable of handling and outputting results for various tasks. The model’s application case focuses on computer vision tasks in autonomous driving systems, including real-time object detection, semantic segmentation, and image classification. To achieve this goal, the study utilizes the powerful YOLOv6 backbone network as the base model, integrating modules that perform well across various downstream tasks. This approach balances multiple tasks, ensuring the model maintains high inference speed while delivering strong performance across different tasks.
The study introduces an Adaptive Development Learning (ADL) strategy to the training process of the multi-task learning framework. By alternately training different task branches, this approach effectively addresses the interference between different datasets and avoids catastrophic forgetting of parameters, thereby enhancing model stability. Experimental results show that the proposed multi-task model demonstrates outstanding performance across multiple metrics: it achieves a score of 0.719 (mAP50) in object detection on the Tsinghua-Tencent 100K dataset, 0.264 (mIoU) in semantic segmentation on the BDD100K dataset, and 0.803 (top-1 accuracy) in image classification on the StanfordCars dataset. Although the model may slightly underperform on some specialized single-task metrics, its comprehensive capabilities provide a significant advantage when handling multiple tasks simultaneously. This also indicates that our model maintains good accuracy and inference speed across different traffic scenarios, showcasing its potential for practical applications and offering valuable insights and guidance for future research.
[1] Ge, Z., Liu, S., Wang, F., Li, Z., and Sun, J. YOLOX: Exceeding YOLO series in 2021. arXiv preprint arXiv:2107.08430 (2021).
[2] Hong, Y., Pan, H., Sun, W., and Jia, Y. Deep dual-resolution networks for real-time and accurate semantic segmentation of road scenes. arXiv preprint arXiv:2101.06085 (2021).
[3] Hsu, C., Jiang, Y.-Z., and Huang, W.-H. Swift concurrent semantic segmentation and object detection on edge devices. In 2023 IEEE International Conference on Multimedia and Expo Workshops (ICMEW) (2023), pp. 40–45.
[4] Krause, J., Stark, M., Deng, J., and Fei-Fei, L. 3D object representations for fine-grained categorization. In 2013 IEEE International Conference on Computer Vision Workshops (2013), pp. 554–561.
[5] Li, C., Li, L., Jiang, H., Weng, K., Geng, Y., Li, L., Ke, Z., Li, Q., Cheng, M., Nie, W., Li, Y., Zhang, B., Liang, Y., Zhou, L., Xu, X., Chu, X., Wei, X., and Wei, X. YOLOv6: A single-stage object detection framework for industrial applications. arXiv preprint arXiv:2209.02976 (2022).
[6] Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.-Y., and Berg, A. C. SSD: Single shot multibox detector. In European Conference on Computer Vision (2016), Springer, pp. 21–37.
[7] Redmon, J., and Farhadi, A. YOLOv3: An incremental improvement. arXiv preprint arXiv:1804.02767 (2018).
[8] Ren, S., He, K., Girshick, R., and Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems 28 (2015).
[9] Simonyan, K., and Zisserman, A. Very deep convolutional networks for large-scale image recognition. In International Conference on Learning Representations (2015).
[10] Tan, M., and Le, Q. V. EfficientNetV2: Smaller models and faster training. In International Conference on Machine Learning (2021), PMLR.
[11] Ultralytics. Ultralytics GitHub repository. https://github.com/ultralytics/ultralytics, 2024.
[12] Wang, J., Sun, K., Cheng, T., Jiang, B., Deng, C., Zhao, Y., Liu, D., Mu, Y., Tan, M., Wang, X., Liu, W., and Xiao, B. Deep high-resolution representation learning for visual recognition. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2020), IEEE, pp. 5693–5703.
[13] Wu, D., Chen, Y., Yuan, L., Liu, Y., Sheng, L., Shi, J., Cao, Y., and Liu, S. YOLOP: You only look once for panoptic driving perception. arXiv preprint arXiv:2108.11250 (2021).
[14] Xu, H., Wang, J., Han, X., Ding, E., Tao, D., and Huang, G. PIDNet: A real-time semantic segmentation network inspired by PID controllers. arXiv preprint arXiv:2206.02066 (2022).
[15] Yu, C., Gao, C., Wang, J., Yu, G., Shen, C., and Sang, N. Rethinking BiSeNet for real-time semantic segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2021), pp. 9716–9725.
[16] Yu, F., Chen, H., Wang, X., Xian, W., Chen, Y., Liu, F., Madhavan, V., and Darrell, T. BDD100K: A diverse driving dataset for heterogeneous multitask learning. In 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2020), pp. 2633–2642.
[17] Zhu, Z., Liang, D., Zhang, S., Huang, X., Li, B., and Hu, S. Traffic-sign detection and classification in the wild. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2016), pp. 2110–2118.
On-campus access: full text publicly available from 2029-08-26.