
Graduate Student: Jiang, Yun-Zhong (蔣允中)
Thesis Title: Multi-Task Learning: Combining Features from Different Road Datasets and Producing Multiple Task Outputs Based on a Single Backbone Network
Advisor: Hsu, Chih-Chung (許志仲)
Degree: Master
Department: Institute of Data Science, College of Management
Year of Publication: 2024
Graduation Academic Year: 112 (ROC calendar; 2023-2024)
Language: Chinese
Pages: 48
Keywords: Multi-task learning, Autonomous driving, Computer vision, Deep learning

Abstract:
    This study aims to design and train a multi-task learning model with strong generalization capability, able to process and output results for several distinct tasks. Computer vision tasks in autonomous driving systems serve as the application case: real-time object detection, semantic segmentation, and image classification. To this end, the study adopts the YOLOv6 backbone network as the base model and integrates modules that have performed well on the respective downstream tasks, balancing the tasks so that the model maintains high inference speed while performing well on each of them.
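    This record contains no code, but the shared-backbone design just described lends itself to a short illustration. The PyTorch sketch below is a minimal, hypothetical rendering of the idea, not the author's implementation: a tiny convolutional stack stands in for the YOLOv6 backbone and neck, and the head sizes (45 TT100K sign classes, 19 BDD100K segmentation classes, 196 StanfordCars classes) are assumptions based on the datasets named in the abstract.

    # Minimal sketch of a single-backbone, multi-head network. The small
    # convolutional stack is a stand-in for the thesis's YOLOv6 backbone
    # and neck; all layer sizes here are illustrative assumptions.
    import torch
    import torch.nn as nn

    class MultiTaskNet(nn.Module):
        def __init__(self, num_det=45, num_seg=19, num_cls=196):
            super().__init__()
            # Shared feature extractor (placeholder for backbone + neck).
            self.backbone = nn.Sequential(
                nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            )
            # Detection head: per-cell class scores plus 4 box offsets.
            self.det_head = nn.Conv2d(128, num_det + 4, 1)
            # Segmentation head: per-pixel logits upsampled to input size.
            self.seg_head = nn.Sequential(
                nn.Conv2d(128, num_seg, 1),
                nn.Upsample(scale_factor=8, mode="bilinear", align_corners=False),
            )
            # Classification head: global pooling plus a linear layer.
            self.cls_head = nn.Sequential(
                nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(128, num_cls)
            )

        def forward(self, x):
            feats = self.backbone(x)  # one shared feature map feeds every head
            return {
                "det": self.det_head(feats),
                "seg": self.seg_head(feats),
                "cls": self.cls_head(feats),
            }

    # A single forward pass yields all three task outputs at once.
    outputs = MultiTaskNet()(torch.randn(1, 3, 256, 256))
    print({k: tuple(v.shape) for k, v in outputs.items()})

    The point of the design is that detection, segmentation, and classification reuse one set of backbone features, so adding a task costs only a head rather than a full network.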
    The study introduces an Alternating Development Learning (ADL) strategy into the training process of the multi-task framework. By training the task branches in alternation, this strategy mitigates interference between the different datasets and avoids catastrophic forgetting of parameters, improving model stability. Experimental results show that the proposed multi-task model performs well across several metrics: 0.719 mAP50 for object detection on the Tsinghua-Tencent 100K dataset, 0.264 mIoU for semantic segmentation on the BDD100K dataset, and 0.803 top-1 accuracy for image classification on the StanfordCars dataset. Although the model falls slightly short of specialized single-task models on some metrics, its combined capability is an advantage when several tasks must be handled simultaneously, and it maintains good accuracy and inference speed across different traffic scenarios, demonstrating its potential for practical applications and offering a useful reference for future research.
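    The record describes ADL only as alternating the training of different task branches. As a hedged illustration of what such a loop could look like, the sketch below (continuing from the MultiTaskNet sketch above) round-robins over the three tasks and freezes the inactive heads at each step; the freeze schedule, loss functions, and batch shapes are assumptions, not the thesis's recipe.

    # Hedged sketch of an alternating training loop in the spirit of ADL:
    # each step draws a batch from one task's dataset and updates only that
    # task's branch, so the branches do not overwrite one another.
    import itertools
    import torch
    import torch.nn.functional as F

    model = MultiTaskNet()  # from the previous sketch
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
    heads = {"det": model.det_head, "seg": model.seg_head, "cls": model.cls_head}

    def set_requires_grad(module, flag):
        for p in module.parameters():
            p.requires_grad = flag

    def train_step(task, images, targets):
        # Freeze every head except the active task's; the shared backbone
        # could likewise be frozen after a warm-up (cf. section 3.4.1).
        for name, head in heads.items():
            set_requires_grad(head, name == task)
        out = model(images)[task]
        if task == "seg":
            loss = F.cross_entropy(out, targets)   # per-pixel labels
        elif task == "cls":
            loss = F.cross_entropy(out, targets)   # image-level labels
        else:
            loss = F.mse_loss(out, targets)        # placeholder detection loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()

    # Round-robin schedule: det, seg, cls, det, seg, cls, ...
    for task in itertools.islice(itertools.cycle(["det", "seg", "cls"]), 6):
        images = torch.randn(2, 3, 256, 256)        # dummy batch
        if task == "cls":
            targets = torch.randint(0, 196, (2,))
        elif task == "seg":
            targets = torch.randint(0, 19, (2, 256, 256))
        else:
            targets = torch.randn(2, 49, 32, 32)    # matches det head output
        train_step(task, images, targets)

    Because only one branch receives gradients at a time, a batch from one dataset cannot pull the other heads' parameters around, which is one simple way to curb the cross-dataset interference and forgetting that the abstract describes.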

    Table of Contents:
    Abstract
    List of Tables
    List of Figures
    Chapter 1: Introduction
      1.1 Research Background and Motivation
      1.2 Research Objectives
      1.3 Dataset Description
      1.4 Research Challenges
      1.5 Contributions
      1.6 Thesis Organization
    Chapter 2: Literature Review
      2.1 Fundamental Computer Vision Tasks in Deep Learning
      2.2 The Main YOLO Series Network Architectures
      2.3 YOLOP: A Multi-Task Panoptic Driving Perception Network
      2.4 PIDNet for Real-Time Semantic Segmentation
      2.5 Progressive Iterative Learning Strategies
    Chapter 3: Methodology
      3.1 Problem Definition
      3.2 Method and Pipeline Design
      3.3 Model Architecture
        3.3.1 Backbone
        3.3.2 Neck
        3.3.3 Task Heads
      3.4 Training Techniques
        3.4.1 Network Freezing
        3.4.2 Alternating Development Learning (ADL)
        3.4.3 Loss Functions
    Chapter 4: Experimental Results and Discussion
      4.1 Evaluation Metrics
      4.2 Data Augmentation and Preprocessing
      4.3 Parameter Settings
      4.4 Demonstration of the Training Strategy
      4.5 Comparison on the Three Base Tasks
      4.6 Comparison of Inference Speed, Model Size, and Parameter Counts
      4.7 Ablation Study
      4.8 Parameter Sensitivity Analysis
    Chapter 5: Conclusion and Future Work
    References

    [1] Ge, Z., Liu, S., Wang, F., Li, Z., and Sun, J. Yolox: Exceeding yolo series in 2021. arXiv preprint arXiv:2107.08430 (2021).
    [2] Hong, Y., Pan, H., Sun, W., and Jia, Y. Deep dual-resolution networks for real-time and accurate semantic segmentation of road scenes. arXiv preprint arXiv:2101.06085 (2021).
    [3] Hsu, C., Jiang, Y.-Z., and Huang, W.-H. Swift concurrent semantic segmentation and object detection on edge devices. In 2023 IEEE International Conference on Multimedia and Expo Workshops (ICMEW) (2023), pp. 40–45.
    [4] Krause, J., Stark, M., Deng, J., and Fei-Fei, L. 3d object representations for fine-grained categorization. In 2013 IEEE International Conference on Computer Vision Workshops (2013), pp. 554–561.
    [5] Li, C., Li, L., Jiang, H., Weng, K., Geng, Y., Li, L., Ke, Z., Li, Q., Cheng, M., Nie, W., Li, Y., Zhang, B., Liang, Y., Zhou, L., Xu, X., Chu, X., Wei, X., and Wei, X. Yolov6: A single-stage object detection framework for industrial applications. arXiv preprint arXiv:2209.02976 (2022).
    [6] Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.-Y., and Berg, A. C. Single shot multibox detector. In European Conference on Computer Vision (2016), Springer, pp. 21–37.
    [7] Redmon, J., and Farhadi, A. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767 (2018).
    [8] Ren, S., He, K., Girshick, R., and Sun, J. Faster r-cnn: Towards real-time object detection with region proposal networks. Advances in neural information processing systems 28 (2015).
    [9] Simonyan, K., and Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2015).
    [10] Tan, M., and Le, Q. V. Efficientnetv2: Smaller models and faster training. arXiv preprint arXiv:2104.00298 (2021).
    [11] Ultralytics. Ultralytics github repository. https://github.com/ultralytics/ultralytics, 2024.
    [12] Wang, J., Sun, K., Cheng, T., Jiang, B., Deng, C., Zhao, Y., Liu, D., Mu, Y., Tan, M., Wang, X., Liu, W., and Xiao, B. Deep high-resolution representation learning for visual recognition. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2020), IEEE, pp. 5693–5703.
    [13] Wu, D., Chen, Y., Yuan, L., Liu, Y., Sheng, L., Shi, J., Cao, Y., and Liu, S. Yolop: You only look once for panoptic driving perception. arXiv preprint arXiv:2108.11250 (2021).
    [14] Xu, H., Wang, J., Han, X., Ding, E., Tao, D., and Huang, G. Pidnet: A real-time semantic segmentation network inspired by pid controllers. arXiv preprint arXiv:2206.02066 (2022).
    [15] Yu, C., Gao, C., Wang, J., Yu, G., Shen, C., and Sang, N. Rethinking bisenet for real-time semantic segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2021), pp. 9716–9725.
    [16] Yu, F., Chen, H., Wang, X., Xian, W., Chen, Y., Liu, F., Madhavan, V., and Darrell, T. Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2020), pp. 2633–2642.
    [17] Zhu, Z., Liang, D., Zhang, S., Huang, X., Li, B., and Hu, S. Traffic-sign detection and classification in the wild. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2016), pp. 2110–2118.

    Full-text availability: on-campus access from 2029-08-26; off-campus access from 2029-08-26.
    The electronic thesis has not yet been authorized for public release; please consult the library catalog for the print copy.