
Graduate Student: Chen, Pin-Tong (陳品彤)
Thesis Title: Enhancing Training Efficiency of Query-Based Lane Detection with Polynomial Modeling via Mixed Supervision
Advisor: Yang, Jar-Ferr (楊家輝)
Degree: Master
Department: College of Electrical Engineering and Computer Science, Institute of Computer & Communication Engineering
Year of Publication: 2025
Academic Year of Graduation: 113
Language: English
Number of Pages: 51
Keywords: Deep Learning, Autonomous Driving, Lane Detection, Query-based Detection Network, Transformer
    Abstract (translated from Chinese): Lane detection is a core task in autonomous driving and is critical to vehicle control, path planning, and driving safety. In recent years, query-based frameworks such as LSTR have advanced this field by representing lane lines as polynomial curves and adopting Transformer architectures. However, these methods typically rely on the Hungarian algorithm for one-to-one matching, which yields sparse supervision signals and in turn slows training convergence. This thesis introduces an efficient mixed supervision strategy to improve the training efficiency of query-based lane detection models. Unlike methods that must introduce additional queries, we adjust the order of the self-attention and cross-attention layers in the Transformer decoder, extract intermediate features, and supervise them with a one-to-many matching loss. This design provides more stable training signals while leaving the model's inference structure unchanged.
    Experiments on the TuSimple and CULane lane detection datasets show that the proposed method improves training efficiency and maintains accuracy without increasing computational cost, offering a practical solution for robust and efficient lane detection in autonomous driving.

    Lane detection is a fundamental task in autonomous driving, essential for vehicle control, path planning, and safety. Recent query-based frameworks like LSTR have advanced this field by representing lane lines as polynomial curves and using transformer-based architectures. However, these methods rely on one-to-one matching via the Hungarian algorithm, which often leads to sparse supervision and slow training convergence. This thesis introduces an efficient mixed supervision strategy for query-based lane detection. Instead of introducing additional queries, we extract intermediate decoder features by reordering self-attention and cross-attention layers and supervising them using a one-to-many matching loss. This design provides stronger training signals and improves convergence, all while maintaining the original model structure during inference.
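The one-to-one matching described above pairs each ground-truth lane with exactly one query so that only a handful of queries receive a positive training signal per image. A minimal sketch of this assignment, using brute-force enumeration in place of the Hungarian algorithm for clarity (the function name and cost values are illustrative, not taken from the thesis):

```python
from itertools import permutations

def one_to_one_match(cost):
    """Return the query index assigned to each ground-truth lane.

    cost[q][g] is the matching cost between query q and ground truth g.
    Brute-force enumeration stands in for the Hungarian algorithm here;
    both return the assignment minimizing the summed matching cost.
    """
    num_queries = len(cost)
    num_gt = len(cost[0])
    best_perm, best_cost = None, float("inf")
    # Try every way of picking one distinct query per ground-truth lane.
    for perm in permutations(range(num_queries), num_gt):
        total = sum(cost[q][g] for g, q in enumerate(perm))
        if total < best_cost:
            best_perm, best_cost = perm, total
    return list(best_perm)

# 4 queries, 2 ground-truth lanes: only the two matched queries receive a
# regression target; the remaining queries are supervised as "no lane".
cost = [
    [0.9, 0.8],
    [0.1, 0.7],
    [0.6, 0.2],
    [0.5, 0.5],
]
assert one_to_one_match(cost) == [1, 2]  # lane 0 -> query 1, lane 1 -> query 2
```

With only two positives among four queries, most queries get no gradient toward any lane in a given step, which is the sparse-supervision problem the mixed strategy targets.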
    Experiments on the TuSimple and CULane datasets show that the proposed method achieves competitive accuracy with significantly improved training efficiency. Our results demonstrate that mixed supervision can enhance the performance of query-based lane detectors without increasing computational cost, offering a practical solution for robust and efficient autonomous driving systems.
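The one-to-many matching loss densifies this supervision by letting several queries regress each ground-truth lane on the intermediate decoder features. A hedged sketch of one common form of such an assignment (top-k by cost; the value of k, the function name, and the costs are illustrative assumptions, not the thesis's actual settings):

```python
def one_to_many_match(cost, k=2):
    """Assign each ground-truth lane to its k lowest-cost queries.

    cost[q][g] is the matching cost between query q and ground truth g.
    Returns (query, gt) pairs; unlike one-to-one matching, a ground truth
    may supervise several queries, so more queries receive a positive
    training signal in every iteration.
    """
    num_gt = len(cost[0])
    pairs = []
    for g in range(num_gt):
        ranked = sorted(range(len(cost)), key=lambda q: cost[q][g])
        pairs.extend((q, g) for q in ranked[:k])
    return pairs

cost = [
    [0.9, 0.8],
    [0.1, 0.7],
    [0.6, 0.2],
    [0.5, 0.5],
]
# Each lane now supervises its two best queries: 4 positives instead of 2.
assert one_to_many_match(cost, k=2) == [(1, 0), (3, 0), (2, 1), (3, 1)]
```

Because this loss is applied only to intermediate features during training, the one-to-one head and its inference path are untouched, matching the abstract's claim of zero added inference cost.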

    Abstract (in Chinese)
    Abstract
    Acknowledgements
    Contents
    List of Tables
    List of Figures
    Chapter 1 Introduction
      1.1 Research Background
      1.2 Motivations
      1.3 Literature Reviews
        1.3.1 Based on Lane Representations
        1.3.2 Based on Prediction Mechanisms
      1.4 Thesis Organization
    Chapter 2 Related Work
      2.1 Query-based Detection Network
      2.2 Mixed Supervision in Transformer-based Detection
    Chapter 3 The Proposed Query-based Lane Detection System with Mixed Supervision
      3.1 Overview of the Proposed MS-LSTR System
      3.2 Lane Curve Modeling
        3.2.1 Curve Definition on the Ground Plane
        3.2.2 Projection onto the Image Plane (Without Pitch)
        3.2.3 Generalization for a Tilted Camera
      3.3 Model Architecture
        3.3.1 Transformer Encoder
        3.3.2 Transformer Decoders
      3.4 Matching Costs and Losses Calculation
        3.4.1 Matching Cost
        3.4.2 Loss Function
          3.4.2.1 One-to-one Loss
          3.4.2.2 One-to-many Loss
          3.4.2.3 Total Loss
    Chapter 4 Experimental Results
      4.1 Experimental Environments
      4.2 Datasets and Evaluation Metrics
      4.3 Implementation Details
      4.4 Quantitative Results Compared to Other Methods
      4.5 Qualitative Results Compared to Other Methods
        4.5.1 Qualitative Results on CULane
        4.5.2 Qualitative Results on TuSimple
      4.6 Ablation Studies
        4.6.1 Effectiveness of Mixed Supervision on Training Efficiency
        4.6.2 Effect of Decaying the One-to-many (O2M) Matching Loss Weight
    Chapter 5 Conclusions
    Chapter 6 Future Work
    References

    [1] X. Pan, J. Shi, P. Luo, X. Wang and X. Tang, "Spatial As Deep: Spatial CNN for Traffic Lane Detection," in Proc. of AAAI Conference on Artificial Intelligence, vol. 32, no. 1, 2018.

    [2] TuSimple, "TuSimple Lane Detection Challenge," [Online]. Available: https://github.com/TuSimple/tusimple-benchmark [Accessed: Jul. 2025].
    [3] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser and I. Polosukhin, "Attention Is All You Need," Advances in Neural Information Processing Systems (NeurIPS), vol. 30, pp. 5998–6008, 2017.
    [4] N. Carion, F. Massa, G. Synnaeve, N. Usunier, A. Kirillov and S. Zagoruyko, "End-to-End Object Detection with Transformers," in Proc. of European Conference on Computer Vision (ECCV), pp. 213–229, 2020.
    [5] Y. Zhang, X. Liu, Z. Wang, L. Yang and D. Yang, "End-to-End Lane Shape Prediction with Transformers," IEEE Intelligent Vehicles Symposium (IV), pp. 288–293, 2021.
    [6] L. Tabelini, R. Berriel, T. M. Paixão, C. Badue, A. F. De Souza and T. Oliveira-Santos, "Keep Your Eyes on the Lane: Real-Time Attention-Guided Lane Detection," in Proc. of IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 294–302, 2021.
    [7] X. Zhang, Y. Sun, J. Jiang, C. Cao and Z. Luo, "Accelerating DETR Convergence via Dynamic Anchor Boxes," in Proc. of European Conference on Computer Vision (ECCV), pp. 147–164, 2022.
    [8] X. Liu, H. Zhang, Y. Wu, Y. Wang and W. Liu, "Efficient DETR Training with Mixed Supervision," in Proc. of IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 14577–14586, 2022.
    [9] Y. Hou, Z. Ma, C. Liu, C. Chang and J. Yan, "Inter-Region Affinity Distillation for Road Marking Segmentation," in Proc. of IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 12484–12493, 2020.
    [10] T. Zheng, X. Li, Q. Sun, H. Lu and L. Cheng, "CLRNet: Cross Layer Refinement Network for Lane Detection," in Proc. of IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 1028–1035, 2022.
    [11] D. Neven, B. De Brabandere, S. Georgoulis, M. Proesmans and L. Van Gool, "Towards End-to-End Lane Detection: an Instance Segmentation Approach," in Proc. of IEEE Intelligent Vehicles Symposium (IV), pp. 286–291, 2018.
    [12] K. Behrendt and R. Soussan, "Unsupervised Labeled Lane Markers Using Maps," in Proc. of IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), 2019.
    [13] R. Wang, J. Qin, K. Li, Y. Li, D. Cao and J. Xu, "BEV-LaneDet: An Efficient 3D Lane Detection Based on Virtual Camera via Key-Points," in Proc. of IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1002–1011, 2023.
    [14] H. Wang, J. Qin, K. Li, Y. Li, D. Cao, J. Xu, et al., "OpenLane-V2: A Topology Reasoning Benchmark for Unified 3D HD Mapping," Advances in Neural Information Processing Systems (NeurIPS), vol. 36, pp. 18873–18884, 2023.

    Full-text availability: On campus: immediate open access; Off campus: immediate open access.