
Author: 盧柏翰 (LU, POHAN)
Thesis title (Chinese): 面向多場景的小物件偵測之高效率 YOLOv11 輕量化模型
Thesis title (English): An Efficient Lightweight YOLOv11 Model for Small Object Detection in Multi-Scene Environments
Advisor: 戴顯權 (Tai, Shen-Chuan)
Degree: Master
Department: Department of Electrical Engineering, College of Electrical Engineering and Computer Science
Publication year: 2026
Graduation academic year: 114 (ROC calendar; 2025–2026)
Language: English
Number of pages: 91
Keywords: Small Object Detection, Lightweight Model, Multi-scale Detection Architecture, GhostConv, YOLO, Multi-scene Learning
ORCID: 0009-0002-9151-181X
ResearchGate: Deep Learning
Access counts: 52 views, 0 downloads
    Abstract (Chinese, translated): Small object detection in multi-scene environments is often challenged by the limited scale of targets and complex visual conditions, while real-world deployment scenarios also impose strict constraints on model size and computational efficiency. This thesis proposes a lightweight, deployment-oriented small object detection framework based on YOLOv11, suitable for practical applications across diverse scenes. Using YOLOv11-nano as the baseline model, the study applies an efficiency-driven architectural adjustment strategy, including channel reallocation, GhostConv-based feature fusion, and a multi-scale detection design with an added P2 branch, to preserve high-resolution spatial features without increasing network depth or introducing heavyweight modules. Experimental results on the TACO, PlastOPol, and VisDrone datasets show that the proposed method maintains strong small object detection capability while substantially reducing model size and parameter count. Overall, with roughly a 40% reduction in model size and a parameter budget of 1.5M, the method provides a practically deployable solution for small object detection in multi-scene environments.
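The benefit of the added P2 branch described in the abstract comes down to feature-map stride arithmetic. The sketch below is illustrative only (not thesis code); it assumes a 640×640 input, which is typical for YOLO-family models, and shows how many grid cells a small object spans at each pyramid level, which is why a stride-4 P2 map helps small targets.

```python
# Illustrative stride arithmetic for YOLO-style pyramid levels (not thesis code).
# Assumption: a 640x640 input; P2..P5 correspond to strides 4, 8, 16, 32.

def grid_size(input_px: int, stride: int) -> int:
    """Side length of the feature map produced at a given stride."""
    return input_px // stride

def cells_spanned(object_px: int, stride: int) -> int:
    """How many grid cells a square object of object_px covers per side."""
    return max(1, object_px // stride)

input_px = 640
for level, stride in [("P2", 4), ("P3", 8), ("P4", 16), ("P5", 32)]:
    g = grid_size(input_px, stride)
    c = cells_spanned(16, stride)  # a 16-px small object
    print(f"{level}: {g}x{g} map, a 16-px object spans {c} cell(s) per side")
```

At stride 4 (P2) a 16-px object still covers a 4×4 patch of grid cells, while at stride 16 and above it collapses into a single cell, leaving the detection head little spatial evidence to localize it.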

    Abstract (English): Small object detection in multi-scene environments is challenging due to limited object scale and complex visual conditions, while practical deployment requires compact and efficient detection models. This thesis presents a lightweight, deployment-oriented object detection framework based on YOLOv11 for small object detection across diverse scenes. Built on the YOLOv11-nano baseline, the proposed method applies efficiency-driven architectural refinements, including channel reallocation, GhostConv-based feature fusion, and an extended multi-scale design with an additional P2 branch to preserve high-resolution spatial information, without increasing network depth or introducing heavyweight modules. Experiments on the TACO, PlastOPol, and VisDrone datasets show that the proposed framework maintains effective small object detection capability under significantly reduced model size and parameter budgets. With approximately 40% reduction in model size and a parameter count of 1.5M, the proposed method offers a practical solution for deployment-oriented small object detection in multi-scene environments.
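The saving behind GhostConv-based feature fusion can be seen directly in a parameter count: a primary convolution produces part of the output channels, and a cheap depthwise "ghost" operation generates the rest. The sketch below follows the common GhostNet-style formulation rather than the thesis implementation; the kernel sizes `k` and `d` and the channel counts are illustrative assumptions, and biases are ignored.

```python
# Parameter-count comparison: standard convolution vs GhostConv (biases ignored).
# Hedged sketch of the GhostNet-style formulation; all values are illustrative.

def conv_params(c_in: int, c_out: int, k: int) -> int:
    """Standard kxk convolution: every output channel sees every input channel."""
    return c_in * c_out * k * k

def ghost_params(c_in: int, c_out: int, k: int = 3, d: int = 5, ratio: int = 2) -> int:
    """GhostConv: a primary kxk conv makes c_out//ratio channels, then a
    depthwise dxd 'cheap operation' generates the remaining ghost channels."""
    primary = c_out // ratio
    return conv_params(c_in, primary, k) + (c_out - primary) * d * d

c_in, c_out, k = 128, 128, 3
std = conv_params(c_in, c_out, k)          # 128 * 128 * 9 = 147,456
ghost = ghost_params(c_in, c_out, k=k)     # primary 3x3 conv + depthwise 5x5 ghosts
print(f"standard: {std:,}  ghost: {ghost:,}  saving: {std / ghost:.2f}x")
```

With these example sizes the ghost variant needs roughly half the parameters of the standard convolution, which is the kind of per-layer saving that lets the overall model shrink by about 40% while keeping the same number of output channels available for fusion.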

    Contents:

    Chinese Abstract i
    Abstract ii
    Acknowledgements iii
    Contents iv
    List of Tables viii
    List of Figures ix
    1 Introduction 1
      1.1 Research Background and Motivation 1
      1.2 Problem Statement 3
      1.3 Research Objective and Contributions 4
      1.4 Thesis Organization 5
    2 Related Works 6
      2.1 Two-Stage and One-Stage Object Detection 6
      2.2 Evolution of the YOLO Series 8
        2.2.1 YOLOv8: Anchor-Free Design and Task Decoupling 8
        2.2.2 YOLOv9: Gradient Regulation and Efficient Feature Aggregation 8
        2.2.3 YOLOv10: Consistent Optimization between Training and Inference 9
        2.2.4 YOLOv11: Unified Accuracy–Efficiency Optimization 10
        2.2.5 YOLOv12: Enhanced Multi-Scale Reasoning and Transformer Integration 12
      2.3 Small Object Detection 13
        2.3.1 Challenges in Small Object Detection 13
        2.3.2 Existing Strategies for Small Object Detection 14
      2.4 Efficient Feature Extraction for Lightweight Object Detection 16
        2.4.1 CNN-Based Feature Extraction and Its Limitations 16
        2.4.2 Lightweight Convolutions and Feature Redundancy Reduction 17
        2.4.3 Ghost Convolution for Efficient Feature Generation 17
        2.4.4 Channel Redundancy and Channel Reallocation 19
      2.5 Attention Mechanisms in Multi-Scale Vision Networks 20
        2.5.1 Overview of Attention Mechanisms in Vision Models 20
        2.5.2 Attention Placement within Hierarchical Architectures 21
        2.5.3 Multi-Scale Feature Aggregation with Attention 22
    3 The Proposed Method 24
      3.1 Design Motivation and Overall Framework 24
      3.2 Lightweight Backbone via Channel Reallocation 27
        3.2.1 Motivation and Design Rationale 27
        3.2.2 Stage-aware Channel Reallocation Strategy 28
        3.2.3 Empirical Trade-off and Channel Cap Selection 31
        3.2.4 Relation to Existing Compression Methods 31
      3.3 GhostConv-based Bi-directional Feature Fusion 32
        3.3.1 Design Motivation 32
        3.3.2 GhostConv for Efficient Feature Generation 33
        3.3.3 Bi-directional Feature Fusion Strategy 34
      3.4 Multi-scale Feature Pyramid Design 35
        3.4.1 P2 Branch 36
        3.4.2 4-Scale Detection Head 37
        3.4.3 Accuracy–Efficiency Trade-off 37
      3.5 Attention Module Integration 38
        3.5.1 SPPF for Global Context Aggregation 38
        3.5.2 C2PSA for Context-aware Feature Refinement 39
        3.5.3 Design Rationale and Extensibility 41
      3.6 Loss Function 41
    4 Performance Evaluation 44
      4.1 Experimental Datasets 44
        4.1.1 TACO Dataset 44
        4.1.2 VisDrone Dataset 45
        4.1.3 PlastOPol Dataset 46
      4.2 Evaluation Metrics 47
        4.2.1 Mean Average Precision (mAP@0.5) 48
        4.2.2 Model Size 49
        4.2.3 Computational Complexity (GFLOPs) 49
      4.3 Implementation Setting 49
      4.4 Quantitative Results 51
        4.4.1 Results on the TACO Dataset 51
        4.4.2 Results on the VisDrone Dataset 56
        4.4.3 Results on the PlastOPol Dataset 65
        4.4.4 Cross-Dataset Discussion 69
      4.5 Ablation Experimental Results 70
        4.5.1 Effect of Channel Reallocation (CR) 70
        4.5.2 Effect of GhostConv (GC) 71
        4.5.3 Contribution of Extended Pyramid Levels (P2-P5) 71
        4.5.4 Effect of the 4-Scale Detect Head 72
        4.5.5 Overall Analysis 72
    5 Conclusion and Future Work 73
      5.1 Conclusion 73
      5.2 Future Work 75
    References 77

Full-text access (off-campus): immediately available