
Author: 盧柏翰 (LU, POHAN)
Thesis title (Chinese): 面向多場景的小物件偵測之高效率 YOLOv11 輕量化模型
Thesis title (English): An Efficient Lightweight YOLOv11 Model for Small Object Detection in Multi-Scene Environments
Advisor: 戴顯權 (Tai, Shen-Chuan)
Degree: Master
Department: Department of Electrical Engineering, College of Electrical Engineering and Computer Science
Publication year: 2026
Graduation academic year: 114 (ROC calendar; 2025–2026)
Language: English
Number of pages: 91
Keywords: Small Object Detection, Lightweight Model, Multi-scale Detection Architecture, GhostConv, YOLO, Multi-scene Learning
ORCID: 0009-0002-9151-181X
ResearchGate: Deep Learning
Access counts: 52 views, 0 downloads
    Abstract (Chinese, translated): Small object detection in multi-scene environments is often challenged by the limited scale of targets and complex visual conditions, while real-world deployment scenarios also impose strict constraints on model size and computational efficiency. This thesis proposes a lightweight, deployment-oriented small object detection framework based on YOLOv11, suitable for practical applications across diverse scenes. Using YOLOv11-nano as the baseline model, the study applies an efficiency-driven architectural adjustment strategy, including channel reallocation, GhostConv-based feature fusion, and a multi-scale detection design with an added P2 branch, to preserve high-resolution spatial features without increasing network depth or introducing heavyweight modules. Experimental results on the TACO, PlastOPol, and VisDrone datasets show that the proposed method maintains strong small object detection capability while substantially reducing model size and parameter count. Overall, with roughly a 40% reduction in model size and a parameter budget of 1.5M, the method provides a practically deployable solution for small object detection in multi-scene environments.
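The benefit of the added P2 branch described in the abstract comes down to feature-map stride arithmetic. The sketch below is illustrative only (not thesis code); it assumes a 640×640 input, which is typical for YOLO-family models, and shows how many grid cells a small object spans at each pyramid level, which is why a stride-4 P2 map helps small targets.

```python
# Illustrative stride arithmetic for YOLO-style pyramid levels (not thesis code).
# Assumption: a 640x640 input; P2..P5 correspond to strides 4, 8, 16, 32.

def grid_size(input_px: int, stride: int) -> int:
    """Side length of the feature map produced at a given stride."""
    return input_px // stride

def cells_spanned(object_px: int, stride: int) -> int:
    """How many grid cells a square object of object_px covers per side."""
    return max(1, object_px // stride)

input_px = 640
for level, stride in [("P2", 4), ("P3", 8), ("P4", 16), ("P5", 32)]:
    g = grid_size(input_px, stride)
    c = cells_spanned(16, stride)  # a 16-px small object
    print(f"{level}: {g}x{g} map, a 16-px object spans {c} cell(s) per side")
```

At stride 4 (P2) a 16-px object still covers a 4×4 patch of grid cells, while at stride 16 and above it collapses into a single cell, leaving the detection head little spatial evidence to localize it.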

    Abstract (English): Small object detection in multi-scene environments is challenging due to limited object scale and complex visual conditions, while practical deployment requires compact and efficient detection models. This thesis presents a lightweight, deployment-oriented object detection framework based on YOLOv11 for small object detection across diverse scenes. Built on the YOLOv11-nano baseline, the proposed method applies efficiency-driven architectural refinements, including channel reallocation, GhostConv-based feature fusion, and an extended multi-scale design with an additional P2 branch to preserve high-resolution spatial information, without increasing network depth or introducing heavyweight modules. Experiments on the TACO, PlastOPol, and VisDrone datasets show that the proposed framework maintains effective small object detection capability under significantly reduced model size and parameter budgets. With approximately 40% reduction in model size and a parameter count of 1.5M, the proposed method offers a practical solution for deployment-oriented small object detection in multi-scene environments.
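The saving behind GhostConv-based feature fusion can be seen directly in a parameter count: a primary convolution produces part of the output channels, and a cheap depthwise "ghost" operation generates the rest. The sketch below follows the common GhostNet-style formulation rather than the thesis implementation; the kernel sizes `k` and `d` and the channel counts are illustrative assumptions, and biases are ignored.

```python
# Parameter-count comparison: standard convolution vs GhostConv (biases ignored).
# Hedged sketch of the GhostNet-style formulation; all values are illustrative.

def conv_params(c_in: int, c_out: int, k: int) -> int:
    """Standard kxk convolution: every output channel sees every input channel."""
    return c_in * c_out * k * k

def ghost_params(c_in: int, c_out: int, k: int = 3, d: int = 5, ratio: int = 2) -> int:
    """GhostConv: a primary kxk conv makes c_out//ratio channels, then a
    depthwise dxd 'cheap operation' generates the remaining ghost channels."""
    primary = c_out // ratio
    return conv_params(c_in, primary, k) + (c_out - primary) * d * d

c_in, c_out, k = 128, 128, 3
std = conv_params(c_in, c_out, k)          # 128 * 128 * 9 = 147,456
ghost = ghost_params(c_in, c_out, k=k)     # primary 3x3 conv + depthwise 5x5 ghosts
print(f"standard: {std:,}  ghost: {ghost:,}  saving: {std / ghost:.2f}x")
```

With these example sizes the ghost variant needs roughly half the parameters of the standard convolution, which is the kind of per-layer saving that lets the overall model shrink by about 40% while keeping the same number of output channels available for fusion.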

    Contents:

    Chinese Abstract i
    Abstract ii
    Acknowledgements iii
    Contents iv
    List of Tables viii
    List of Figures ix
    1 Introduction 1
      1.1 Research Background and Motivation 1
      1.2 Problem Statement 3
      1.3 Research Objective and Contributions 4
      1.4 Thesis Organization 5
    2 Related Works 6
      2.1 Two-Stage and One-Stage Object Detection 6
      2.2 Evolution of the YOLO Series 8
        2.2.1 YOLOv8: Anchor-Free Design and Task Decoupling 8
        2.2.2 YOLOv9: Gradient Regulation and Efficient Feature Aggregation 8
        2.2.3 YOLOv10: Consistent Optimization between Training and Inference 9
        2.2.4 YOLOv11: Unified Accuracy–Efficiency Optimization 10
        2.2.5 YOLOv12: Enhanced Multi-Scale Reasoning and Transformer Integration 12
      2.3 Small Object Detection 13
        2.3.1 Challenges in Small Object Detection 13
        2.3.2 Existing Strategies for Small Object Detection 14
      2.4 Efficient Feature Extraction for Lightweight Object Detection 16
        2.4.1 CNN-Based Feature Extraction and Its Limitations 16
        2.4.2 Lightweight Convolutions and Feature Redundancy Reduction 17
        2.4.3 Ghost Convolution for Efficient Feature Generation 17
        2.4.4 Channel Redundancy and Channel Reallocation 19
      2.5 Attention Mechanisms in Multi-Scale Vision Networks 20
        2.5.1 Overview of Attention Mechanisms in Vision Models 20
        2.5.2 Attention Placement within Hierarchical Architectures 21
        2.5.3 Multi-Scale Feature Aggregation with Attention 22
    3 The Proposed Method 24
      3.1 Design Motivation and Overall Framework 24
      3.2 Lightweight Backbone via Channel Reallocation 27
        3.2.1 Motivation and Design Rationale 27
        3.2.2 Stage-aware Channel Reallocation Strategy 28
        3.2.3 Empirical Trade-off and Channel Cap Selection 31
        3.2.4 Relation to Existing Compression Methods 31
      3.3 GhostConv-based Bi-directional Feature Fusion 32
        3.3.1 Design Motivation 32
        3.3.2 GhostConv for Efficient Feature Generation 33
        3.3.3 Bi-directional Feature Fusion Strategy 34
      3.4 Multi-scale Feature Pyramid Design 35
        3.4.1 P2 Branch 36
        3.4.2 4-Scale Detection Head 37
        3.4.3 Accuracy–Efficiency Trade-off 37
      3.5 Attention Module Integration 38
        3.5.1 SPPF for Global Context Aggregation 38
        3.5.2 C2PSA for Context-aware Feature Refinement 39
        3.5.3 Design Rationale and Extensibility 41
      3.6 Loss Function 41
    4 Performance Evaluation 44
      4.1 Experimental Datasets 44
        4.1.1 TACO Dataset 44
        4.1.2 VisDrone Dataset 45
        4.1.3 PlastOPol Dataset 46
      4.2 Evaluation Metrics 47
        4.2.1 Mean Average Precision (mAP@0.5) 48
        4.2.2 Model Size 49
        4.2.3 Computational Complexity (GFLOPs) 49
      4.3 Implementation Setting 49
      4.4 Quantitative Results 51
        4.4.1 Results on the TACO Dataset 51
        4.4.2 Results on the VisDrone Dataset 56
        4.4.3 Results on the PlastOPol Dataset 65
        4.4.4 Cross-Dataset Discussion 69
      4.5 Ablation Experimental Results 70
        4.5.1 Effect of Channel Reallocation (CR) 70
        4.5.2 Effect of GhostConv (GC) 71
        4.5.3 Contribution of Extended Pyramid Levels (P2-P5) 71
        4.5.4 Effect of the 4-Scale Detect Head 72
        4.5.5 Overall Analysis 72
    5 Conclusion and Future Work 73
      5.1 Conclusion 73
      5.2 Future Work 75
    References 77

Full-text access (off-campus): immediately available