| Graduate Student: | 盧聖周 Lu, Sheng-Jhou |
|---|---|
| Thesis Title: | 一個基於增強型YOLOv11的無人機空拍影像輕量化小物體偵測方法 (A Lightweight Small Object Detection Method for Drone Aerial Images Based on Enhanced YOLOv11) |
| Advisor: | 戴顯權 Tai, Shen-Chuan |
| Degree: | Master |
| Department: | 電機資訊學院 電機工程學系 (Department of Electrical Engineering) |
| Publication Year: | 2025 |
| Graduation Academic Year: | 113 (ROC calendar) |
| Language: | English |
| Pages: | 61 |
| Keywords: | Drone Imagery, Small Object Detection, Lightweight Architecture |
Object detection in drone imagery presents unique challenges due to small object sizes and complex backgrounds. You Only Look Once (YOLO) is a widely adopted real-time object detection framework known for its strong balance between speed and accuracy. Building on YOLOv11, this work proposes an enhanced architecture optimized for small object detection. The model leverages shallow feature layers, which preserve the high-resolution spatial details that are often lost in deeper layers, making it especially effective at identifying small targets. To improve performance while reducing redundancy, the network incorporates performance-enhancing convolutional designs and adaptive feature aggregation. Compared with the original YOLOv11, the proposed design reduces the number of model parameters by more than 50% while maintaining or improving detection accuracy. Experiments on public aerial image datasets show a 1.7% improvement in detection precision under these lightweight constraints, making the model well suited for deployment on resource-limited platforms.
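The abstract names two architectural ideas at a high level: routing shallow, high-resolution feature maps into the detection path so fine spatial detail survives, and aggregating features adaptively rather than by plain concatenation. The sketch below illustrates one plausible combination of the two in PyTorch; it is not the thesis's actual implementation, and every name and shape in it (the `AdaptiveFusion` module, the stride-4 `p2` and stride-8 `p3` maps) is an illustrative assumption. The softmax-normalized per-input weights follow the spirit of BiFPN's fast normalized fusion.

```python
# A minimal sketch, assuming a YOLO-style neck: fuse a shallow stride-4 ("P2")
# feature map, which keeps detail for small objects, with an upsampled deeper
# stride-8 ("P3") map using learned, softmax-normalized per-input weights.
# Illustrative only; not the thesis's actual modules or names.

import torch
import torch.nn as nn
import torch.nn.functional as F


class AdaptiveFusion(nn.Module):
    """Adaptively weight two feature maps before merging them."""

    def __init__(self, channels: int):
        super().__init__()
        self.weights = nn.Parameter(torch.ones(2))  # one scalar per input
        self.proj = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, shallow: torch.Tensor, deep: torch.Tensor) -> torch.Tensor:
        # Upsample the deeper, lower-resolution map to the shallow map's size.
        deep = F.interpolate(deep, size=shallow.shape[-2:], mode="nearest")
        # Softmax keeps the fusion weights positive and summing to one.
        w = torch.softmax(self.weights, dim=0)
        return self.proj(w[0] * shallow + w[1] * deep)


if __name__ == "__main__":
    # Shapes are illustrative for a 640x640 input image.
    p2 = torch.randn(1, 64, 160, 160)  # stride 4: high resolution, fine detail
    p3 = torch.randn(1, 64, 80, 80)    # stride 8: lower resolution, stronger semantics
    fused = AdaptiveFusion(64)(p2, p3)
    print(fused.shape)  # torch.Size([1, 64, 160, 160])
```

One scalar weight per input keeps this form of adaptive aggregation nearly free in parameters and FLOPs, which is consistent with the abstract's goal of a model more than 50% smaller than the YOLOv11 baseline.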