
Graduate Student: Ke, Shang-Ting (柯尚廷 Maisang・Piyaw)
Thesis Title: Loss Convergence and Performance Evaluation of the YOLOv5 Model using Runge-Kutta Approximate Second-Order Optimization (基於YOLOv5模型與Runge-Kutta近似二階優化之損失收斂與性能評估)
Advisor: Hwang, Chi-Chuan (黃吉川)
Degree: Master
Department: Department of Engineering Science, College of Engineering
Year of Publication: 2025
Graduation Academic Year: 113 (ROC calendar; 2024-2025)
Language: Chinese
Number of Pages: 83
Keywords: YOLOv5, Runge-Kutta method, second-order curvature approximation, deep learning, image recognition

    In the field of deep learning, optimizers play a crucial role in enhancing the training efficiency and prediction accuracy of neural network models. Traditional optimizers, such as Stochastic Gradient Descent (SGD), are efficient but are susceptible to local minima and complex loss landscapes, leading to a significant trade-off between convergence speed and precision.
    To address this challenge, this study designed the KT4 optimizer, based on the Runge-Kutta method, which implements a second-order curvature approximation. By employing multiple gradient evaluations and weighted averaging, KT4 provides more accurate parameter update directions. It was systematically compared with the traditional SGD optimizer and implemented on the YOLOv5 image recognition model, effectively accelerating model convergence and improving prediction performance.
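    The abstract does not spell out KT4's exact update rule, so the following is only a minimal sketch, assuming the classical fourth-order Runge-Kutta scheme applied to gradient descent: four gradient evaluations at staged points, combined with the textbook weights 1/6, 1/3, 1/3, 1/6. The function name kt4_step, the staging, and the learning rate are illustrative assumptions, not the thesis's implementation.

```python
import torch

def kt4_step(params, loss_fn, lr=1e-2):
    """One RK4-style parameter update (illustrative sketch only).

    loss_fn is a closure that recomputes the loss from the current
    parameter values; the four staged gradients k1..k4 are combined
    with the classical Runge-Kutta weights (1/6, 1/3, 1/3, 1/6).
    """
    def grad():
        # Fresh gradient evaluation at the current parameter values.
        return torch.autograd.grad(loss_fn(), params)

    def move_to(base, direction, scale):
        # Set params to base - scale * direction, without tracking grads.
        with torch.no_grad():
            for p, b, d in zip(params, base, direction):
                p.copy_(b - scale * d)

    base = [p.detach().clone() for p in params]
    k1 = grad()
    move_to(base, k1, 0.5 * lr)   # half step along k1
    k2 = grad()
    move_to(base, k2, 0.5 * lr)   # half step along k2
    k3 = grad()
    move_to(base, k3, lr)         # full step along k3
    k4 = grad()
    # Weighted average of the four evaluations, applied from the base point.
    with torch.no_grad():
        for p, b, g1, g2, g3, g4 in zip(params, base, k1, k2, k3, k4):
            p.copy_(b - lr * (g1 + 2 * g2 + 2 * g3 + g4) / 6.0)

# Toy usage: minimize f(w) = ||w - 3||^2; w converges toward [3, 3].
w = torch.zeros(2, requires_grad=True)
for _ in range(30):
    kt4_step([w], lambda: ((w - 3.0) ** 2).sum(), lr=0.1)
```

    Sampling the gradient at staged points is what lets the update pick up curvature information a single SGD step ignores; the price is four gradient evaluations per step, consistent with the higher per-epoch training time reported below.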
    Experimental results demonstrate that the KT4 optimizer exhibits significant performance advantages over SGD when training the YOLOv5s model on the PASCAL VOC dataset. KT4 improved mAP@0.5:0.95 by a clear 2.4% over SGD and converged markedly faster on every YOLOv5 loss term, surpassing SGD within the first ten epochs and reaching lower loss values, which validates the effectiveness of its second-order curvature approximation in accelerating convergence. However, KT4's per-epoch training time is longer than SGD's, a direct consequence of its multiple gradient evaluations and a reflection of the trade-off between accuracy gains and computational cost. Notably, KT4's accuracy surpassed SGD's final training result by the 47th epoch, at which point KT4 had trained for 434.68 minutes in total, whereas SGD required 477.3 minutes to complete its full run. KT4 therefore reached better accuracy in fewer epochs, reducing training time by 8.92%.

    Looking ahead, KT4's design opens a new direction for deep-learning optimization. Future work can balance performance against resource demands through parallelization or dynamic adjustment strategies, and further extend the optimizer's applicability to other deep-learning models.
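    As a check on the figures quoted above, the time saving follows from the two totals, assuming it is measured relative to SGD's full training time:

```latex
\frac{T_{\mathrm{SGD}} - T_{\mathrm{KT4}}}{T_{\mathrm{SGD}}}
  = \frac{477.3 - 434.68}{477.3}
  \approx 0.0893
```

    which agrees with the reported 8.92% reduction up to rounding of the per-epoch timings.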

    Abstract (Chinese)
    Extended Abstract (English)
    Acknowledgments
    Table of Contents
    List of Tables
    List of Figures
    Chapter 1  Introduction
        1.1 Research Background (the problem and its importance)
        1.2 Literature Review (prior work, from earliest to most recent)
        1.3 Research Motivation (limitations of prior work)
        1.4 Thesis Organization
    Chapter 2  Related Work
        2.1 Machine Learning
        2.2 Deep Learning
        2.3 Convolutional Neural Networks
            2.3.1 Convolutional Layer
            2.3.2 Pooling Layer
            2.3.3 Activation Layer
            2.3.4 Fully Connected Layer
            2.3.5 Binary Cross-Entropy
        2.4 YOLO Object Detection Models
            2.4.1 Technical Evolution from YOLOv1 to YOLOv5
            2.4.2 YOLOv5 Training Flowchart
            2.4.3 Core Forward-Propagation Architecture of YOLOv5
            2.4.4 YOLOv5 Building Modules
            2.4.5 YOLOv5 Loss Functions
            2.4.6 Backpropagation and Gradient Computation
            2.4.7 Gradient-Update Optimizers
        2.5 GPU
        2.6 Second-Order Gradient Optimization Theory
        2.7 Runge-Kutta Numerical Methods
            2.7.1 Principles of the Fourth-Order Runge-Kutta Method
            2.7.2 Convergence Comparison of Optimization Algorithms on Quadratic Functions
    Chapter 3  Research Methods
        3.1 Research Flowchart
        3.2 Data Annotation
            3.2.1 Dataset
            3.2.2 Dataset Splitting
        3.3 KT4 Optimizer
        3.4 Model Training
        3.5 Hyperparameter Settings
        3.6 Model Evaluation
    Chapter 4  Results and Discussion
        4.1 Convergence of Model Training
        4.2 Comparison of Model Detection Accuracy
        4.3 Analysis of Model Training Time Cost
    Chapter 5  Conclusions and Future Outlook
    References


    Full-text availability: on campus from 2030-08-06; off campus from 2030-08-06.
    The electronic thesis has not yet been authorized for public access; consult the library catalog for the print copy.