| Author: | 陳奕安 Chen, Yi-An |
|---|---|
| Thesis title: | 可辨識羽球球路之深度學習網路模型 A Badminton Strokes Recognition Deep Learning Network Model |
| Advisor: | 王宗一 Wang, Tzone-I |
| Degree: | Master |
| Department: | College of Engineering - Department of Engineering Science |
| Year of publication: | 2023 |
| Academic year of graduation: | 111 |
| Language: | Chinese |
| Pages: | 54 |
| Keywords (Chinese): | 球路辨識、人體骨架、特徵點座標差、神經網路、深度學習 |
| Keywords (English): | Strokes recognition, Human skeleton, Coordinate differences, Neural network, Deep learning |
After the Tokyo Olympics, the outstanding performances of Taiwanese players such as Tai Tzu-Ying and Chou Tien-Chen drew many people into badminton and set off a wave of enthusiasm for the sport in Taiwan. This study aims to help spectators better understand stroke information while watching badminton matches, improving the viewing experience, and to provide professional players with statistical analyses of opponents' strokes, helping them respond to opposing tactics and raise their skill level and competition results.
This study proposes a deep learning network-based method for recognizing badminton strokes. A dataset of real badminton match videos was collected, and the strokes in it were manually identified and labeled. A human pose estimation network then extracts the coordinates of the target player's skeletal keypoints in every frame of a video, and the coordinate differences between consecutive frames serve as feature vectors for training a convolutional neural network. Once the model weights are trained, a badminton video can be fed to the model to identify and record the strokes played during a match, effectively saving the time and manpower of manual annotation. Moreover, because human skeletons have similar shapes and keypoint positions across different people, the model retains above-par generalization after data from different players are added, so it can accurately recognize the strokes of previously unseen target players.
The training data were collected from live match broadcasts on the official Badminton World Federation YouTube channel. Each stroke of a target player was first identified manually as a 31-frame clip, and OpenPose was then used to extract the target player's body keypoint coordinates in every frame. To overcome possible keypoint misplacement in the extracted skeletons, this study designed a linear interpolation algorithm that corrects misplaced skeletons during data preprocessing, so that accurate skeletal coordinates can be obtained at every time step. From the corrected per-frame skeleton coordinates, the coordinate differences between consecutive frames were computed and used as the feature-vector dataset of the labeled badminton strokes to train a convolutional neural network stroke recognition model. The deep learning network of this study is a modified VGG16 architecture; after training, the model reached a recognition accuracy of 94.7% on the validation set and 92.8% on the test set, and it also achieved 90% accuracy on an additional test video dataset collected for system evaluation. These results demonstrate the model's feasibility in practical applications and can help enable real-time, accurate recognition of badminton strokes during live matches.
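The linear interpolation correction described above could look roughly like the following NumPy sketch. The abstract does not specify the algorithm's details, so the confidence `threshold`, the array shapes, and the use of per-keypoint detection confidences (as OpenPose reports them) are assumptions:

```python
import numpy as np

def interpolate_keypoints(seq, conf, threshold=0.1):
    """Linearly interpolate low-confidence keypoints across frames.

    seq:  (T, K, 2) array of (x, y) keypoint coordinates over T frames
    conf: (T, K) array of per-keypoint detection confidences
    threshold: confidences below this mark a keypoint as misplaced
    """
    seq = seq.copy()
    T, K, _ = seq.shape
    frames = np.arange(T)
    for k in range(K):
        good = conf[:, k] >= threshold
        if good.sum() < 2 or good.all():
            continue  # nothing to fix, or too few reliable frames
        for d in range(2):  # interpolate x and y separately
            seq[~good, k, d] = np.interp(
                frames[~good], frames[good], seq[good, k, d]
            )
    return seq
```

For a keypoint misdetected in frame 1 of a 3-frame clip, its corrected position becomes the midpoint of the reliable detections in frames 0 and 2.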
Following the success of Taiwanese badminton players, such as Tai Tzu-Ying and Chou Tien-Chen, in the Tokyo Olympics in 2021, the popularity of badminton has surged in Taiwan. In the Sports Administration's survey of sports participation in Taiwan in 2022, badminton ranked first among ball sports. To facilitate beginners and spectators in understanding badminton strokes, accelerate their learning curve, and provide professional players with statistical analysis of opponents and their own strokes, we propose a deep learning-based method for recognizing badminton strokes.
Our approach utilizes a dataset of real badminton match videos, where manual annotation was performed to label the strokes. A human body pose estimation network is employed to extract the coordinates of skeletal keypoints from each frame of the videos. We then use the coordinate differences between consecutive frames as feature vectors to train a convolutional neural network (CNN) model. Once trained, the model can recognize and record the strokes played by players in badminton videos, significantly reducing the time and manpower required for manual annotation. Moreover, because human skeletons have similar shapes and keypoint positions across different individuals, the model maintains a high level of generalization even after incorporating data from various players, enabling accurate strokes recognition for unknown target individuals.
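The coordinate-difference features described above can be sketched as follows; a minimal NumPy version, assuming 31 frames per stroke and OpenPose's 25-keypoint body model (the exact feature layout used in the thesis is not specified in this abstract):

```python
import numpy as np

def coordinate_differences(keypoints):
    """Frame-to-frame keypoint displacements as feature vectors.

    keypoints: (T, K, 2) array -- here T = 31 frames per stroke,
               K keypoints, each with (x, y) coordinates.
    Returns a (T-1, K*2) matrix: one (dx, dy)-per-keypoint feature
    vector for each pair of consecutive frames.
    """
    diffs = np.diff(keypoints, axis=0)        # (T-1, K, 2)
    return diffs.reshape(diffs.shape[0], -1)  # flatten per frame
```

With T = 31 and K = 25 this yields a 30 x 50 feature matrix per stroke, which can then be fed to the CNN as a single-channel input.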
For data collection, we used official live streaming videos of badminton matches from the Badminton World Federation (BWF) YouTube channel. After manually annotating each stroke of the target players as a 31-frame clip, we obtained the coordinates of human body keypoints using the OpenPose human pose estimation network. To overcome potential misplacement of the skeletal keypoints, we designed a linear interpolation algorithm that corrects misplaced skeletons, ensuring accurate extraction of skeletal coordinates at each time point. The resulting coordinate differences between consecutive frames were used as feature vectors to train the CNN-based strokes recognition model. Our deep learning network, based on a modified VGG16 architecture, uses VGG16 for spatial feature extraction and the coordinate differences for temporal feature extraction. The trained model achieved a recognition accuracy of 94.7% on the validation dataset and 92.8% on the testing dataset. Furthermore, it achieved 90% accuracy on an additional test video dataset collected for system evaluation, demonstrating the feasibility and practical applicability of the proposed model for real-time, accurate recognition of badminton strokes.
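As a rough illustration of feeding the coordinate-difference matrix to a VGG-style CNN, the PyTorch sketch below treats each stroke's (T-1) x (K*2) matrix as a one-channel image. This is not the thesis's actual modified VGG16: the layer widths, depth, and `num_classes` are hypothetical placeholders:

```python
import torch
import torch.nn as nn

class StrokeNet(nn.Module):
    """VGG-style classifier over a (T-1) x (K*2) coordinate-difference
    matrix, treated as a single-channel image."""

    def __init__(self, num_classes=10):  # num_classes is hypothetical
        super().__init__()
        self.features = nn.Sequential(
            # Block 1: two 3x3 convs, as in VGG, then downsample
            nn.Conv2d(1, 64, 3, padding=1), nn.BatchNorm2d(64), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.BatchNorm2d(64), nn.ReLU(),
            nn.MaxPool2d(2),
            # Block 2: widen channels, downsample again
            nn.Conv2d(64, 128, 3, padding=1), nn.BatchNorm2d(128), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),  # collapse spatial dims
            nn.Flatten(),
            nn.Linear(128, num_classes),
        )

    def forward(self, x):
        # x: (batch, 1, 30, 50) for 31-frame strokes with 25 keypoints
        return self.head(self.features(x))
```

Global average pooling in the head keeps the classifier agnostic to small changes in the input grid size, which is convenient if the number of frames or keypoints per stroke differs between experiments.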