| Author: | XIAO, WEI-QIAN (蕭惟謙) |
|---|---|
| Thesis title: | Yoga posture correction on a smartphone (透過智慧型手機進行瑜伽姿勢矯正) |
| Advisor: | Lan, Kun-Chan (藍崑展) |
| Degree: | Master |
| Department: | Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science |
| Year of publication: | 2025 |
| Academic year of graduation: | 113 |
| Language: | English |
| Number of pages: | 75 |
| Keywords (Chinese): | 關鍵點偵測, 瑜伽姿勢矯正, 姿勢估計, 動作分類 |
| Keywords (English): | Keypoint detection, Yoga posture correction, Pose estimation, Action classification |
This study presents a smartphone-based yoga posture correction system designed to help self-learners practice yoga safely and effectively without professional supervision. Existing AI-based pose estimation tools are generally tailored for conventional sports and often fail to detect subtle misalignments in yoga, especially for static poses that require precise alignment. Furthermore, most current solutions provide only offline feedback after practice, making real-time correction difficult. To address these challenges, our system employs a lightweight RTMPose keypoint detection model, integrates a custom yoga dataset with 23 annotated keypoints, and implements automated keyframe extraction and pose classification algorithms. Users can simply record their practice with a smartphone; the system automatically analyzes representative keyframes, detects and classifies common yoga poses, and provides near real-time correction suggestions based on keypoint angle calculations. Experimental results demonstrate that the system achieves over 99% classification accuracy for ten common yoga poses, with keypoint localization errors well below the international standard of 10 cm, confirming its accuracy and practicality. Future work will focus on expanding data diversity and developing real-time feedback features to enhance system generalizability and user interactivity, promoting applications in digital health self-management.
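The correction step described above compares joint angles computed from detected keypoints against per-pose targets. A minimal sketch of that idea follows; the specific pose (Warrior II), the 90° target for the front knee, and the ±15° tolerance are illustrative assumptions, not values taken from the thesis.

```python
import numpy as np

def joint_angle(a, b, c):
    """Angle in degrees at joint b, formed by keypoints a-b-c (2D coordinates)."""
    a, b, c = map(np.asarray, (a, b, c))
    v1, v2 = a - b, c - b
    cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    # Clip guards against floating-point values slightly outside [-1, 1].
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

# Hypothetical rule: in Warrior II the front knee should be near 90 degrees.
TARGET, TOLERANCE = 90.0, 15.0

def knee_feedback(hip, knee, ankle):
    """Return a correction hint based on the hip-knee-ankle angle."""
    angle = joint_angle(hip, knee, ankle)
    if abs(angle - TARGET) <= TOLERANCE:
        return "OK"
    return "bend deeper" if angle > TARGET + TOLERANCE else "straighten slightly"
```

In a full pipeline, a rule table of this kind would be consulted per classified pose on each extracted keyframe, turning raw keypoint coordinates into the near real-time suggestions the abstract describes.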