Student: | 陳品崴 Chen, Pin-Wei |
---|---|
Thesis title: | 基於深度學習二維相機影像對應三維骨架重建應用於中風病人之運動功能檢測 Deep Learning Based 2D-3D Human Pose Reconstruction for Stroke’s Motor Function Assessment |
Advisor: | 孫永年 Sun, Yung-Nien |
Degree: | Master |
Department: | College of Electrical Engineering and Computer Science, Department of Computer Science and Information Engineering |
Year of publication: | 2023 |
Academic year of graduation: | 111 (ROC calendar) |
Language: | English |
Number of pages: | 59 |
Chinese keywords (translated): | human keypoints, skeletal pose estimation, abnormal movements, occupational rehabilitation, stroke recovery assessment |
English keywords: | human keypoints, human pose reconstruction, anomaly detection, occupational therapy, stroke rehabilitation assessment |
Occupational therapy is rehabilitation centered on the activities of daily living: through treatment and improvement, it maintains individuals' functional abilities so that they can carry out activities independently. It serves people whose abilities are limited by physical or psychological conditions, developmental delays, learning disabilities, or other factors. Stroke rehabilitation is one of its major areas. Stroke mostly affects middle-aged and older adults, and domestic and international analyses identify hypertension, diabetes, and obesity as risk factors.
Stroke recovery assessment determines a patient's stage of recovery from how accurately the patient performs a set of prescribed movements. Depending on the recovery stage, the reference movements include 90° arm abduction, touching the ear or the knee with the hand, touching the lumbar spine with the hand, and 180° shoulder flexion. Large hospitals classify rehabilitation admissions into acute, subacute, and chronic stages, and occupational therapy belongs to chronic-stage rehabilitation. Supporting patients so that they can rehabilitate themselves at home during this stable period could therefore greatly improve the efficiency of medical care.
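The reference movements above are defined by target joint angles. As a rough illustration of how such an angle can be read from reconstructed 3D keypoints, the sketch below measures shoulder elevation as the angle between the upper arm (shoulder to elbow) and the trunk (shoulder to hip). Using the hip as the trunk reference, the function names, and the sample coordinates are all assumptions for illustration, not the thesis's exact definition. Separating abduction (frontal plane) from flexion (sagittal plane) would additionally require projecting the vectors onto body-fixed planes, which this sketch omits.

```python
import numpy as np


def angle_between(u, v):
    """Return the angle in degrees between two 3D vectors."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    cos_theta = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    # Clipping guards against values slightly outside [-1, 1] caused by rounding.
    return float(np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0))))


def shoulder_elevation(shoulder, elbow, hip):
    """Angle between the upper arm (shoulder -> elbow) and the trunk (shoulder -> hip).

    Roughly 0 deg with the arm hanging at the side, 90 deg with the arm held
    horizontally, and 180 deg with the arm raised straight overhead.
    Assumption: the same-side hip keypoint stands in for the trunk direction.
    """
    shoulder = np.asarray(shoulder, dtype=float)
    upper_arm = np.asarray(elbow, dtype=float) - shoulder
    trunk = np.asarray(hip, dtype=float) - shoulder
    return angle_between(upper_arm, trunk)


# Hypothetical 3D keypoints in millimetres (y axis pointing up).
shoulder, hip = (0.0, 1400.0, 0.0), (0.0, 900.0, 0.0)
print(shoulder_elevation(shoulder, (300.0, 1400.0, 0.0), hip))  # arm out to the side
print(shoulder_elevation(shoulder, (0.0, 1700.0, 0.0), hip))    # arm raised overhead
```

Comparing such an angle against the 90° or 180° target of a reference movement is then a simple numeric check; a tolerance for that check would have to come from clinical guidance rather than from this sketch.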
Deep-learning-based computer vision has advanced rapidly. An automated human pose estimation system can quickly analyze how accurately a posture is performed and assess the patient's rehabilitation stage. Data collection also need not rely on precise but cumbersome marker-based 3D motion capture, in which multiple markers are attached to the patient to compute the relative positions of the skeleton. Instead, motion videos recorded with an ordinary 2D camera can be used for pose estimation, and the reconstructed 3D human pose can then be used for stroke recovery assessment.
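Lifting 2D keypoints from an ordinary camera to 3D typically starts by normalizing pixel coordinates and, for temporal models, grouping consecutive frames. The sketch below mirrors the normalization convention found in the VideoPose3D code base (Pavllo et al., CVPR 2019), which maps x to [-1, 1] while preserving the aspect ratio, and builds edge-padded windows of 2D poses centred on each frame. Whether the thesis uses this exact normalization or window construction is an assumption; `temporal_windows` and its `receptive_field` parameter are hypothetical names for illustration.

```python
import numpy as np


def normalize_screen_coordinates(keypoints_px, width, height):
    """Map pixel (x, y) keypoints so that x spans [-1, 1], preserving aspect ratio.

    keypoints_px: array of shape (..., 2) in pixel units.
    """
    kp = np.asarray(keypoints_px, dtype=float)
    if kp.shape[-1] != 2:
        raise ValueError("expected (x, y) pairs in the last dimension")
    return kp / width * 2.0 - np.array([1.0, height / width])


def temporal_windows(sequence, receptive_field):
    """Build one window of 2D poses centred on every frame.

    sequence: array of shape (T, J, 2). Returns shape (T, receptive_field, J, 2).
    Frames near the clip boundaries are padded by repeating the first/last pose.
    """
    if receptive_field % 2 != 1:
        raise ValueError("receptive_field must be odd so each window has a centre frame")
    seq = np.asarray(sequence, dtype=float)
    pad = receptive_field // 2
    padded = np.pad(seq, ((pad, pad), (0, 0), (0, 0)), mode="edge")
    return np.stack([padded[t:t + receptive_field] for t in range(seq.shape[0])])


# Hypothetical clip: 5 frames of 17 keypoints from a 1920x1080 video.
rng = np.random.default_rng(0)
clip_px = rng.uniform([0.0, 0.0], [1920.0, 1080.0], size=(5, 17, 2))
windows = temporal_windows(normalize_screen_coordinates(clip_px, 1920, 1080), 3)
print(windows.shape)  # (5, 3, 17, 2)
```

Each window would then be fed to a lifting network that predicts the 3D pose of its centre frame; edge padding keeps the output length equal to the number of input frames.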
The proposed system consists of three deep learning models (Stages 1 to 3) followed by a kinematic analysis stage (Stage 4). Stage 1 locates the human body in each video frame. Stage 2 detects 17 two-dimensional skeletal keypoints (joints) covering the body from top to bottom. Stage 3 reconstructs these 2D keypoints into 3D keypoints. Stage 4 computes relative vectors between the 3D keypoints to obtain the kinematic joint angles of the patient's movements, and uses deviations in these angles to detect abnormal movements. In the experiments, the mean per-joint velocity error (MPJVE) is within 3 mm for every movement. The system also performs strongly on the 2D keypoint benchmark COCO val2017 and on the public 3D dataset Human3.6M. The experiments show that the system improves both stability and performance over typical traditional analysis methods.
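The two quantities in this paragraph, the MPJVE used for evaluation and the angle deviations used in Stage 4, can be sketched as follows. `mpjve` follows the definition used by VideoPose3D: the mean Euclidean norm of the difference between predicted and ground-truth per-joint velocities, where velocities are frame-to-frame differences. `flag_abnormal_frames` simply flags frames in which the patient's joint angle deviates from a reference trajectory by more than a threshold; the reference trajectory, its temporal alignment with the patient's recording, and the threshold value are hypothetical, not values from the thesis.

```python
import numpy as np


def mpjve(predicted, target):
    """Mean per-joint velocity error.

    predicted, target: arrays of shape (T, J, 3), e.g. in millimetres.
    Velocities are finite differences between consecutive frames, so the
    result is in the same length unit per frame.
    """
    predicted = np.asarray(predicted, dtype=float)
    target = np.asarray(target, dtype=float)
    if predicted.shape != target.shape:
        raise ValueError("predicted and target must have the same shape")
    velocity_error = np.diff(predicted, axis=0) - np.diff(target, axis=0)
    # Euclidean norm per joint and frame, then the mean over all of them.
    return float(np.mean(np.linalg.norm(velocity_error, axis=-1)))


def flag_abnormal_frames(patient_deg, reference_deg, threshold_deg):
    """Indices of frames whose joint angle deviates from the reference by more than threshold_deg."""
    deviation = np.abs(np.asarray(patient_deg, dtype=float) - np.asarray(reference_deg, dtype=float))
    return np.flatnonzero(deviation > threshold_deg)


# A constant 3D offset leaves every velocity unchanged, so MPJVE stays 0.
gt = np.zeros((4, 17, 3))
print(mpjve(gt + 25.0, gt))  # 0.0

# Hypothetical shoulder-elevation traces (degrees) and a hypothetical 5-degree threshold.
print(flag_abnormal_frames([0, 45, 80, 88], [0, 45, 90, 90], threshold_deg=5.0))  # [2]
```

Because MPJVE compares motion rather than absolute position, it rewards temporally smooth, consistent predictions, which is why it complements position-based errors when judging reconstruction stability.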