
Author: Liu, Yi-Shan (劉宜珊)
Title: Movement Delay Image Recognition based on Deep Learning and Mediapipe (基於深度學習和Mediapipe之動作遲緩影像識別)
Advisor: Chen, Mu-Yen (陳牧言)
Degree: Master's
Department: Department of Engineering Science, College of Engineering
Year of publication: 2023
Graduating academic year: 111 (ROC calendar; 2022-2023)
Language: Chinese
Pages: 48
Chinese keywords: 動作遲緩辨識, 基於骨架動作辨識, 物件偵測, 深度學習
English keywords: Movement Delay Recognition, Skeleton-Based Action Recognition, Object Detection, Deep Learning
Usage statistics: 147 views, 56 downloads
Abstract (Chinese, translated):
    According to WHO statistics, roughly 6-8% of children worldwide experience developmental delay. Many people assume that such delays cannot be changed and will keep the child from adapting to social life, but if a delay is identified early in childhood and early intervention is provided in time, there is a good chance of recovering most functions.
    Motor delay is a common facet of childhood developmental delay and is further divided into gross motor and fine motor delay. Determining whether a child has a motor delay requires a standardized motor assessment. The standardized gross and fine motor assessments commonly used for preschool children are the Peabody Developmental Motor Scales, Second Edition (PDMS-2) and the Movement Assessment Battery for Children (Movement ABC). Each assessment comprises several tasks: an occupational or physical therapist observes the child, scores each task from 0 to 2, sums the scores, converts the total against the norm tables, and compares the resulting percentile to decide whether the child meets the criterion for delay.
    To remove the need for manual judgment and the chain of score look-ups, this study focuses on children aged four to six, introduces a method for automatically judging the delay assessment, and builds prediction models for the balance tasks. The first stage collects clinical data and defines the inclusion criteria. The second stage pre-processes and analyzes the assessment videos: the object-detection model YOLOv4 (You Only Look Once, version 4) and the human-skeleton model MediaPipe are used to obtain the child's skeleton and joint-point coordinates for the subsequent extraction of features from the four balance tasks. The third stage extracts motion features with a sliding-window approach, using different window sizes for the static balance tasks (standing on the left foot, standing on the right foot) and the dynamic balance tasks (walking on tiptoes along a straight line, hopping between squares), and then reduces the feature dimensionality to retain the key features. The fourth stage converts the key features from the third stage into vectors. The merged static-balance features (left-foot and right-foot single-leg standing) are fed into several machine-learning classifiers; the highest accuracy, 88%, is achieved by the LightGBM (Light Gradient Boosting Machine) classifier. The merged dynamic-balance features (tiptoe walking and hopping) are fed into an LSTM-CNN (Long Short-Term Memory Convolutional Neural Network) for classification and testing; the highest accuracy, 87%, is achieved when the feature-extraction sliding window is set to 15.
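    As a rough illustration of the second stage described above, the sketch below shows how per-frame joint coordinates could be pulled from an assessment video with MediaPipe Pose. The function name and video file are hypothetical, and the YOLOv4 step the thesis uses to first locate the child in the frame is skipped for brevity; this is a minimal sketch under those assumptions, not the thesis's implementation.

# Minimal sketch: extracting per-frame joint coordinates with MediaPipe Pose.
# The video path is hypothetical; the YOLOv4 crop of the child used in the
# thesis pipeline is omitted here, so the full frame is processed instead.
import cv2
import mediapipe as mp
import numpy as np

def extract_joint_series(video_path: str) -> np.ndarray:
    """Return an array of shape (frames, 33, 4): x, y, z, visibility per landmark."""
    series = []
    with mp.solutions.pose.Pose(static_image_mode=False,
                                model_complexity=1,
                                min_detection_confidence=0.5) as pose:
        cap = cv2.VideoCapture(video_path)
        while cap.isOpened():
            ok, frame = cap.read()
            if not ok:
                break
            result = pose.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
            if result.pose_landmarks:
                series.append([(lm.x, lm.y, lm.z, lm.visibility)
                               for lm in result.pose_landmarks.landmark])
        cap.release()
    return np.array(series)

joints = extract_joint_series("balance_task_left_foot.mp4")  # hypothetical file
print(joints.shape)  # (n_frames, 33, 4)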

Abstract (English):
    According to global statistics from the World Health Organization (WHO), the probability of children experiencing developmental delays is approximately 6-8%. Many people believe that developmental delays are unchangeable and lead to difficulties in adapting to social life. In reality, however, if delays are detected early in childhood and timely early intervention is provided, there is a chance of recovering most functions.
    The identification of movement delay in children, whether fine motor or gross motor, relies on standardized movement assessments. The assessments commonly used are the Peabody Developmental Motor Scales, Second Edition (PDMS-2) and the Movement Assessment Battery for Children (Movement ABC). Both are composed of several subtests scored by therapists; the subtest scores are summed to generate indexes of motor performance, and a motor delay is defined when an index falls 1-2 standard deviations below the norm.
    To avoid subjective judgment and simplify the scoring, this study introduces automatic detection of movements, enrolling children aged 4-6 years whose motor-assessment videos are recorded for analysis. The model for automatic detection of movements is divided into four stages. In the first stage, clinical data collection and case inclusion criteria are established. In the second stage, the video recordings of the motor assessments are pre-processed and analyzed: the object-detection model YOLOv4 (You Only Look Once, version 4) and the human-skeleton model MediaPipe are used to obtain the skeletal and joint coordinates of the children for the extraction of features from the four balance tasks. The third stage extracts features with a sliding-window approach, using different window sizes for the static balance tasks (standing on one foot with the left or right foot) and the dynamic balance tasks (walking heels raised, jumping on mats); the extracted features are then dimensionally reduced to retain the key features. In the fourth stage, the key features obtained in the third stage are converted into vectors. The features from the static balance tasks are combined and input into multiple machine-learning classifiers; the highest accuracy, 88.8%, is achieved with the Light Gradient Boosting Machine (LightGBM) classifier. The combined features from the dynamic balance tasks are input into an LSTM-CNN (Long Short-Term Memory Convolutional Neural Network) for action classification and testing; the highest accuracy, 87%, is achieved when the sliding window size is set to 15.
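    To make the third and fourth stages more concrete, the sketch below shows one way the pieces could fit together: a sliding window over the joint-coordinate series, a LightGBM classifier for the static-balance feature vectors, and a small Keras LSTM-CNN for the windowed dynamic-balance sequences. The array shapes, placeholder data, hyperparameters other than the reported window size of 15, and the exact layer ordering of the LSTM-CNN are assumptions made for illustration and are not taken from the thesis.

# Hedged sketch of stages three and four: sliding-window feature extraction,
# a LightGBM classifier for static-balance features, and a small LSTM-CNN for
# dynamic-balance windows. Shapes and hyperparameters are illustrative only.
import numpy as np
import lightgbm as lgb
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from tensorflow.keras import layers, models

def sliding_windows(series: np.ndarray, window: int = 15, step: int = 1) -> np.ndarray:
    """Split a (frames, features) series into overlapping (window, features) segments."""
    return np.stack([series[i:i + window]
                     for i in range(0, len(series) - window + 1, step)])

# Static balance: one reduced feature vector per child, a conventional classifier.
X_static = np.random.rand(60, 40)        # placeholder features after dimensionality reduction
y = np.random.randint(0, 2, size=60)     # placeholder labels: 0 = typical, 1 = delayed
X_tr, X_te, y_tr, y_te = train_test_split(X_static, y, test_size=0.2,
                                          stratify=y, random_state=42)
clf = lgb.LGBMClassifier(n_estimators=200, learning_rate=0.05)
clf.fit(X_tr, y_tr)
print("static-balance accuracy:", accuracy_score(y_te, clf.predict(X_te)))

# Dynamic balance: windowed joint sequences fed to an LSTM-CNN.
def build_lstm_cnn(window: int = 15, n_features: int = 66, n_classes: int = 2):
    model = models.Sequential([
        layers.Input(shape=(window, n_features)),
        layers.LSTM(64, return_sequences=True),                         # temporal modelling
        layers.Conv1D(64, kernel_size=3, activation="relu", padding="same"),
        layers.GlobalMaxPooling1D(),
        layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Input windows of shape (n_windows, 15, n_features) would be built per video
# with sliding_windows() before training.
model = build_lstm_cnn()
model.summary()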

Table of Contents
Abstract (Chinese)
Abstract (English)
Acknowledgements
Table of Contents
List of Tables
List of Figures
Chapter 1. Introduction
    1.1 Research background and motivation
    1.2 Research objectives
    1.3 Chapter overview
Chapter 2. Literature Review
    2.1 Standardized motor assessment and delay determination
    2.2 Artificial intelligence in healthcare and motion detection
    2.3 Machine learning and deep learning
        2.3.1 Data augmentation
        2.3.2 Dimensionality reduction and key-feature extraction
        2.3.3 Machine-learning classifiers
        2.3.4 LSTM-CNN
    2.4 Object detection and human skeleton detection
        2.4.1 YOLOv4
        2.4.2 MediaPipe
Chapter 3. Research Method
    3.1 Research workflow
    3.2 Clinical assessment and video collection
        3.2.1 Clinical data and inclusion criteria
        3.2.2 Motor-assessment videos
    3.3 Pre-processing and analysis of motor-assessment videos
        3.3.1 Video cropping and pre-processing
        3.3.2 Extraction of skeleton and joint-point coordinates
    3.4 Motion feature extraction
        3.4.1 Feature definition
        3.4.2 Balance-task feature extraction
        3.4.3 Missing-value imputation
        3.4.4 Normalization
        3.4.5 Correlation-coefficient analysis
        3.4.6 Dimensionality-reduction validation
        3.4.7 SMOTE data augmentation
    3.5 Classification models
        3.5.1 Machine-learning classifiers
        3.5.2 LSTM-CNN
Chapter 4. Experimental Design and Result Analysis
    4.1 Experimental environment and parameter settings
    4.2 Experimental datasets
        4.2.1 Video dataset
        4.2.2 Tabular dataset
    4.3 Experimental parameters
        4.3.1 YOLOv4 weights and parameter settings
        4.3.2 Training parameters for the machine-learning classifiers and LSTM-CNN
    4.4 Performance evaluation metrics
    4.5 Experimental results and discussion
        4.5.1 Static-balance classification results
        4.5.2 Dynamic-balance classification results
Chapter 5. Conclusion and Future Work
    5.1 Conclusion
    5.2 Future work
References


Full text: immediately available on campus and off campus.