
Author: Yen, Yung-Chi (顏咏琪)
Title: Wearable Human Activity Recognition Based on Vision Transformer
Advisor: Liu, Ren-Shiou (劉任修)
Degree: Master
Department: College of Management - Institute of Information Management
Year of Publication: 2025
Graduation Academic Year: 113 (ROC calendar)
Language: Chinese
Number of Pages: 62
Keywords (Chinese): Human Activity Recognition, Data Fusion, Image Recognition, Vision Transformer
Keywords (English): Wearable Human Activity Recognition, Data Fusion, Image Recognition, Vision Transformer
Abstract:
    Human Activity Recognition (HAR) technology enables real-time monitoring and analysis of individual behavior and is widely applied in fields such as healthcare, smart homes, and security surveillance. Mainstream approaches today include image recognition, skeletal joint analysis, physiological signal analysis, sensor data analysis, and wireless signal analysis. Among these, Wearable Human Activity Recognition (Wearable HAR, WHAR), which is based on sensor data analysis, has been attracting growing attention for the convenience of its monitoring and its instant feedback.
    WHAR relies primarily on Inertial Measurement Units (IMUs) to collect acceleration and angular-velocity data, and these signals carry complementary information. Most current research, however, focuses on data from a single sensor and overlooks the latent correlations between sensors. Data fusion can effectively integrate information from different sensors and coordinate axes, improving the model's recognition performance.
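    To make the fusion step concrete, here is a minimal sketch, assuming windowed NumPy arrays, of the two simplest fusion operators named in the table of contents (concatenation and addition); the function name and shapes are illustrative assumptions, not the thesis's implementation.

    import numpy as np

    def fuse_imu_window(acc, gyro, mode="concat"):
        """Fuse one window of IMU data.

        acc, gyro: arrays of shape (timesteps, 3) holding the x/y/z
        acceleration and angular-velocity channels of one window.
        """
        if mode == "concat":
            # Place the two sensors' axes side by side: (timesteps, 6).
            return np.concatenate([acc, gyro], axis=1)
        if mode == "add":
            # Element-wise addition keeps shape (timesteps, 3) but assumes
            # both signals were normalized to a comparable scale first.
            return acc + gyro
        raise ValueError(f"unknown fusion mode: {mode}")

    # Example: a 128-step window, the window length used by UCI-HAR.
    acc = np.random.randn(128, 3)
    gyro = np.random.randn(128, 3)
    fused = fuse_imu_window(acc, gyro, mode="concat")  # shape (128, 6)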
    Early recognition methods relied largely on machine learning and required manual feature extraction, a tedious process of limited efficiency. With the development of deep learning, neural networks that automatically learn multi-level features from data and handle large volumes of high-dimensional data have become the mainstream approach. In parallel, researchers have tried converting sensor data into images to capture its temporal dependencies more effectively.
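    One common signal-to-image encoding in the WHAR literature is the Gramian Angular (Summation) Field; whether it is the exact transformation this thesis adopts is not stated in this record, so treat the NumPy sketch below as an assumed example of the general idea.

    import numpy as np

    def gramian_angular_field(series):
        """Encode a 1-D time series as a Gramian Angular Summation Field.

        Each sample is rescaled to [-1, 1] and read as cos(phi); pixel
        (i, j) of the image is cos(phi_i + phi_j), so pairwise temporal
        correlations become spatial structure a vision model can read.
        """
        lo, hi = series.min(), series.max()
        x = 2.0 * (series - lo) / (hi - lo + 1e-8) - 1.0  # rescale to [-1, 1]
        x = np.clip(x, -1.0, 1.0)
        sin_x = np.sqrt(1.0 - x ** 2)  # sin(phi) recovered from cos(phi)
        # cos(a + b) = cos(a)cos(b) - sin(a)sin(b), vectorized over all pairs
        return np.outer(x, x) - np.outer(sin_x, sin_x)

    window = np.sin(np.linspace(0, 8 * np.pi, 128))  # toy 128-step signal
    image = gramian_angular_field(window)            # shape (128, 128)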
    Even so, computational cost grows with model depth, and traditional neural networks adapt poorly to small datasets. In recent years, large pretrained models such as the Vision Transformer (ViT) have become a new trend and have shown excellent performance in human activity recognition. This study therefore focuses on fusing multi-dimensional sensor data and proposes the OSViT architecture for feature extraction, raising recognition accuracy while lowering computational cost and offering an effective solution for wearable human activity recognition.
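    The OSViT architecture itself is not detailed in this record. As a rough illustration of the pipeline the abstract implies, a pretrained ViT applied to sensor-derived images, the sketch below fine-tunes torchvision's ViT-B/16 with a new six-class head for the UCI-HAR activities; the backbone and head choices are assumptions, not the thesis's design.

    import torch
    import torch.nn as nn
    from torchvision.models import vit_b_16, ViT_B_16_Weights

    NUM_CLASSES = 6  # UCI-HAR distinguishes six activities

    # Load an ImageNet-pretrained ViT-B/16 (weights download on first use)
    # and replace its classification head to predict activity classes.
    model = vit_b_16(weights=ViT_B_16_Weights.IMAGENET1K_V1)
    model.heads.head = nn.Linear(model.heads.head.in_features, NUM_CLASSES)

    # A batch of sensor-derived images (e.g., Gramian Angular Fields of
    # fused channels), resized to the 224x224 input ViT-B/16 expects.
    images = torch.randn(8, 3, 224, 224)
    logits = model(images)  # shape (8, NUM_CLASSES)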

Abstract (English):
    Human Activity Recognition (HAR) enables real-time behavior monitoring and is widely used in healthcare, smart homes, and security. Wearable HAR (WHAR), especially using Inertial Measurement Units (IMUs), is gaining traction due to its convenience and instant feedback. Most studies focus on single-sensor data, overlooking inter-sensor correlations. Data fusion integrates complementary signals across sensors and axes, improving recognition accuracy. Early methods relied on manual feature extraction with machine learning, but deep learning now dominates by automating multi-level feature learning. Some approaches convert sensor data into images to better capture temporal patterns. However, deeper models raise computational costs and struggle with small datasets. Pretrained models such as the Vision Transformer (ViT) are emerging as effective alternatives. This study introduces the OSViT architecture, which fuses multi-dimensional sensor data to enhance accuracy and reduce computational load, ultimately achieving 99.27% accuracy on the UCI-HAR dataset.

Table of Contents:
    Abstract
    Extended Abstract
    Acknowledgements
    Table of Contents
    List of Tables
    List of Figures
    1 Introduction
        1.1 Background and Motivation
        1.2 Research Objectives
        1.3 Research Contributions
        1.4 Thesis Organization
    2 Literature Review
        2.1 Data Preprocessing
            2.1.1 Data Augmentation
            2.1.2 Data Fusion
            2.1.3 Data Format Conversion
        2.2 Feature Extraction and Activity Recognition
            2.2.1 Traditional Neural Networks
            2.2.2 Pretrained Models
            2.2.3 Vision Transformer
        2.3 Summary
    3 Methodology
        3.1 Data Fusion
            3.1.1 Concatenation
            3.1.2 Addition
            3.1.3 Complementary Filtering
        3.2 Data Augmentation
        3.3 Data Format Conversion
        3.4 Vision Transformer Architecture
        3.5 Proposed Neural Network Architecture
    4 Experiments and Analysis
        4.1 Experimental Procedure
        4.2 Datasets
        4.3 Environment and Parameter Settings
        4.4 Evaluation Metrics
        4.5 Results and Analysis
            4.5.1 Experiment 1: Sensor Fusion Analysis
            4.5.2 Experiment 2: Model Architecture Analysis
            4.5.3 Experiment 3: Data Augmentation Effects
            4.5.4 Experiment 4: Extracting Different Numbers of Encoder Outputs
            4.5.5 Experiment 5: Model Performance Analysis
    5 Conclusions and Future Work
    References


    Full-text availability: On campus: public from 2030-07-28. Off campus: public from 2030-07-28.
    The electronic thesis has not yet been authorized for public release; for the print copy, please consult the library catalog.