
Author: 彭盛暉 (Peng, Sheng-Hui)
Title: An Efficient Graph Convolution Network for Skeleton-Based Dynamic Hand Gesture Recognition (基於骨架的動態手勢識別之高效圖卷積網路)
Advisor: 蔡佩璇 (Tsai, Pei-Hsuan)
Degree: Master
Department: Institute of Manufacturing Information and Systems, College of Electrical Engineering and Computer Science
Year of Publication: 2022
Graduation Academic Year: 110 (2021–2022)
Language: Chinese
Pages: 42
Keywords: Skeleton-based action recognition, graph convolutional network, attention mechanism, dynamic hand gesture recognition
Abstract (Chinese):
    Dynamic hand gesture recognition has become an important topic in computer vision research owing to its wide range of applications in human-computer interaction, robotics, and other fields. Although many studies on dynamic hand gesture recognition exist, current state-of-the-art (SOTA) methods are over-parameterized: their models carry very large numbers of parameters, which in turn leads to high computational cost.
    To address this, this study proposes an efficient and lightweight graph convolutional network (ResGCNeXt) that learns rich features from skeleton information and achieves high accuracy with fewer model parameters. First, three data preprocessing strategies are designed around the salient characteristics of dynamic hand gestures to provide the recognition model with sufficient features. Next, an efficient graph convolutional network structure combining a bottleneck architecture with group convolution is designed to reduce the number of model parameters without loss of accuracy. Finally, an attention module called SENet-Part attention (SEPA) is proposed; SEPA combines SENet with an optimized PartAtt to strengthen the learning of channel and spatial features simultaneously.
    The method is validated on two dynamic hand gesture datasets. The experimental results show that ResGCNeXt achieves competitive performance, especially in substantially reducing the number of model parameters. Compared with HAN-2S, one of the SOTA methods, our method uses only half the model parameters while improving the recognition rate by 0.3%.
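
    The record does not reproduce the thesis's layer definitions; the following is a minimal PyTorch sketch, under stated assumptions, of the kind of bottleneck plus group-convolution GCN block described above, in the spirit of ResGCN [15] and ResNeXt [16]. The class name BottleneckGCNBlock and the reduction and groups hyperparameters are illustrative assumptions, not the thesis's actual code.

        import torch
        import torch.nn as nn

        class BottleneckGCNBlock(nn.Module):
            """Hypothetical sketch of a bottleneck + group-convolution GCN block.

            A 1x1 convolution first reduces the channel width, a grouped 1x1
            convolution mixes features cheaply, features are aggregated over
            the skeleton graph via the normalized adjacency matrix A, and a
            final 1x1 convolution restores the channel width. Inputs are
            skeleton sequences shaped (N, C, T, V): batch, channels, frames,
            joints.
            """

            def __init__(self, in_ch, out_ch, A, reduction=4, groups=8):
                super().__init__()
                mid = out_ch // reduction                     # bottleneck width
                self.register_buffer("A", A)                  # (V, V) adjacency
                self.reduce = nn.Conv2d(in_ch, mid, kernel_size=1)
                self.group_conv = nn.Conv2d(mid, mid, kernel_size=1, groups=groups)
                self.expand = nn.Conv2d(mid, out_ch, kernel_size=1)
                self.bn = nn.BatchNorm2d(out_ch)
                self.relu = nn.ReLU(inplace=True)
                self.residual = (nn.Identity() if in_ch == out_ch
                                 else nn.Conv2d(in_ch, out_ch, kernel_size=1))

            def forward(self, x):
                res = self.residual(x)
                y = self.relu(self.reduce(x))
                y = self.relu(self.group_conv(y))
                y = torch.einsum("nctv,vw->nctw", y, self.A)  # graph aggregation
                y = self.expand(y)
                return self.relu(self.bn(y) + res)

        # Example: 22 hand joints, 32 frames, batch of 8 (identity adjacency
        # stands in for a real skeleton graph here).
        block = BottleneckGCNBlock(64, 64, torch.eye(22))
        out = block(torch.randn(8, 64, 32, 22))               # -> (8, 64, 32, 22)

    The bottleneck keeps the grouped convolution operating on a narrow channel width, which is where the parameter savings in such designs come from.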

Abstract (English):
    Dynamic hand gesture recognition has evolved into a prominent topic in computer vision research due to its vast applications in human-computer interaction, robotics, and other domains. Although there are numerous related recognition studies, the state-of-the-art (SOTA) methods are over-parameterized: the number of model parameters is quite large, which results in high computational costs.
    In this work, an efficient and lightweight graph convolutional network (ResGCNeXt) is proposed to learn rich features from skeleton information and achieve high accuracy with a smaller number of model parameters. First, three data preprocessing strategies based on significant features of dynamic hand gestures are designed to provide sufficient features for the recognition model. Then, an efficient graph convolutional network structure combining bottleneck and group convolution is designed to reduce the number of model parameters without loss of accuracy. Furthermore, an attention block called SENet-Part attention (SEPA) is added to improve channel and spatial feature learning.
    This study is validated on two benchmark datasets, and the experimental results show that ResGCNeXt delivers competitive performance, especially in significantly reducing the number of model parameters. Compared to HAN-2S, one of the best SOTA methods, our method uses half the model parameters and achieves a 0.3% higher recognition rate.
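
    The exact SEPA formulation is not given in this record; below is a minimal sketch assuming a standard squeeze-and-excitation channel gate [19] followed by a simplified per-joint spatial gate standing in for the optimized PartAtt. The class name SEPAttention and the reduction parameter are illustrative assumptions, not the thesis's code.

        import torch
        import torch.nn as nn

        class SEPAttention(nn.Module):
            """Hypothetical SENet-Part attention (SEPA) sketch.

            The channel branch is a standard squeeze-and-excitation gate; the
            spatial branch learns one weight per joint from time-pooled
            features, standing in for the optimized PartAtt in the thesis.
            """

            def __init__(self, channels, reduction=4):
                super().__init__()
                self.channel_gate = nn.Sequential(          # SENet branch
                    nn.AdaptiveAvgPool2d(1),                # squeeze over (T, V)
                    nn.Conv2d(channels, channels // reduction, kernel_size=1),
                    nn.ReLU(inplace=True),
                    nn.Conv2d(channels // reduction, channels, kernel_size=1),
                    nn.Sigmoid(),
                )
                self.joint_gate = nn.Sequential(            # spatial branch
                    nn.Conv1d(channels, 1, kernel_size=1),
                    nn.Sigmoid(),
                )

            def forward(self, x):                           # x: (N, C, T, V)
                x = x * self.channel_gate(x)                # reweight channels
                pooled = x.mean(dim=2)                      # (N, C, V), pool time
                w = self.joint_gate(pooled)                 # (N, 1, V)
                return x * w.unsqueeze(2)                   # reweight joints

        # Example: attach after a GCN block's output.
        att = SEPAttention(channels=64)
        out = att(torch.randn(8, 64, 32, 22))               # -> (8, 64, 32, 22)

    Because both gates are built from 1x1 convolutions over pooled features, the module adds few parameters, consistent with the lightweight design goal stated in the abstract.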

    Table of Contents:
      Abstract (Chinese)
      An Efficient Graph Convolution Network for Skeleton-Based Dynamic Hand Gesture Recognition (Abstract in English)
      Acknowledgements
      Table of Contents
      List of Tables
      List of Figures
      Chapter 1: Introduction
        1.1 Research Background
        1.2 Research Motivation
        1.3 Research Objectives and Methods
        1.4 Thesis Organization
      Chapter 2: Literature Review
        2.1 Skeleton-Based Dynamic Hand Gesture Recognition
        2.2 Efficient Networks
        2.3 Attention Mechanisms
      Chapter 3: ResGCNeXt, a Skeleton-Based Dynamic Hand Gesture Recognition Model
        3.1 Model Architecture
        3.2 Data Preprocessing
        3.3 Spatial Temporal Graph Convolutional Network
        3.4 RexGCN
      Chapter 4: Experimental Results and Discussion
        4.1 Datasets
        4.2 Implementation Details
        4.3 Comparison Methods
        4.4 Experimental Results
        4.5 Ablation Study
        4.6 Discussion
      Chapter 5: Conclusion and Future Work
        5.1 Conclusion
        5.2 Future Directions
      References
      Appendix

    References:
    [1] J. Smisek, M. Jancosek, and T. Pajdla, “3D with Kinect,” in Consumer Depth Cameras for Computer Vision, London: Springer, 2013, pp. 3–25.
    [2] L. Keselman, J. Iselin Woodfill, A. Grunnet-Jepsen, and A. Bhowmik, “Intel RealSense stereoscopic depth cameras,” in IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2017, pp. 1–10.
    [3] S. Qiao, Y. Wang, and J. Li, “Real-time human gesture grading based on OpenPose,” in 2017 10th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics, 2017, pp. 1–6.
    [4] T. Feix, R. Pawlik, H. B. Schmiedmayer, J. Romero, and D. Kragic, “A comprehensive grasp taxonomy,” in Robotics: Science and Systems Workshop on Understanding the Human Hand for Advancing Robotic Manipulation, vol. 2, no. 2.3, 2009, pp. 2–3.
    [5] Q. De Smedt, H. Wannous, and J.-P. Vandeborre, “3D hand gesture recognition by analysing set-of-joints trajectories,” in International Workshop on Understanding Human Activities through 3D Sensors, 2016, pp. 86–97.
    [6] E. Ohn-Bar and M. Trivedi, “Joint angles similarities and HOG2 for action recognition,” in IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2013, pp. 465–470.
    [7] Q. De Smedt, H. Wannous, and J.-P. Vandeborre, “Skeleton-based dynamic hand gesture recognition,” in IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2016, pp. 1–9.
    [8] J. C. Núñez, R. Cabido, J. J. Pantrigo, A. S. Montemayor, and J. F. Vélez, “Convolutional neural networks and long short-term memory for skeleton-based human activity and hand gesture recognition,” Pattern Recognition, vol. 76, pp. 80–94, Apr. 2018.
    [9] T. N. Kipf and M. Welling, “Semi-supervised classification with graph convolutional networks,” arXiv preprint arXiv:1609.02907, 2016.
    [10] S. Yan, Y. Xiong, D. Lin, and X. Tang, “Spatial temporal graph convolutional networks for skeleton-based action recognition,” in AAAI Conference on Artificial Intelligence, 2018, pp. 7444–7452.
    [11] Y. Li, Z. He, X. Ye, Z. He, and K. Han, “Spatial temporal graph convolutional networks for skeleton-based dynamic hand gesture recognition,” EURASIP Journal on Image and Video Processing, vol. 2019, no. 1, pp. 1–7, 2019.
    [12] W. Zhang, Z. Lin, J. Cheng, C. Ma, X. Deng, and H. Wang, “STA-GCN: two-stream graph convolutional network with spatial–temporal attention for hand gesture recognition,” The Visual Computer, vol. 36, no. 10, pp. 2433–2444, 2020.
    [13] F. Guo, Z. He, S. Zhang, X. Zhao, J. Fang, and J. Tan, “Normalized edge convolutional networks for skeleton-based hand gesture recognition,” Pattern Recognition, vol. 118, 2021, Art. no. 108044.
    [14] J. Hou, G. Wang, X. Chen, J.-H. Xue, R. Zhu, and H. Yang, “Spatial-temporal attention Res-TCN for skeleton-based dynamic hand gesture recognition,” in European Conference on Computer Vision Workshops, Munich, Germany, Sep. 2018, pp. 273–286.
    [15] Y.-F. Song, Z. Zhang, C. Shan, and L. Wang, “Stronger, faster and more explainable: A graph convolutional baseline for skeleton-based action recognition,” in ACM International Conference on Multimedia, 2020, pp. 1625–1633.
    [16] S. Xie, R. Girshick, P. Dollár, Z. Tu, and K. He, “Aggregated residual transformations for deep neural networks,” in IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 1492–1500.
    [17] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” in Advances in Neural Information Processing Systems, 2012, pp. 1097–1105.
    [18] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.
    [19] J. Hu, L. Shen, and G. Sun, “Squeeze-and-excitation networks,” in IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 7132–7141.
    [20] J. Liu, Y. Wang, S. Xiang, and C. Pan, “HAN: An efficient hierarchical self-attention network for skeleton-based gesture recognition,” arXiv preprint arXiv:2106.13391, 2021.
    [21] X. Chen, H. Guo, G. Wang, and L. Zhang, “Motion feature augmented recurrent neural network for skeleton-based dynamic hand gesture recognition,” in IEEE International Conference on Image Processing, 2017, pp. 2881–2885.
    [22] F. Yang, S. Sakti, Y. Wu, and S. Nakamura, “Make skeleton-based action recognition model smaller, faster and better,” in ACM International Conference on Multimedia in Asia, 2019.
    [23] Y. Chen, L. Zhao, X. Peng, J. Yuan, and D. N. Metaxas, “Construct dynamic graphs for hand gesture recognition via spatial-temporal attention,” arXiv preprint arXiv:1907.08871, 2019.
    [24] L. Shi, Y. Zhang, J. Cheng, and H. Lu, “Two-stream adaptive graph convolutional networks for skeleton-based action recognition,” in IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 12018–12027.
    [25] Q. De Smedt, H. Wannous, J.-P. Vandeborre, J. Guerry, B. Le Saux, and D. Filliat, “SHREC’17 track: 3D hand gesture recognition using a depth and skeletal dataset,” in Eurographics Workshop on 3D Object Retrieval, 2017.
    [26] G. Garcia-Hernando, S. Yuan, S. Baek, and T.-K. Kim, “First-person hand action benchmark with RGB-D videos and 3D hand pose annotations,” in IEEE Conference on Computer Vision and Pattern Recognition, 2018.
    [27] R. Vemulapalli, F. Arrate, and R. Chellappa, “Human action recognition by representing 3D skeletons as points in a Lie group,” in IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 588–595.
    [28] X. Zhang, Y. Wang, M. Gou, M. Sznaier, and O. Camps, “Efficient temporal sequence comparison and classification using Gram matrix embeddings on a Riemannian manifold,” in IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 4498–4507.
    [29] J. Liu, Y. Liu, Y. Wang, V. Prinet, S. Xiang, and C. Pan, “Decoupled representation learning for skeleton-based gesture recognition,” in IEEE Conference on Computer Vision and Pattern Recognition, 2020, pp. 5751–5760.
    [30] S. Woo, J. Park, J. Y. Lee, and I. S. Kweon, “CBAM: Convolutional block attention module,” in European Conference on Computer Vision, 2018.
    [31] C. Bian, W. Feng, L. Wan, and S. Wang, “Structural knowledge distillation for efficient skeleton-based action recognition,” IEEE Transactions on Image Processing, pp. 2963–2976, 2021.
    [32] Y. Shu, D. Zhang, P. Chen, and Y. Li, “Mini neural network based on knowledge distillation for dynamic gesture recognition in real scenes,” in 2021 IEEE International Conference on Consumer Electronics and Computer Engineering, 2021, pp. 630–634.

    Full-text availability: on campus from 2024-07-20; off campus from 2024-07-20.