
Graduate student: 劉展憲 (Liu, Jan-Shian)
Thesis title: Study on Hand Tracking and Dynamic Hand Gesture Recognition based on Convolutional Neural Network and Recurrent Neural Network
Original title: 基於卷積神經網路與遞迴神經網路之手部追蹤與動態手勢辨識研究
Advisor: 鄭銘揚 (Cheng, Ming-Yang)
Degree: Master
Department: Department of Electrical Engineering, College of Electrical Engineering and Computer Science
Year of publication: 2019
Graduation academic year: 107 (ROC calendar, i.e. the 2018-2019 academic year)
Language: Chinese
Pages: 100
Keywords: Hand Detection, Dynamic Hand Gesture Recognition, Convolutional Neural Network, Recurrent Neural Network, Deep Learning
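    Taking an industrial robot manipulator as an example, this thesis uses computer vision techniques to recognize dynamic hand gestures given by an operator and, for each recognized gesture, issues the corresponding command to the manipulator, emulating the function of a robot teach pendant. A total of 19 dynamic hand gestures are defined, covering translation and rotation along the manipulator's X, Y, and Z axes as well as record, execute, stop, and related commands. The You Only Look Once (YOLO) algorithm, which has become popular in recent years, is used to detect hands against complex backgrounds, and a convolutional recurrent neural network is used to recognize the dynamic gestures. The detected hand information is transformed into three-dimensional space to compute the shortest distance between the operator's hand and each axis of the manipulator, so that the operator is kept out of the dangerous region of the manipulator's workspace during human-machine interaction. The experiments analyze the detection and recognition techniques separately. For detection, the effects of the number of scale layers and the number of anchor boxes on precision are compared; the results show that architectures with more scale layers detect better, exceeding 97% under the mAP@0.8 criterion. For recognition, a conventional dynamic image recognition method is compared with one that incorporates the YOLO architecture, and several convolutional recurrent neural networks are used to recognize the dynamic gestures; the best recognition result exceeds 99%.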
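The detection experiments vary the number of anchor boxes, and the thesis outline lists K-means and LBG clustering for defining the initial prediction boxes. A common way to derive anchors, introduced with YOLO9000, is K-means over the ground-truth box widths and heights using the distance d = 1 - IoU, so that large boxes do not dominate the clustering the way they would under Euclidean distance. The Python sketch below assumes boxes are given as (width, height) pairs sharing a common centre; the names `iou_wh`, `kmeans_anchors` and `mean_best_iou` are hypothetical, and this is not necessarily the exact procedure used in the thesis.

```python
import random
from typing import List, Sequence, Tuple

Box = Tuple[float, float]  # (width, height); boxes are assumed to share a centre


def iou_wh(a: Box, b: Box) -> float:
    """IoU of two boxes aligned at the same centre, so only width/height matter."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    union = a[0] * a[1] + b[0] * b[1] - inter
    return inter / union


def kmeans_anchors(boxes: Sequence[Box], k: int, iters: int = 100, seed: int = 0) -> List[Box]:
    """K-means on (w, h) with distance 1 - IoU; returns k anchors sorted by area."""
    rng = random.Random(seed)
    centroids = rng.sample(list(boxes), k)
    for _ in range(iters):
        # Assignment: highest IoU is the same as smallest 1 - IoU distance.
        clusters: List[List[Box]] = [[] for _ in range(k)]
        for b in boxes:
            best = max(range(k), key=lambda j: iou_wh(b, centroids[j]))
            clusters[best].append(b)
        # Update: mean width and height per cluster; keep an empty cluster's old centroid.
        new_centroids: List[Box] = []
        for j, members in enumerate(clusters):
            if members:
                w = sum(m[0] for m in members) / len(members)
                h = sum(m[1] for m in members) / len(members)
                new_centroids.append((w, h))
            else:
                new_centroids.append(centroids[j])
        if new_centroids == centroids:  # assignments stopped changing
            break
        centroids = new_centroids
    return sorted(centroids, key=lambda c: c[0] * c[1])


def mean_best_iou(boxes: Sequence[Box], anchors: Sequence[Box]) -> float:
    """Average IoU between each box and its best-matching anchor."""
    return sum(max(iou_wh(b, a) for a in anchors) for b in boxes) / len(boxes)
```

For example, on six hypothetical boxes forming a small group around (10, 12) and a large group around (50, 60), `kmeans_anchors(boxes, 2)` settles on one anchor per group. Comparing `mean_best_iou` for different k is one way to judge how many anchors are worth using.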

    This thesis exploits computer vision technology to recognize dynamic hand gestures given by a human operator. Based on the recognized gesture, a corresponding command is issued to an industrial robot manipulator, so the system can serve as an alternative to a teach pendant. Nineteen different dynamic hand gestures are defined in this thesis. The popular You Only Look Once (YOLO) algorithm is employed to detect hands in images with complex backgrounds, and convolutional recurrent neural networks are used to recognize the dynamic hand gestures. This thesis also calculates the shortest 3D distance between the operator's hand and the industrial robot manipulator, which helps keep the operator from crossing into the dangerous region of the manipulator's workspace during human-machine interaction. Several experiments have been conducted to analyze the detection and recognition techniques developed in this thesis. For detection, the experiments compare detection precision under different numbers of scale layers and anchor boxes. Experimental results indicate that architectures with more scale layers achieve better detection performance, with mAP@0.8 exceeding 97%. For recognition, the experiments compare the recognition accuracy of a conventional dynamic image recognition approach with that of an approach incorporating the YOLO algorithm. Furthermore, several convolutional recurrent neural networks are employed to perform dynamic hand gesture recognition, and the best recognition result exceeds 99%.
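The shortest 3D distance between the operator's hand and the manipulator can be computed by approximating each link as a capsule (the line segment between consecutive joint positions, inflated by a radius) and the hand as a sphere; the clearance is then the point-to-segment distance minus both radii. This geometric model, the radii, and the helper names `point_segment_distance` and `hand_to_arm_clearance` are assumptions for illustration, and the thesis's own approximation (outline section 2.6.1) may differ. Joint positions are assumed to be expressed in the same frame as the hand, for example from forward kinematics.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def point_segment_distance(p: Vec3, a: Vec3, b: Vec3) -> float:
    """Euclidean distance from point p to the segment a-b in 3D."""
    ab = (b[0] - a[0], b[1] - a[1], b[2] - a[2])
    ap = (p[0] - a[0], p[1] - a[1], p[2] - a[2])
    ab_len2 = ab[0] ** 2 + ab[1] ** 2 + ab[2] ** 2
    if ab_len2 == 0.0:
        t = 0.0  # degenerate link: a and b coincide
    else:
        # Project p onto the infinite line, then clamp so the foot stays on the segment.
        t = (ap[0] * ab[0] + ap[1] * ab[1] + ap[2] * ab[2]) / ab_len2
        t = max(0.0, min(1.0, t))
    closest = (a[0] + t * ab[0], a[1] + t * ab[1], a[2] + t * ab[2])
    return math.dist(p, closest)


def hand_to_arm_clearance(hand_center: Vec3, hand_radius: float,
                          joints: Sequence[Vec3], link_radius: float) -> Tuple[int, float]:
    """Smallest clearance between a spherical hand and capsule-shaped links.

    Link i runs from joints[i] to joints[i + 1]. Returns (link index, clearance);
    a negative clearance means the two volumes overlap.
    """
    best_link, best_clearance = -1, math.inf
    for i in range(len(joints) - 1):
        d = point_segment_distance(hand_center, joints[i], joints[i + 1])
        clearance = d - hand_radius - link_radius
        if clearance < best_clearance:
            best_link, best_clearance = i, clearance
    return best_link, best_clearance
```

A caller would compare the returned clearance against a chosen safety margin and, for instance, stop or slow the arm when it drops below that margin; the returned link index tells which part of the arm is closest.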

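    Chinese Abstract
    Extended Abstract
    Acknowledgements
    Table of Contents
    List of Figures
    List of Tables
    Chapter 1. Introduction
      1.1 Research Motivation and Objectives
      1.2 Literature Review
        1.2.1 Hand Detection
        1.2.2 Gesture Recognition
      1.3 Thesis Organization
    Chapter 2. 3D Reconstruction and Coordinate System Transformation
      2.1 Overview
      2.2 Camera Model
        2.2.1 Intrinsic Parameters
        2.2.2 Extrinsic Parameters
      2.3 Stereo Vision Model
        2.3.1 Stereo Calibration
        2.3.2 Image Rectification
        2.3.3 Depth Estimation by Disparity
      2.4 Stereo Image Similarity
        2.4.1 Mean Square Error Image Similarity Algorithm
        2.4.2 Mean Absolute Error Image Similarity Algorithm
        2.4.3 Peak Signal to Noise Ratio Image Similarity Algorithm
        2.4.4 Modified Mean Absolute Error Image Similarity Algorithm
        2.4.5 Bhattacharyya Distance Image Similarity Algorithm
        2.4.6 Cosine Similarity Image Similarity Algorithm
        2.4.7 dHash Image Similarity Algorithm
        2.4.8 pHash Image Similarity Algorithm
      2.5 Eye-to-Hand Coordinate Transformation
        2.5.1 Indirect Calibration Method
        2.5.2 Direct Calibration Method
        2.5.3 Online Calibration Method
      2.6 Virtual Force Field and Coordinate Commands
        2.6.1 Approximation of the Robot Manipulator and the Hand
        2.6.2 3D Hazard Distance Measurement Procedure
        2.6.3 Hand Coordinate Commands
    Chapter 3. Hand Detection Based on You Only Look Once
      3.1 Overview
      3.2 You Only Look Once Architecture
        3.2.1 Image Grid Partitioning
        3.2.2 Defining the Initial Prediction Boxes
          3.2.2.1 K-means Clustering Algorithm
          3.2.2.2 LBG Clustering Algorithm
        3.2.3 Prediction Box Generation
        3.2.4 Cost Function
      3.3 Hand Detection Framework and Workflow
        3.3.1 YOLO Architecture Design for Hand Detection
        3.3.2 Hand Detection Training Procedure
        3.3.3 Hand Detection Inference Procedure
    Chapter 4. Dynamic Hand Gesture Recognition Based on Convolutional Recurrent Neural Networks
      4.1 Overview
      4.2 Simple Recurrent Neural Network Architecture
        4.2.1 Simple Recurrent Neural Network Forward Propagation
        4.2.2 Simple Recurrent Neural Network Backpropagation
      4.3 Long Short-Term Memory Architecture
        4.3.1 Long Short-Term Memory Forward Propagation
        4.3.2 Long Short-Term Memory Backpropagation
      4.4 Gated Recurrent Unit Architecture
        4.4.1 Gated Recurrent Unit Forward Propagation
        4.4.2 Gated Recurrent Unit Backpropagation
      4.5 Convolutional Recurrent Neural Network Architectures
        4.5.1 Introduction to the CRNN Architecture
        4.5.2 Introduction to the ConvRNN Architecture
      4.6 Dynamic Hand Gesture Recognition Framework and Workflow
        4.6.1 Dynamic Hand Gesture Recognition Architecture Design
        4.6.2 Dynamic Hand Gesture Training Procedure
        4.6.3 Dynamic Hand Gesture Recognition Procedure
    Chapter 5. System Architecture and Analysis of Experimental Results
      5.1 System Architecture
      5.2 Experimental Equipment and Data Analysis Procedure
        5.2.1 Experimental Equipment
        5.2.2 Experimental Scene
        5.2.3 Experimental Data Analysis Procedure
      5.3 Experimental Results
        5.3.1 Experiment 1: YOLO-Based Feature Extraction
        5.3.2 Experiment 2: YOLO-Based Hand Detection
        5.3.3 Experiment 3: Dynamic Hand Gesture Recognition Based on Convolutional Recurrent Neural Networks
      5.4 Experimental Conclusions and Analysis
    Chapter 6. Conclusions and Future Work
      6.1 Conclusions
      6.2 Future Work
    References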
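Outline sections 2.3.3 and 2.5 cover depth from disparity and the eye-to-hand transformation that place a detected hand in the robot's frame. For a rectified stereo pair with focal length f (in pixels), baseline B, and principal point (c_x, c_y), a left/right correspondence with disparity d = u_L - u_R gives Z = fB/d, X = (u_L - c_x)Z/f, and Y = (v - c_y)Z/f. The sketch below assumes square pixels (f_x = f_y = f) and a rotation R and translation t from camera to robot base already obtained by calibration; the function names and numbers are hypothetical and are not taken from the thesis.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def triangulate_rectified(u_left: float, v: float, u_right: float,
                          f: float, baseline: float, cx: float, cy: float) -> Vec3:
    """Back-project a rectified stereo correspondence into the left-camera frame.

    Assumes a pinhole model with square pixels (fx == fy == f) and rectified
    images, so matching points share the same image row v.
    """
    disparity = u_left - u_right
    if disparity <= 0:
        # Zero or negative disparity cannot come from a point in front of both cameras.
        raise ValueError("disparity must be positive")
    z = f * baseline / disparity  # depth: Z = f * B / d
    x = (u_left - cx) * z / f     # horizontal offset from the optical axis
    y = (v - cy) * z / f          # vertical offset from the optical axis
    return (x, y, z)


def camera_to_base(p_cam: Vec3, R: Sequence[Sequence[float]], t: Vec3) -> Vec3:
    """Map a camera-frame point into the robot base frame: p_base = R p_cam + t.

    R (3x3, row-major) and t are assumed to come from a prior eye-to-hand calibration.
    """
    return tuple(sum(R[i][j] * p_cam[j] for j in range(3)) + t[i] for i in range(3))
```

With f = 700 px, B = 0.12 m, and principal point (320, 240), a hand centre seen at u_L = 390 and u_R = 320 on row 240 has a disparity of 70 px and lies 1.2 m in front of the left camera.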

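    On campus: publicly available from 2024-06-18
    Off campus: not available
    The electronic thesis has not been authorized for public release; for the print copy, please check the library catalog.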