| Graduate student: | 劉展憲 Liu, Jan-Shian |
|---|---|
| Thesis title: | 基於卷積神經網路與遞迴神經網路之手部追蹤與動態手勢辨識研究 Study on Hand Tracking and Dynamic Hand Gesture Recognition based on Convolutional Neural Network and Recurrent Neural Network |
| Advisor: | 鄭銘揚 Cheng, Ming-Yang |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Department of Electrical Engineering |
| Year of publication: | 2019 |
| Academic year of graduation: | 107 (ROC calendar) |
| Language: | Chinese |
| Number of pages: | 100 |
| Keywords (Chinese): | 手部偵測、動態手勢辨識、卷積神經網路、遞迴神經網路、深度學習 |
| Keywords (English): | Hand Detection, Dynamic Hand Gesture Recognition, Convolutional Neural Network, Recurrent Neural Network, Deep Learning |
Taking an industrial robot manipulator as an example, this thesis uses computer vision to recognize dynamic hand gestures given by a human operator and issues the corresponding commands to the manipulator, emulating the function of a teach pendant. A total of 19 dynamic hand gestures are defined, covering translation along and rotation about the manipulator's X, Y, and Z axes as well as record, execute, stop, and related commands. The popular You Only Look Once (YOLO) algorithm is employed to detect hands in images with complex backgrounds, and it is paired with convolutional recurrent neural networks to recognize the dynamic hand gestures. The detected hand information is also transformed into three-dimensional space to compute the shortest distance between the operator's hand and each axis of the manipulator, which helps keep the operator out of the dangerous region of the manipulator's workspace during human-machine interaction.

The experiments analyze the detection and recognition techniques separately. For detection, they compare how the number of scale layers and the number of anchor boxes affect precision; the results indicate that architectures with more scale layers detect better, exceeding 97% under the mAP[0.8] criterion. For recognition, they compare a conventional dynamic image recognition approach with an approach that incorporates the YOLO architecture, and several convolutional recurrent neural networks are used to recognize the dynamic hand gestures; the best recognition result exceeds 99%.
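The abstract does not say how the shortest hand-to-manipulator distance is computed. As a minimal sketch, assume each link of the manipulator is approximated by a straight segment between consecutive joint positions, and that the hand position and the joints are already expressed in the same 3D frame. Under those assumptions the safety check reduces to a point-to-segment distance. All function names, the `safety_radius` parameter, and the sample coordinates below are hypothetical and are not taken from the thesis.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def point_segment_distance(p: Point, a: Point, b: Point) -> float:
    """Shortest Euclidean distance from point p to the segment a-b."""
    ab = tuple(bi - ai for ai, bi in zip(a, b))
    ap = tuple(pi - ai for ai, pi in zip(a, p))
    ab_len_sq = sum(c * c for c in ab)
    if ab_len_sq == 0.0:
        # Degenerate link: both joints coincide, so measure to that point.
        t = 0.0
    else:
        # Project p onto the line through a and b, then clamp to the segment.
        t = sum(x * y for x, y in zip(ap, ab)) / ab_len_sq
        t = max(0.0, min(1.0, t))
    closest = tuple(ai + t * c for ai, c in zip(a, ab))
    return math.dist(p, closest)


def min_hand_to_arm_distance(hand: Point, joints: Sequence[Point]) -> float:
    """Smallest distance from the hand to any link (assumed: links join consecutive joints)."""
    return min(
        point_segment_distance(hand, a, b) for a, b in zip(joints, joints[1:])
    )


def is_hand_too_close(hand: Point, joints: Sequence[Point], safety_radius: float) -> bool:
    """True when the hand is inside the assumed safety radius around any link."""
    return min_hand_to_arm_distance(hand, joints) < safety_radius


# Hypothetical joint positions (metres) for a simplified two-link arm.
joints = [(0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
hand = (0.5, 0.3, 1.0)
print(min_hand_to_arm_distance(hand, joints))
print(is_hand_too_close(hand, joints, safety_radius=0.5))
```

A real system would obtain the joint positions from the manipulator's forward kinematics and would probably model link thickness as well; the segment model here is only the simplest geometry that matches the abstract's description.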
On-campus access: public as of 2024-06-18