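| Graduate student: | 范鈞翔 Fan, Jyun-Siang |
|---|---|
| Thesis title: | 基於視覺之靜態手勢辨識研究 Study on Vision-Based Static Hand Gesture Recognition |
| Advisor: | 鄭銘揚 Cheng, Ming-Yang |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Department of Electrical Engineering |
| Year of publication: | 2018 |
| Academic year of graduation: | 106 |
| Language: | Chinese |
| Number of pages: | 83 |
| Keywords (Chinese): | Hand Gesture Recognition (手勢辨識), Hidden Markov Model (隱藏式馬可夫模型), Artificial Neural Network (類神經網路), Deep Learning (深度學習) |
| Keywords (English): | Hand Gesture Recognition, Hidden Markov Model, Artificial Neural Network, Deep Neural Network |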
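As computer technology continues to advance, human-computer interaction is becoming increasingly common. Besides hardware devices such as the keyboard and mouse, computer vision makes it possible to use hand gestures as a medium of communication: a camera captures images of the hand, and image recognition methods identify the meaning of the gesture. This thesis uses three different static hand gesture recognition architectures. The first uses Hu Moments to extract a feature vector from the gesture image and converts it by vector quantization into a sequence of observation symbols that serves as the input to a Hidden Markov Model. The second feeds the Hu Moments feature vector into an artificial neural network for recognition. In addition to the conventional artificial neural network, this thesis also uses the currently popular convolutional neural network, derives its forward and backward propagation mathematically, and analyzes and experiments with a number of optimization methods for deep neural networks. In the experiments, the real gesture images used for training and testing include not only standard gestures directly facing the camera but also gestures at different angles and poses, as well as easily confused cases such as fingers held together, and the three architectures are tested and compared on them. The experimental results confirm that, although the convolutional neural network requires more computation and a longer training time, it achieves the highest recognition accuracy of the three architectures for static hand gesture recognition, exceeding 99%.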
With the development of computer technology, Human-Computer Interaction (HCI) has become increasingly important in recent years. In addition to the mouse and keyboard, hand gestures, aided by computer vision, have become a more common means of HCI. Given hand images captured by a camera, hand gestures can be recognized using various image recognition methods. This thesis implements and compares three approaches to static hand gesture recognition: the Hidden Markov Model (HMM), the Artificial Neural Network (ANN), and the deep neural network. In the HMM-based approach, Hu Moments are used to obtain a feature vector from each hand gesture image. The feature vector is then quantized into a sequence of observation symbols, which serves as the input to the HMM. In the ANN-based approach, the feature vector obtained using Hu Moments is likewise used as the input to the neural network for hand gesture recognition. In addition to the conventional artificial neural network, this thesis also adopts the popular Convolutional Neural Network (CNN) to recognize hand gestures. The forward and backward propagation of the CNN are derived mathematically, and several optimization methods for deep neural networks are analyzed and tested experimentally. In the hand gesture recognition experiment, two types of hand gesture images are used for training and testing. The first type consists of images taken with the hand directly facing the camera and with no fingers held together. The second type consists of images in which the hand does not directly face the camera or in which some fingers are held together. All three hand gesture recognition approaches are tested in the experiment.
Although the CNN-based approach incurs a larger computational load and a longer training time, the experimental results indicate that it achieves a recognition accuracy of over 99%, the best among the three tested approaches.
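The feature-extraction step shared by the HMM-based and ANN-based approaches can be sketched as follows. This is a minimal illustration rather than the thesis's implementation: it computes the seven Hu moment invariants from their standard definitions and maps a feature vector to the index of its nearest codeword. The function names `hu_moments` and `quantize`, the list-of-rows image format, and the Euclidean nearest-codeword rule are assumptions made for illustration; the abstract does not state how the codebook is trained or how the observation sequence is assembled.

```python
from math import dist


def hu_moments(img):
    """Return the seven Hu moment invariants of a 2D image given as a list of rows.

    Pixel values act as weights; zero-valued pixels are ignored.
    """
    pixels = [(x, y, v) for y, row in enumerate(img) for x, v in enumerate(row) if v]
    m00 = sum(v for _, _, v in pixels)            # total mass (area for a binary image)
    cx = sum(x * v for x, _, v in pixels) / m00   # centroid, x coordinate
    cy = sum(y * v for _, y, v in pixels) / m00   # centroid, y coordinate

    def eta(p, q):
        # Central moment mu_pq, normalized for scale invariance.
        mu = sum((x - cx) ** p * (y - cy) ** q * v for x, y, v in pixels)
        return mu / m00 ** (1 + (p + q) / 2)

    n20, n02, n11 = eta(2, 0), eta(0, 2), eta(1, 1)
    n30, n03, n21, n12 = eta(3, 0), eta(0, 3), eta(2, 1), eta(1, 2)
    a, b = n30 + n12, n21 + n03
    c, d = n30 - 3 * n12, 3 * n21 - n03
    return [
        n20 + n02,
        (n20 - n02) ** 2 + 4 * n11 ** 2,
        c ** 2 + d ** 2,
        a ** 2 + b ** 2,
        c * a * (a ** 2 - 3 * b ** 2) + d * b * (3 * a ** 2 - b ** 2),
        (n20 - n02) * (a ** 2 - b ** 2) + 4 * n11 * a * b,
        d * a * (a ** 2 - 3 * b ** 2) - c * b * (3 * a ** 2 - b ** 2),
    ]


def quantize(feature, codebook):
    """Vector quantization: index of the codeword nearest to `feature` (Euclidean)."""
    return min(range(len(codebook)), key=lambda k: dist(feature, codebook[k]))


# Usage on a small, hypothetical binary silhouette (an L shape).
silhouette = [
    [0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0],
]
features = hu_moments(silhouette)
rotated = [list(r) for r in zip(*silhouette[::-1])]   # 90-degree rotation
features_rotated = hu_moments(rotated)                # matches `features` up to rounding
```

Because the invariants are built from normalized central moments, translating or rotating the silhouette leaves the feature vector essentially unchanged, which is the property that makes them attractive as a compact input to a classifier. In practice the invariants span many orders of magnitude, so a log transform is often applied before quantization; whether the thesis does so is not stated in the abstract.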
On campus: publicly available from 2023-06-08