| Graduate Student: | 鄭佳玄 Cheng, Chia-Shiuan |
|---|---|
| Thesis Title: | 台灣手語轉譯之運動軌跡辨識 Hand Motion Recognition for the Vision-based Taiwanese Sign Language Interpretation |
| Advisor: | 謝璧妃 Hsieh, Pi-Fuei |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering |
| Year of Publication: | 2005 |
| Graduation Academic Year: | 93 (ROC calendar) |
| Language: | English |
| Number of Pages: | 50 |
| Keywords (Chinese, translated): | Hidden Markov Models, skin segmentation, Taiwanese Sign Language, hand motion trajectory tracking |
| Keywords (English): | Hidden Markov Models (HMMs), skin segmentation, hand tracking, Taiwanese Sign Language (TSL) |
| Access Counts: | Views: 154, Downloads: 9 |
Abstract (Chinese, translated):
Taiwanese Sign Language (TSL) is one of the basic communication tools of the hearing-impaired. A TSL recognition system serving as a communication interface would greatly help hearing people communicate with the hearing-impaired, and hearing people could also use the system to learn TSL conveniently. Our TSL recognition system comprises three subsystems: hand motion recognition, hand shape recognition, and facial expression recognition. In this thesis we study vision-based recognition of TSL hand motion. The hand motion recognition subsystem consists of four parts: construction of a color model, hand tracking, motion representation, and motion recognition.
The selected sign lexicon contains both one-hand and two-hand motions. Under the operating conditions of a plain background, stable lighting, and no occlusion between the moving hands and the head, the subsystem tracks the hand positions with 100% accuracy. Because signers often unconsciously move their heads slightly while signing, we handle such irregularities as follows. We first record multiple candidates for the moving hands in each frame. By analyzing the region sizes of these candidates over the whole video, we automatically determine whether the motion involves one hand or two hands and select a seed frame. The most likely hand positions are determined in the seed frame; then, moving forward and backward in time from the seed frame, the hand positions are chosen from the candidates recorded in the neighboring frames by minimizing the distance between the hand positions in each pair of consecutive frames. For hand tracking we combine change detection and skin-color segmentation, which makes the subsystem both more accurate and faster.
We define nine TSL hand motion patterns for the experiments. Each motion pattern can be further divided into several directional subpatterns. Under our motion representation, the subpatterns of one motion pattern are grouped into a single class. Because the subsystem must be invariant to rotation, translation, symmetry, and scaling of the trajectory, we apply a series of coordinate transformations. The symmetry cases arise when the two hands perform the same motion pattern in mirrored fashion, and when a circular motion is performed clockwise versus counterclockwise. Once one of the nine motion patterns has been recognized, the original direction information can easily be restored from the relative relations in the original coordinates, which further distinguishes signs that share the same hand shape and motion pattern but differ in direction.
We use two preprocessing steps to improve system performance: (1) removing redundant repeated points in the trajectory and computing a more accurate trajectory center, and (2) smoothing the trajectory.
We also investigate the effect of image resolution on the subsystem. The experimental results show that reducing the scene from the original resolution of 352 x 240 by a factor of 0.4 to 140 x 96 does not degrade system performance. The results show that the proposed motion recognition approach achieves an accuracy of about 90%, which the two preprocessing steps raise to about 93%.
Abstract (English):
Taiwanese Sign Language (TSL) is one of the communication tools used by deaf people. A TSL recognition system can help hearing people communicate with deaf people and also serve as a tool for learning TSL. Our TSL recognition system comprises three subsystems: hand motion recognition, hand shape recognition, and facial expression recognition. In this study we develop a vision-based approach to recognizing TSL hand motion. The hand motion recognition subsystem consists of four phases: construction of a color model, hand tracking, motion representation, and motion recognition.
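To make the color-model phase concrete, here is a minimal sketch of one plausible skin-color model: a single Gaussian fitted to labeled skin pixels in YCbCr chrominance space, with pixels classified by Mahalanobis distance. The abstract does not state which color space or model the thesis actually uses, so the functions `rgb_to_cbcr`, `fit_skin_model`, and `skin_mask` and the threshold value are illustrative assumptions only.

```python
# Hypothetical skin-color model sketch (not the thesis's actual model).
import numpy as np

def rgb_to_cbcr(img_rgb):
    """Convert RGB values (any leading shape x 3) to Cb/Cr chrominance."""
    r, g, b = [img_rgb[..., i].astype(np.float64) for i in range(3)]
    cb = -0.169 * r - 0.331 * g + 0.500 * b + 128.0
    cr =  0.500 * r - 0.419 * g - 0.081 * b + 128.0
    return np.stack([cb, cr], axis=-1)

def fit_skin_model(skin_pixels_rgb):
    """Estimate a Gaussian (mean, inverse covariance) from Nx3 skin pixels."""
    cbcr = rgb_to_cbcr(skin_pixels_rgb)
    mean = cbcr.mean(axis=0)
    cov = np.cov(cbcr, rowvar=False) + 1e-6 * np.eye(2)  # regularize
    return mean, np.linalg.inv(cov)

def skin_mask(img_rgb, mean, inv_cov, max_mahalanobis=3.0):
    """Label pixels whose Mahalanobis distance to the skin mean is small."""
    d = rgb_to_cbcr(img_rgb) - mean
    dist2 = np.einsum('...i,ij,...j->...', d, inv_cov, d)
    return dist2 < max_mahalanobis ** 2
```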
Our sign lexicon contains both one-hand and two-hand motions. Under a plain background, stable lighting, no body movement, and no occlusion between the hands and the head, our hand tracking locates the hand positions with 100% accuracy. However, a signer may make incidental movements, such as slight head motion, while performing a sign. To handle this, we record multiple hand candidates in each frame. By analyzing these candidates over the whole sequence, the number of moving hands is determined automatically and a seed frame is chosen. We first determine the moving hands in the seed frame; the moving hands in the frames before and after the seed frame are then determined by minimizing the distance between hand positions in each pair of consecutive frames. In the hand tracking phase we combine change detection with skin-color segmentation, which makes the tracking more accurate and speeds up the subsystem.
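The candidate-matching idea above can be illustrated with the following sketch: per-frame hand candidates are taken where a change-detection (frame-differencing) mask overlaps the skin mask, and the trajectory is grown forward and backward from the seed frame by choosing, in each consecutive frame, the candidate nearest to the previously selected position. The function names, the differencing threshold, and the centroid-based candidate representation are assumptions for illustration, not details taken from the thesis.

```python
# Hypothetical tracking sketch: change detection + skin mask, then
# nearest-neighbour matching outward from the seed frame.
import numpy as np

def moving_skin_mask(prev_gray, curr_gray, skin, diff_thresh=15):
    """Frame differencing intersected with the skin mask."""
    changed = np.abs(curr_gray.astype(np.int16) - prev_gray.astype(np.int16)) > diff_thresh
    return changed & skin

def track_from_seed(candidates, seed_index, seed_position):
    """candidates[t] is a list of (x, y) centroids for frame t.

    Starting from the chosen position in the seed frame, extend the
    trajectory forward and backward by picking, in each consecutive
    frame, the candidate closest to the previous position.
    """
    n = len(candidates)
    track = [None] * n
    track[seed_index] = np.asarray(seed_position, dtype=float)
    for direction in (+1, -1):                      # forward, then backward
        t = seed_index + direction
        while 0 <= t < n and candidates[t]:
            prev = track[t - direction]
            pts = np.asarray(candidates[t], dtype=float)
            track[t] = pts[np.argmin(np.linalg.norm(pts - prev, axis=1))]
            t += direction
    return track
```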
Nine hand motion patterns are defined for TSL in this study, without direction information; each pattern has several subpatterns that carry the direction information. Under our motion representation, the subpatterns of a pattern are mapped to the same class. Since recognition of hand motions should be invariant to rotation, translation, symmetry, and scaling, we apply several coordinate transformations based on the polar coordinate system. These transformations handle the symmetry cases that arise when the two hands perform the same motion pattern in mirrored fashion, and when a circular motion is performed clockwise versus counterclockwise. Once one of the nine patterns has been recognized, the direction information can easily be restored from the earlier coordinate transformation, which helps distinguish signs that share the same hand shape and motion pattern but differ in direction. We use two preprocessors to improve system performance: (1) reducing redundant observations and computing a more accurate center of the trajectory (Cc), and (2) applying a mean filter to remove noise (Sm).
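The two preprocessors and the invariant representation might be realized along the following lines: a sketch that drops near-stationary repeated points (for a less biased trajectory center, Cc), smooths the trajectory with a mean filter (Sm), and then produces a translation- and scale-invariant observation sequence by centering the trajectory, normalizing by the mean radius, and quantizing polar angles into discrete symbols. The exact coordinate transformations and the number of quantization symbols used in the thesis are not given in the abstract, so this is an assumption-laden illustration.

```python
# Hypothetical preprocessing and polar-coordinate representation sketch.
import numpy as np

def remove_repeated_points(traj, min_step=1.0):
    """Cc-style step: drop consecutive points that barely move, so the
    trajectory center is less biased toward pause positions."""
    kept = [np.asarray(traj[0], dtype=float)]
    for p in traj[1:]:
        if np.linalg.norm(np.asarray(p, dtype=float) - kept[-1]) >= min_step:
            kept.append(np.asarray(p, dtype=float))
    return np.asarray(kept)

def mean_filter(traj, window=3):
    """Sm-style step: moving-average smoothing of each coordinate
    (endpoints are attenuated by the zero padding of mode='same')."""
    traj = np.asarray(traj, dtype=float)
    kernel = np.ones(window) / window
    return np.stack([np.convolve(traj[:, d], kernel, mode='same')
                     for d in range(traj.shape[1])], axis=1)

def polar_observations(traj, num_symbols=16):
    """Center at the trajectory mean (translation invariance), scale by the
    mean radius (scale invariance), and quantize each point's polar angle
    into one of num_symbols discrete observation symbols."""
    traj = np.asarray(traj, dtype=float)
    centered = traj - traj.mean(axis=0)
    centered /= (np.linalg.norm(centered, axis=1).mean() + 1e-9)
    angles = np.arctan2(centered[:, 1], centered[:, 0])      # in [-pi, pi]
    return ((angles + np.pi) / (2 * np.pi) * num_symbols).astype(int) % num_symbols
```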
We have investigated the effect of image resolution on recognition performance. The experimental results show that decreasing the image resolution from 352 x 240 to 140 x 96 does not degrade system performance. The proposed approach achieves a recognition accuracy of about 90%, which the two preprocessors raise to about 93%.
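Since the keywords list Hidden Markov Models, the recognition phase presumably scores the observation sequence against one trained HMM per motion pattern. The sketch below shows that step only: a log-space forward algorithm and a classifier that picks the highest-likelihood model. The model parameters are assumed to come from prior training (e.g., Baum-Welch); the state/symbol counts and all names are hypothetical, not taken from the thesis.

```python
# Hypothetical HMM-based recognition sketch for the nine motion patterns.
import numpy as np

def log_forward(obs, log_pi, log_A, log_B):
    """Log-likelihood of a discrete observation sequence under one HMM.

    log_pi: (S,) initial-state log-probs; log_A: (S, S) transition
    log-probs; log_B: (S, V) emission log-probs over V symbols.
    """
    alpha = log_pi + log_B[:, obs[0]]
    for o in obs[1:]:
        alpha = np.logaddexp.reduce(alpha[:, None] + log_A, axis=0) + log_B[:, o]
    return np.logaddexp.reduce(alpha)

def classify(obs, models):
    """models maps a motion-pattern name to its (log_pi, log_A, log_B)."""
    scores = {name: log_forward(obs, *params) for name, params in models.items()}
    return max(scores, key=scores.get)
```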