Graduate student: Jiang, Ming-yen (江明晏)
Thesis title: Recognition of Two-Handed Gestures via Couplings of Hidden Markov Models (耦合隱藏式馬可夫模型於雙手手勢辨識)
Advisor: Hsieh, Pi-Fuei (謝璧妃)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering
Year of publication: 2007
Academic year of graduation: 95 (ROC calendar)
Language: English
Number of pages: 57
Keywords: gesture recognition, HMM, coupled hidden Markov model, stereo matching
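  • Gestures are a natural and common means of expression. Thanks to advances in computer vision, gesture recognition has been widely applied in human-computer interfaces. This study addresses the recognition of three-dimensional two-handed gestures in Taiwanese Sign Language. Two cameras track the gesture trajectories, and stereo vision techniques recover the depth of the trajectory at each time step.

    Although conventional hidden Markov models have been widely used for gesture recognition, this probabilistic framework is not suited to modeling interactions among multiple processes. To make full use of the synchrony of two-handed gestures, we studied a variety of hidden Markov model architectures and propose coupled hidden Markov models for two-handed gesture recognition. The state transition probabilities within a coupled hidden Markov model change according to the coupling probabilities linking the two hidden Markov models. The larger the state gap between the processes, the larger the state transition probability given to the lagging process and the smaller the one given to the leading process, so that the two processes advance in near synchrony.

    We selected twelve Taiwanese Sign Language signs whose trajectories move perpendicular to the camera plane for recognition and obtained good results. The experiments show that depth information can be extracted effectively and accurately from stereo image pairs, and that the proposed coupled hidden Markov model outperforms the other Markov model architectures in recognizing two-handed gestures.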

    Gestures are a natural and ubiquitous way to convey meaning in communication. Thanks to advances in computer vision, gestures have been widely used in human-computer interfaces. This study was conceived to advance gesture recognition techniques for Taiwanese Sign Language (TSL), whose phonemes are three-dimensional gestures. Two digital cameras were used to acquire stereo input sequences so that the depth information in hand motion was preserved.
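As a hedged illustration of how depth can be recovered from a stereo pair, the sketch below applies the standard pinhole relation Z = f * B / d for a rectified two-camera setup. The abstract does not state the camera geometry or the matching method used in the thesis, so the function name, its parameters, and the sample values are assumptions made purely for illustration.

```python
def depth_from_disparity(disparity_px, focal_length_px, baseline_m):
    """Depth (in meters) of a point seen by a rectified stereo pair.

    Assumptions (not taken from the thesis): both cameras share the same
    focal length, the image rows are aligned, and disparity is measured
    in pixels between the left and right images.
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return focal_length_px * baseline_m / disparity_px


# Hypothetical values: 800 px focal length, 10 cm baseline.
near = depth_from_disparity(disparity_px=80.0, focal_length_px=800.0, baseline_m=0.1)
far = depth_from_disparity(disparity_px=40.0, focal_length_px=800.0, baseline_m=0.1)
```

Halving the disparity doubles the estimated depth, which is why small matching errors matter most for points far from the cameras.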

    Although hidden Markov models (HMMs) have been successfully applied to speech and gesture recognition, the standard framework cannot adequately model the interaction among multiple processes. To fully exploit the synchronous characteristics of two-handed gestures, this study investigated various HMM architectures and proposed the use of coupled HMMs (CHMMs) for two-handed sign recognition. The state transition probabilities in the coupled HMMs were varied adaptively according to the coupling probabilities between the HMMs. When the state difference between the two HMMs was large, the lagging process was given an increased state transition probability and the leading process a reduced one, so that the two processes progressed nearly in synchrony.
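The adaptive transition idea described above can be sketched as follows. This is not the thesis's formula, which the abstract does not give. It is a minimal illustration that assumes two left-to-right chains and shifts the log-odds of advancing by an amount proportional to the state gap. The function names and the `strength` parameter are hypothetical.

```python
import math


def adjusted_advance_prob(base_advance, own_state, other_state, strength=1.0):
    """Probability that one left-to-right chain advances to its next state.

    Illustrative rule (an assumption, not the author's method): the
    log-odds of advancing are shifted by strength * gap, where gap is
    positive when this chain lags behind the other one.
    """
    if not 0.0 < base_advance < 1.0:
        raise ValueError("base_advance must lie strictly between 0 and 1")
    gap = other_state - own_state  # > 0: this chain lags; < 0: it leads
    log_odds = math.log(base_advance / (1.0 - base_advance))
    return 1.0 / (1.0 + math.exp(-(log_odds + strength * gap)))


def coupled_transition(base_advance, own_state, other_state, strength=1.0):
    """Return the stay/advance distribution for one chain given the other."""
    p = adjusted_advance_prob(base_advance, own_state, other_state, strength)
    return {"stay": 1.0 - p, "advance": p}


# Hypothetical example: the left hand is in state 1, the right hand in state 4.
left = coupled_transition(0.3, own_state=1, other_state=4)   # lagging chain
right = coupled_transition(0.3, own_state=4, other_state=1)  # leading chain
```

With equal states the base probability is returned unchanged. Otherwise the lagging chain becomes more likely to advance and the leading chain more likely to stay, which pulls the two chains back toward synchrony.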

    The effectiveness of the proposed methods was demonstrated by the recognition results for twelve TSL sign words whose trajectories were primarily perpendicular to the camera plane. The results show that depth information can be accurately extracted from stereo images and that the proposed CHMM outperforms other HMM architectures in modeling two-handed sign gestures.

    1. INTRODUCTION
        1.1 Motivation
        1.2 Objective
        1.3 Organization
    2. FEATURE EXTRACTION
        2.1 Skin color model
        2.2 Trajectory extraction and representation
            2.2.1 Depth reconstruct via stereo matching technique
            2.2.2 The extraction of the face's and hands' depth
            2.2.3 Coordinate transformation
        2.3 Hand shape extraction and representation
    3. HIDDEN MARKOV MODELS FOR GESTURE RECOGNITION
        3.1 The elements of an HMM
        3.2 The three basic problems for HMMs
        3.3 Solutions of the three basic problems of HMMs
            3.3.1 Probability evaluation using the Forward-Backward procedure
            3.3.2 Using the Viterbi algorithm to find the optimal state sequence
            3.3.3 Parameter estimation using the Baum-Welch Method
    4. COUPLED HMMs FOR GESTURE RECOGNITION
        4.1 Various couplings of HMM
        4.2 Extended parameter space in CHMMs
        4.3 Evaluation of the coupled hidden Markov models
            4.3.1 Viterbi Algorithms for CHMM
            4.3.2 Parameter Reestimation of the CHMMs
        4.4 Gestured based Coupled hidden Markov models
    5. EXPERIMENTAL RESULTS
        5.1 Data sets and environment setting
        5.2 Trajectory recognition
        5.3 Shape recognition
        5.4 Gestures recognition
    6. CONCLUSIONS
    References

    Full-text availability: on campus from 2010-08-28; off campus from 2010-08-28