| Graduate student: | 江明晏 Jiang, Ming-yen |
|---|---|
| Thesis title: | 耦合隱藏式馬可夫模型於雙手手勢辨識 Recognition of Two-Handed Gestures via Couplings of Hidden Markov Models |
| Advisor: | 謝璧妃 Hsieh, Pi-Fuei |
| Degree: | Master |
| Department: | 電機資訊學院 - 資訊工程學系 Department of Computer Science and Information Engineering |
| Year of publication: | 2007 |
| Academic year of graduation (ROC calendar): | 95 |
| Language: | English |
| Number of pages: | 57 |
| Foreign-language keywords: | gesture recognition, HMM, coupled hidden Markov model, stereo matching |
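Gestures are a natural and common means of expression. Thanks to advances in computer vision, gesture recognition has been widely applied in human-computer interfaces. This study addresses three-dimensional two-handed gesture recognition for Taiwanese Sign Language. We use two cameras to track the gesture trajectories and apply stereo vision techniques to obtain the depth of the trajectory at each time point.
Although conventional hidden Markov models have been widely used in gesture recognition, this probabilistic framework is not suited to modeling the interaction among multiple processes. To fully exploit the synchrony of two-handed gestures, we studied a variety of hidden Markov model architectures and propose coupled hidden Markov models for two-handed gesture recognition. The state transition probabilities within the coupled hidden Markov model are adjusted according to the coupling probabilities linking the two hidden Markov models. As the state gap between the processes grows, we give the lagging process a larger state transition probability and the leading process a smaller one, so that the two processes advance nearly in synchrony.
We selected twelve Taiwanese Sign Language signs whose trajectories move perpendicular to the camera plane for recognition and obtained good results. The experiments show that depth information can be extracted effectively and accurately from stereo image pairs, and that the proposed coupled hidden Markov model outperforms the other Markov model architectures in recognizing two-handed gestures.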
Gestures are a natural and ubiquitous way to convey meaning in communication. Thanks to advances in computer vision, gestures are now widely used in human-computer interfaces. This study aims to advance gesture recognition for Taiwanese Sign Language (TSL), whose phonemes are three-dimensional gestures. Two digital cameras were used to acquire stereo input sequences so that depth information in the hand motion is preserved.
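As a rough illustration of how a two-camera setup can recover depth along a hand trajectory, the sketch below applies the standard triangulation relation Z = f * B / d for a rectified stereo pair. It is not taken from the thesis: the function names, the assumption that the images are rectified, and the convention that pixel coordinates are measured from the principal point are all illustrative assumptions.

```python
def depth_from_disparity(x_left, x_right, focal_px, baseline):
    """Depth of one matched point in a rectified stereo pair: Z = f * B / d.

    x_left, x_right: horizontal pixel coordinates of the same hand point.
    focal_px: focal length in pixels (assumed equal for both cameras).
    baseline: distance between the camera centers, in the output unit.
    """
    disparity = x_left - x_right
    if disparity <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    return focal_px * baseline / disparity


def hand_trajectory_3d(left_track, right_track, focal_px, baseline):
    """Convert matched per-frame hand positions into (X, Y, Z) points.

    left_track, right_track: sequences of (x, y) pixel positions, one per
    frame, measured relative to the principal point (an assumption here).
    """
    points = []
    for (xl, yl), (xr, _yr) in zip(left_track, right_track):
        z = depth_from_disparity(xl, xr, focal_px, baseline)
        # Back-project the left-image pixel to metric X and Y at depth z.
        points.append((xl * z / focal_px, yl * z / focal_px, z))
    return points
```

A shrinking disparity between frames means the hand is moving away from the cameras, which is the kind of motion perpendicular to the camera plane that a single camera cannot observe directly.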
Although hidden Markov models (HMMs) have been successfully applied to speech and gesture recognition, the standard framework cannot adequately model the interaction among multiple processes. To fully exploit the synchronous nature of two-handed gestures, this study investigated various HMM architectures and proposes coupled HMMs (CHMMs) for two-handed sign recognition. The state transition probabilities in the coupled HMMs are varied adaptively according to the coupling probabilities between the HMMs. When the state difference between the two HMMs is large, the lagging process is given an increased state transition probability and the leading process a reduced one, which keeps the two processes progressing nearly synchronously.
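The adaptive behavior described above can be sketched for two left-to-right chains, one per hand. The abstract does not state the actual coupling formula, so the logistic shift used here, the `strength` parameter, and both function names are assumptions chosen only to reproduce the qualitative effect: the lagging chain becomes more likely to advance and the leading chain less likely.

```python
import math


def coupled_advance_prob(base_advance, own_state, other_state, strength=0.5):
    """Probability that a left-to-right chain moves to its next state.

    base_advance: uncoupled advance probability, strictly between 0 and 1.
    own_state, other_state: current state indices of this chain and the other.
    strength: hypothetical coupling weight; 0 disables the coupling.
    """
    gap = other_state - own_state  # positive when this chain lags behind
    # Shift the base probability in log-odds space so the result stays in (0, 1).
    logit = math.log(base_advance / (1.0 - base_advance)) + strength * gap
    return 1.0 / (1.0 + math.exp(-logit))


def coupled_step_probs(base_advance, left_state, right_state, strength=0.5):
    """Return ((stay, advance) for the left hand, (stay, advance) for the right hand)."""
    result = []
    for own, other in ((left_state, right_state), (right_state, left_state)):
        adv = coupled_advance_prob(base_advance, own, other, strength)
        result.append((1.0 - adv, adv))
    return tuple(result)
```

When both chains are in the same state the gap is zero and each chain keeps its base transition probabilities; the adjustment grows with the state difference, which pulls the two chains back toward synchrony.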
The effectiveness of the proposed methods was demonstrated by recognition results on twelve TSL sign words whose trajectories are primarily perpendicular to the camera plane. The results show that depth information can be accurately extracted from stereo images and that the proposed CHMM outperforms the other HMM architectures in modeling two-handed sign gestures.