| Graduate student: | 蘇建源 Su, Chien-Yuan |
|---|---|
| Thesis title: | 條件式隨機域之圖形處理於人類動作辨識之應用 Graphical Modeling of Conditional Random Field for Human Motion Recognition |
| Advisor: | 簡仁宗 Chien, Jen-Tzung |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science, Department of Computer Science and Information Engineering |
| Year of publication: | 2007 |
| Academic year of graduation: | 95 |
| Language: | Chinese |
| Number of pages: | 96 |
| Keywords: | graphical model, conditional random field, hidden Markov model, motion recognition |
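Human motion recognition is a complex and challenging problem in computer vision, because different motions often share similar basic motion units and because the transitions between motions are ambiguous and overlap in time. The literature typically models human motion with the hidden Markov model (HMM) or its variants, but these models must assume, during both training and recognition, that the data in a motion sequence are mutually independent over time. In practice, similar motion units frequently occur in different time segments and usually exhibit long-range dependencies. This thesis therefore starts from the conditional random field (CRF) and explains these dependencies from the viewpoint of graphical model theory. Using the junction tree from graph theory, we merge mutually dependent variables into maximal cliques. The aim is to convert the loopy CRF graph into a clique tree, so that a simple tree inference algorithm can be used to infer the overall joint probability. We build this structure on top of the HMM to automatically obtain state segmentations, and feature functions suited to continuous variables, from motion sequences that have not been manually segmented. We implement the proposed graphical modeling of CRF (GMCRF) on the CMU body motion database and the IDIAP hand gesture database and compare it with the standard CRF and the HMM. When the dependencies among data at three consecutive time points are considered jointly, GMCRF outperforms the other two models in our experiments. In future work, we will balance the number of parameters against the complexity of the graph and consider dependencies over more time points, to see whether higher human motion recognition accuracy can be obtained.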
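For orientation, a CRF whose cliques each cover three consecutive labels could be written as follows. This is a generic textbook-style form and not necessarily the formulation used in the thesis; the feature functions $f_k$, weights $\lambda_k$, sequence length $T$, and the clique indexing are all assumptions introduced here for illustration.

```latex
% Sketch only: labels y_1..y_T, observations x, cliques (y_{t-1}, y_t, y_{t+1}).
p(\mathbf{y} \mid \mathbf{x})
  = \frac{1}{Z(\mathbf{x})}
    \prod_{t=2}^{T-1}
    \exp\Big( \sum_{k} \lambda_k \, f_k(y_{t-1}, y_t, y_{t+1}, \mathbf{x}, t) \Big),
\qquad
Z(\mathbf{x})
  = \sum_{\mathbf{y}'}
    \prod_{t=2}^{T-1}
    \exp\Big( \sum_{k} \lambda_k \, f_k(y'_{t-1}, y'_t, y'_{t+1}, \mathbf{x}, t) \Big).
```

The sum over all label sequences in $Z(\mathbf{x})$ is the expensive part; grouping the variables into cliques arranged as a tree lets it be computed by local message passing instead.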
Human motion recognition is a challenging topic in computer vision. In the literature, the hidden Markov model (HMM) and its extensions have been widely used to model human motion. HMM methods assume that the observations in a sequence are mutually independent in the temporal domain. In real-world applications, however, similar motions often occur at different times, and the long-term dependencies between observations should be modeled to improve recognition performance. For this reason, the conditional random field (CRF) has been applied to large-span modeling of mutually dependent observations. The exact implementation of such a CRF, however, is computationally expensive. We present a graphical modeling approach to fast CRF implementation. Specifically, we employ the junction tree algorithm to handle complex CRF structures with loops: we group mutually dependent variables into maximal cliques and build a junction tree, which transforms the loopy CRF structure into a clique tree. A tree inference algorithm can then be used to infer the joint probability of all variables. In implementing the CRF, we use the continuous-valued HMM parameters to define the feature functions. In the experiments, we evaluate the proposed Graphical Modeling of CRF (GMCRF) for human motion recognition on the CMU Graphics Lab Motion Capture Database and the IDIAP TwoHandManip Database. The preliminary results show that GMCRF outperforms the HMM and the linear-chain CRF in terms of recognition rate.
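The idea of turning a loopy label graph into a clique tree can be illustrated with a minimal sketch. It assumes a chain in which every label depends on its two predecessors, so the graph contains triangles; the cliques (y_t, y_{t+1}, y_{t+2}) then form a chain-shaped junction tree whose separators are pairs of adjacent labels. The function names, the nested-list layout of `log_psi`, and the random potentials are hypothetical and chosen for illustration only; they are not taken from the thesis.

```python
import itertools
import math
import random


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_partition_junction_chain(log_psi, n_states):
    """Log partition function via message passing along the clique chain.

    log_psi[t][a][b][c] is the (assumed) log-potential of the clique
    (y_t = a, y_{t+1} = b, y_{t+2} = c).
    """
    S = n_states
    # Message over the separator (y_t, y_{t+1}); log 1 = 0 before any clique.
    log_msg = [[0.0] * S for _ in range(S)]
    for clique in log_psi:
        # Absorb the incoming message into the clique, then sum out y_t so the
        # result is indexed by the next separator (y_{t+1}, y_{t+2}).
        log_msg = [
            [logsumexp([log_msg[a][b] + clique[a][b][c] for a in range(S)])
             for c in range(S)]
            for b in range(S)
        ]
    # Sum out the last separator to obtain log Z.
    return logsumexp([v for row in log_msg for v in row])


def log_partition_brute_force(log_psi, n_states):
    """Reference value: enumerate every label sequence (exponential cost)."""
    T = len(log_psi) + 2
    totals = []
    for y in itertools.product(range(n_states), repeat=T):
        totals.append(sum(log_psi[t][y[t]][y[t + 1]][y[t + 2]]
                          for t in range(T - 2)))
    return logsumexp(totals)


if __name__ == "__main__":
    random.seed(0)
    S, T = 3, 6  # hypothetical sizes: 3 label states, 6 time steps
    log_psi = [[[[random.uniform(-1.0, 1.0) for _ in range(S)]
                 for _ in range(S)]
                for _ in range(S)]
               for _ in range(T - 2)]
    fast = log_partition_junction_chain(log_psi, S)
    slow = log_partition_brute_force(log_psi, S)
    print("message passing:", fast)
    print("brute force:    ", slow)
```

Message passing costs on the order of T·S³ operations, while the brute-force sum grows as Sᵀ, which is why a tree-structured inference procedure becomes attractive as more time points are tied together.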