
Graduate student: Su, Chien-Yuan (蘇建源)
Thesis title: Graphical Modeling of Conditional Random Field for Human Motion Recognition (條件式隨機域之圖形處理於人類動作辨識之應用)
Advisor: Chien, Jen-Tzung (簡仁宗)
Degree: Master
Department: Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science
Year of publication: 2007
Academic year of graduation: 95 (ROC calendar)
Language: Chinese
Number of pages: 96
Keywords (Chinese): 動作辨識, 條件式隨機域, 圖形模型, 隱藏式馬可夫模型
Keywords (English): graphical model, conditional random field, hidden Markov model, motion recognition
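  • Different human motions often share similar basic motion units, and the transitions between motions are ambiguous and overlap in time, so human motion recognition is a complex and challenging problem in computer vision. The literature typically models human motion with the hidden Markov model (HMM) or its variants, but these models must assume, in both training and recognition, that the data in a motion sequence are mutually independent over time. In practice, similar motion units often occur in different time segments, and long-range dependencies usually exist. This thesis therefore starts from the conditional random field (CRF) and explains these dependencies from the viewpoint of graphical model theory. Using the junction tree from graph theory, we combine mutually dependent variables into a maximal clique. The aim is to turn the loopy CRF graph structure into a clique tree, so that a simple tree inference algorithm can be used to infer the overall joint probability. We build this framework on top of the HMM in order to obtain, automatically and from motion sequences that have not been manually segmented, both the state segmentation and feature functions suited to continuous variables. We implement the proposed graphical modeling of CRF (GMCRF) on the CMU body motion database and the IDIAP hand gesture database, and compare it with the standard CRF and the HMM. When the dependencies among data at three consecutive time points are considered together, GMCRF gives better experimental results than the other two models. In future work, we will balance the number of parameters against the complexity of the graph and consider dependencies over more time points, to see whether higher human motion recognition accuracy can be obtained.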

    Human motion recognition is a challenging topic in computer vision. In the literature, the hidden Markov model (HMM) and its extensions have been widely used to model human motion. HMM methods assume that the observations in a sequence are mutually independent in the temporal domain. In real-world applications, however, similar motions often occur at different points in time, and the long-term dependencies between observations should be modeled to improve recognition performance. For this reason, the conditional random field (CRF) has been applied to large-span modeling of mutually dependent observations. Exact implementation of the CRF, however, is computationally expensive. We present a graphical modeling approach to fast CRF implementation. Specifically, we employ the junction tree algorithm to handle complex CRF structures with loops. We group mutually dependent variables into a maximal clique and build a junction tree, so that the loopy CRF structure is transformed into a clique tree. With this procedure, a tree inference algorithm can be used to infer the joint probability of all variables. In the CRF implementation, we specify the continuous-valued HMM parameters as the feature functions. In the experiments, we evaluate the proposed Graphical Modeling of CRF (GMCRF) for human motion recognition on the CMU Graphics Lab Motion Capture Database and the IDIAP TwoHandManip Database. The preliminary results show that GMCRF outperforms the HMM and the linear-chain CRF in terms of recognition rate.
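
The clique-tree idea in the abstract can be illustrated with a minimal sketch. This is not the thesis implementation: it assumes a label chain in which each label depends on the two preceding labels, so the maximal cliques are triples of consecutive labels and the clique tree is a simple chain whose separators are pairs of labels. The function names and the random potentials (standing in for observation-dependent feature scores) are hypothetical.

```python
import itertools
import numpy as np


def log_z_clique_chain(log_psi):
    """Log partition function of a second-order label chain.

    log_psi[t][a, b, c] is the log-potential of the clique
    (y_t, y_{t+1}, y_{t+2}) = (a, b, c). Adjacent cliques share the
    separator (y_{t+1}, y_{t+2}), so the clique tree is a chain.
    """
    # Sum out y_0 from the first clique; the message lives on (y_1, y_2).
    msg = np.logaddexp.reduce(log_psi[0], axis=0)
    for psi in log_psi[1:]:
        # Combine the incoming message with the clique potential, then
        # sum out the oldest label; the result is indexed by the next
        # separator (the last two labels of this clique).
        msg = np.logaddexp.reduce(msg[:, :, None] + psi, axis=0)
    # Sum out the final separator.
    return np.logaddexp.reduce(msg.ravel())


def log_z_brute_force(log_psi):
    """Same quantity by enumerating every label sequence (small cases only)."""
    k = log_psi[0].shape[0]
    length = len(log_psi) + 2
    scores = [
        sum(psi[y[t], y[t + 1], y[t + 2]] for t, psi in enumerate(log_psi))
        for y in itertools.product(range(k), repeat=length)
    ]
    return np.logaddexp.reduce(np.array(scores))


# Hypothetical example: 3 labels, 6 time steps (4 cliques of 3 labels each).
rng = np.random.default_rng(0)
potentials = [rng.normal(size=(3, 3, 3)) for _ in range(4)]
print(log_z_clique_chain(potentials), log_z_brute_force(potentials))
```

The two printed values should agree. Message passing over the clique chain costs on the order of T·K³ operations for T time steps and K labels, whereas enumeration grows as K^T, which is why converting a loopy structure to a clique tree makes exact inference tractable when the cliques stay small.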

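    Chapter 1 Introduction
      1.1 Preface
      1.2 Related work on building motion recognition systems
      1.3 Related work on feature extraction for motion recognition
      1.4 Related work on building motion recognition models
      1.5 Thesis objectives and organization
    Chapter 2 Review of Related Work
      2.1 Maximum Entropy Markov Models (MEMMs)
      2.2 Conditional Random Field (CRF)
      2.3 CRFs with different graph structures
      2.4 Research on applying CRFs to human motion recognition
      2.5 Introduction to the junction tree and its applications in the literature
    Chapter 3 Building the Graphical Model of the Conditional Random Field
      3.1 CRFs with loopy graphs from the graphical model viewpoint
      3.2 Graphical modeling of the CRF and definition of the feature functions
        3.2.1 Defining the feature functions
        3.2.2 CRF model training
      3.3 TOP-N implementation method
    Chapter 4 Experiments
      4.1 Experimental setup
        4.1.1 Image databases
        4.1.2 Image feature extraction
        4.1.3 Motion model settings
        4.1.4 System architecture
      4.2 Experimental results
        4.2.1 CMU Graphics Lab Motion Capture Database
        4.2.2 IDIAP TwoHandManip Database
    Chapter 5 Conclusions and Future Work
      5.1 Conclusions
      5.2 Future work
    References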


    Full text availability: on campus: available immediately
    Off campus: available from 2007-09-14