| Graduate student: | 劉家瑞 Liu, Chia-Jui |
|---|---|
| Thesis title: | 結合隱藏式馬可夫模型與分類器錯誤加權分類方法於語音及視覺情緒辨識 Audio-Visual Emotion Recognition Based on Hidden Markov Model and Error-Weighted Classifier Combination |
| Advisor: | 吳宗憲 Wu, Chung-Hsien |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering |
| Year of publication: | 2010 |
| Academic year of graduation (ROC calendar): | 98 |
| Language: | Chinese |
| Number of pages: | 44 |
| Keywords (Chinese): | 隱藏式馬可夫模型 (hidden Markov model), 多模態模型 (multimodal model), 分類器錯誤加權分類方法 (error-weighted classifier combination) |
| Keywords (English): | Hidden Markov Model, Multimodal Fusion, Error Weighted |
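As computer technology advances, computers have gradually become part of everyday human life, so building an intelligent, human-centered interface for human-machine communication has become an important research direction. A key step toward making computers intelligent is understanding human emotion. Current research includes single-modality emotion recognition modules and hybrid multimodal architectures. The recognition models are mainly support vector machines, Gaussian mixture models, and hidden Markov models, and fusion at the feature, decision, or model level is used to improve recognition accuracy.
This thesis examines the characteristics of speech and facial expression that affect emotion recognition and builds an effective multimodal model fusion architecture. Its specific goal is a new multimodal model fusion strategy for emotion recognition. Beyond considering the relationship between the multimodal signals (for example, speech and image), the strategy also evaluates how much each combination of feature sequences across the signals contributes to inferring each emotion class, and thereby achieves better recognition performance. The recognizer for each signal is a hidden Markov model, so the temporal continuity of emotional change within the signal is also taken into account to improve overall recognition performance.
For the experiments, a parallel corpus of emotional speech and the corresponding facial-expression video was recorded, 720 utterances in total from 3 speakers; 360 utterances were used for training and another 180 for testing. The results show that the proposed method reaches an emotion recognition rate of 81.67%, indicating that it is effective in practical emotion recognition.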
With advances in computer technology, computers have gradually become part of daily human life. As a result, intelligent and human-centered human-machine interfaces have become an important research topic, and human emotion recognition is one of the critical issues. Recent research on emotion recognition includes the construction of single-modality emotion recognizers using speech or facial expression, as well as fused bimodal architectures. The recognition models include the support vector machine, the Gaussian mixture model, and the hidden Markov model, and recognition accuracy is further improved by fusion at the feature, decision, or model level.
In this thesis, we investigate the features of speech and facial expression that affect emotion recognition and establish an effective multimodal fusion recognition model. More specifically, this study focuses on a novel multimodal fusion strategy that not only considers the correlation between the modality streams, such as speech and image, but also estimates the contribution of different feature pairs to recognizing each emotion class, in order to obtain better performance. Each stream is modeled with a hidden Markov model, which improves performance by taking into account the temporal correlation of emotional state changes within the stream.
To evaluate the proposed approach, 720 emotional utterances and the corresponding facial expressions were collected from 3 speakers; 50% were used for training and 25% for testing. The emotion recognition accuracy is 81.67%. The experimental results suggest that the proposed method and architecture outperform traditional approaches in emotion recognition.
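The abstract does not give the fusion formula, so the following is only a minimal sketch of the general idea behind error-weighted classifier combination, not the thesis's implementation. It assumes that each stream (audio, visual) already yields a posterior over emotion classes, that each stream's confusion matrix on held-out data is used to estimate P(true class | predicted class), and that the corrected posteriors are mixed with fixed stream weights. The emotion labels, the confusion counts, the posteriors, and the function names are all hypothetical.

```python
# Hypothetical label set; the abstract does not list the emotion classes.
EMOTIONS = ["neutral", "happy", "angry"]


def normalize(values):
    """Scale non-negative numbers so that they sum to 1."""
    total = sum(values)
    return [v / total for v in values]


def correction_matrix(confusion, smoothing=1e-9):
    """Estimate P(true class | predicted class) from a confusion matrix.

    confusion[t][p] counts held-out samples of true class t that the
    stream's classifier labelled p. The result is indexed [p][t].
    A tiny smoothing term avoids division by zero for unused labels.
    """
    n = len(confusion)
    return [
        normalize([confusion[t][p] + smoothing for t in range(n)])
        for p in range(n)
    ]


def error_weighted_fusion(posteriors, corrections, stream_weights):
    """Mix per-stream posteriors after correcting each by its error pattern."""
    n = len(posteriors[0])
    fused = [0.0] * n
    for post, corr, weight in zip(posteriors, corrections, stream_weights):
        for p in range(n):      # class the stream predicts
            for t in range(n):  # class that is actually true
                fused[t] += weight * corr[p][t] * post[p]
    return normalize(fused)


# Made-up confusion counts (rows: true class, columns: predicted class).
audio_confusion = [[8, 1, 1], [1, 8, 1], [2, 2, 6]]
visual_confusion = [[9, 1, 0], [1, 6, 3], [0, 3, 7]]

# Made-up posteriors for one utterance. In an HMM-based system these would
# be derived from per-emotion HMM likelihoods; here they are hard-coded.
audio_posterior = [0.2, 0.5, 0.3]
visual_posterior = [0.1, 0.3, 0.6]

fused = error_weighted_fusion(
    [audio_posterior, visual_posterior],
    [correction_matrix(audio_confusion), correction_matrix(visual_confusion)],
    stream_weights=[0.5, 0.5],  # assumed equal trust in both streams
)
best = EMOTIONS[max(range(len(fused)), key=fused.__getitem__)]
print(dict(zip(EMOTIONS, fused)), best)
```

The point of the correction step is that a stream which systematically confuses two emotions has its vote redistributed toward the class it usually turns out to be, so a stream contributes most where its past errors show it to be reliable.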