
Graduate Student: Chen, Yi-Cheng (陳奕丞)
Thesis Title: Emotion Recognition Based on Acoustic Analysis and Semantic Content Understanding (基於語氣分析和語意內涵理解之情緒辨識系統)
Advisor: Wang, Jhing-Fa (王駿發)
Degree: Master
Department: Department of Electrical Engineering, College of Electrical Engineering and Computer Science
Publication Year: 2010
Graduation Academic Year: 98 (ROC calendar; 2009-2010)
Language: English
Pages: 55
Chinese Keywords: 情緒辨識 (emotion recognition), 語意理解 (semantic understanding), 語氣韻律分析 (prosodic analysis), 分類與回歸樹 (classification and regression tree)
Foreign Keywords: Emotion recognition, semantic understanding, acoustic analysis, classification and regression tree (CART)
Access Counts: views: 117; downloads: 0
Current emotion recognition techniques fall into three categories: speech prosody analysis, physiological signal detection, and text-based analysis. These conventional methods offer low recognition rates and limited ease of use, and their way of judging emotion is rather subjective. In this thesis, we attempt to recognize emotion from the perspective of human perception: emotion categories are analyzed through both the content and the tone of speech together with multi-level evaluation, yielding an emotion recognition system with a high recognition rate that expands four basic emotions into nine emotion categories.

For emotional semantic detection, the speaker's emotion category is recognized with a pre-built lexicon of semantic emotion keywords. In addition, when no keyword can be found, the system looks for context words that co-occur with those keywords and performs semantic analysis on them. For emotional prosody detection, acoustic differences are exploited: the timbre and prosody of the speech are analyzed to extract acoustic features, and emotional prosody models are built for recognition.

A classification and regression tree is used to categorize emotions according to multi-level evaluations. This style of analysis matches how psychologists define emotions; it increases the number of recognizable categories and comes closer to the speaker's actual inner feelings. Finally, nine-class emotion recognition based on both speech and semantics is carried out. Experimental results show a 6.6% improvement in recognition rate over semantic analysis alone, and the shortcoming of low prosodic recognition rates in the speaker-independent setting is alleviated.
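To make the prosody stream concrete, here is a minimal sketch of per-frame pitch and intensity extraction, assuming a mono 16 kHz signal. The function names, frame length, and autocorrelation pitch tracker are illustrative assumptions; the thesis's full feature set (which also covers LPCC, MFCC, and spectrum-related features) is not reproduced here.

```python
# Illustrative sketch only: a crude prosodic front end, not the
# thesis's actual feature extractor.
import numpy as np

def rms_energy(frame: np.ndarray) -> float:
    """Root-mean-square energy of one frame (a simple intensity correlate)."""
    return float(np.sqrt(np.mean(frame ** 2)))

def pitch_autocorr(frame: np.ndarray, sr: int = 16000,
                   fmin: float = 60.0, fmax: float = 400.0) -> float:
    """Crude fundamental-frequency estimate from the autocorrelation peak."""
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(sr / fmax), int(sr / fmin)  # search plausible pitch lags
    lag = lo + int(np.argmax(ac[lo:hi]))
    return sr / lag

# Toy check: a 100 Hz sine should yield a pitch estimate near 100 Hz.
sr = 16000
t = np.arange(sr) / sr
signal = 0.5 * np.sin(2 * np.pi * 100.0 * t)
frame = signal[:400]  # one 25 ms analysis frame
print(pitch_autocorr(frame, sr), rms_energy(frame))
```

Per-frame statistics such as these (means, ranges, contours) would then feed the emotional prosody models described above.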

Conventional emotion recognition (ER) technologies can be classified into three approaches: prosody detection, physiological change detection, and semantic content analysis. However, these methods remain fairly subjective, and their performance is still not good enough for accurate and convenient speaker-independent emotion recognition. Based on several human evaluations, this thesis proposes an ER system that combines prosodic variations with semantic content analysis to enhance accuracy and to expand the emotion categories from four basic emotions to nine detailed emotions.
In the emotional semantic content analysis, predefined semantic keywords are adopted to recognize emotions. In addition, when no emotion keyword can be found in a speech utterance, the proposed system searches for context words that co-occur with those keywords within the same sentence, using emotional semantic understanding. In the acoustic analysis, five kinds of acoustic features are extracted to capture realistic human emotions and are transformed into speaker-independent emotion models. We then adopt a classification and regression tree (CART) to map the semantic content and prosodic variation data to emotion states.
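As a rough illustration of the keyword-first, context-word-fallback step described above, the following sketch uses invented lexicons; the thesis's actual keyword tables and concept network are not reproduced here.

```python
# Illustrative sketch: the lexicons and labels below are hypothetical.
from typing import Optional

# Hypothetical emotion-keyword lexicon: surface word -> emotion label.
EMOTION_KEYWORDS = {"delighted": "joy", "furious": "anger", "weep": "sadness"}

# Hypothetical context-word lexicon: words that tend to co-occur with
# emotion keywords, carrying a weaker association with an emotion.
CONTEXT_WORDS = {"birthday": "joy", "funeral": "sadness", "argument": "anger"}

def detect_emotion(utterance: str) -> Optional[str]:
    """Keyword-first detection with a context-word fallback."""
    tokens = utterance.lower().split()
    # First pass: direct emotion-keyword match.
    for tok in tokens:
        if tok in EMOTION_KEYWORDS:
            return EMOTION_KEYWORDS[tok]
    # Fallback: no keyword found, so use co-occurring context words
    # from the same sentence instead.
    for tok in tokens:
        if tok in CONTEXT_WORDS:
            return CONTEXT_WORDS[tok]
    return None  # leave the decision to the acoustic stream

print(detect_emotion("we had an argument at work today"))  # -> anger
```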
The proposed emotion definition is consistent with the way psychologists categorize emotions, which allows us to increase the recognition rate and to examine the relationship between emotion categories and psychological feelings. The experimental results show that the proposed system improves the accuracy by about 6.6% over the previous system that uses only semantic content analysis. Moreover, this improvement also benefits speaker-independent ER.
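A hedged sketch of the CART-based fusion step: scikit-learn's DecisionTreeClassifier (an optimized CART variant) is trained on toy rows that combine acoustic statistics with a semantic score. The feature layout and data are assumptions for illustration, not the thesis's configuration.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier  # an optimized CART variant

# Toy training rows: [mean pitch (Hz), mean intensity (dB), semantic score].
# The layout and values are invented for illustration only.
X = np.array([
    [220.0, 70.0, 0.9],   # high pitch, loud, positive text  -> joy
    [110.0, 55.0, 0.2],   # low pitch, quiet, negative text  -> sadness
    [240.0, 78.0, 0.1],   # high pitch, loud, negative text  -> anger
    [150.0, 60.0, 0.5],   # moderate everything              -> neutral
])
y = np.array(["joy", "sadness", "anger", "neutral"])

clf = DecisionTreeClassifier(criterion="gini", max_depth=3)
clf.fit(X, y)

# Fuse a new utterance's acoustic features with its semantic score.
print(clf.predict([[230.0, 75.0, 0.15]]))  # -> ['anger'] on this toy data
```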

Chinese Abstract
Abstract
Content
Figure List
Table List
Chapter 1. Introduction
    1.1. Background and Motivation
    1.2. Thesis Objective
    1.3. Thesis Organization
Chapter 2. Related Work
    2.1. Overview of Emotion Recognition
    2.2. Emotional Speech Database
    2.3. Emotion Feature Extraction from Speech Signals
    2.4. Speech Emotion Recognition Based on Prosodic Analysis
    2.5. Speech Emotion Recognition Based on Semantic Content Analysis
Chapter 3. Emotion Recognition Based on Acoustic Analysis and Semantic Content Understanding
    3.1. Framework of the Proposed System
    3.2. Feature Extraction and Discriminative Analysis
        3.2.1. Pitch-Related Features
        3.2.2. Intensity-Related Features
        3.2.3. Linear Predictive Cepstral Coefficients
        3.2.4. Mel-Frequency Cepstral Coefficients
        3.2.5. Spectrum-Related Features
        3.2.6. Discriminative Analysis
    3.3. Acoustic Feature Analysis by the AdaBoost Algorithm and Training
        3.3.1. AdaBoost Algorithm Overview
        3.3.2. Weak Classifiers
        3.3.3. Training Part of the Acoustic Variation Analysis
    3.4. Semantic Content Extraction and Training
        3.4.1. Semantic Content Analysis Processing
        3.4.2. Related Analysis of Emotion Keywords and Context Words
        3.4.3. Construction of the Concept Network
    3.5. Emotional Content Training by the CART Algorithm
        3.5.1. Introduction to the CART Algorithm
        3.5.2. Prototype Construction of the Question Set
    3.6. Final Emotion State Determination in the Recognition Phase
Chapter 4. Experimental Setup and Results
    4.1. Experimental Setup
    4.2. Introduction of the Database
        4.2.1. Stimulus-Induced Corpus
        4.2.2. Emotion Sentence Database
        4.2.3. Subjective Test
    4.3. Experimental Results
        4.3.1. Experiments on Acoustic Feature Extraction with the AdaBoost Algorithm
        4.3.2. Emotion Recognition Results Obtained from Textual Content
        4.3.3. Emotion Recognition Results Obtained from the Integrated System
Chapter 5. Conclusion and Future Work
References


Full text availability: on campus, publicly available from 2020-01-01; off campus, not available.
The electronic thesis has not been authorized for public release; for the print copy, please consult the library catalog.