| Graduate Student: | Chang, Bor-Hsiung (張柏雄) |
|---|---|
| Thesis Title: | Automated Recognition of Emotion in Mandarin (中文語音情緒之自動辨識) |
| Advisor: | Chou, Jung-Hua (周榮華) |
| Degree: | Master |
| Department: | Department of Engineering Science, College of Engineering |
| Year of Publication: | 2002 |
| Graduation Academic Year: | 90 (ROC calendar) |
| Language: | Chinese |
| Number of Pages: | 65 |
| Chinese Keywords: | nonlinear frequency transformation, emotional speech recognition, Mel frequency scale, dynamic time warping algorithm, LBG algorithm, standard reference patterns |
| English Keywords: | dynamic time warping, LBG, reference pattern, emotion recognizer, mel frequency |
This thesis proposes an emotional speech recognition system. The emotions considered are Normal, Anger, Boredom, Happiness, and Sadness. The spectral parameters of an emotional speech signal, obtained through the Fourier transform, are passed through a nonlinear filter bank and converted into energy parameter vectors based on the Mel frequency scale; these are quantized by the LBG algorithm into fixed-length feature vectors, and a modified robust training algorithm is then used to train the feature vectors for each emotion. Exploiting the vowel-based nature of Mandarin speech, the trained feature vectors are further enhanced with emotion-specific features to serve as Standard Reference Patterns for the different emotions. A dynamic time warping algorithm then computes the minimum distance between a test pattern and the reference patterns to reach an accurate recognition result. The corpus used in this thesis contains 591 utterances produced by two female speakers under the different emotions, covering 12 sentence patterns of varying length. Experimental results show that the average recognition rate of the system reaches about 50%, with average recognition rates of 51% and 46% for the two speakers.
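As a rough illustration of the front end described above (an FFT power spectrum mapped through a nonlinear, Mel-scaled filter bank into band energies), the following Python sketch shows one common way such a filter bank is built. The sampling rate, frame length, FFT size, and the use of 19 triangular bands (the figure quoted in the English abstract below) are illustrative assumptions, not the thesis's actual parameters.

```python
import numpy as np

def mel_filterbank_energies(frame, sample_rate=16000, n_filters=19, n_fft=512):
    """Sketch: Mel-scaled filter-bank log energies from an FFT power spectrum."""
    power = np.abs(np.fft.rfft(frame, n_fft)) ** 2           # FFT power spectrum

    # Mel scale conversion: mel = 2595 * log10(1 + f / 700)
    def hz_to_mel(f): return 2595.0 * np.log10(1.0 + f / 700.0)
    def mel_to_hz(m): return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

    # Centre frequencies equally spaced on the Mel scale, mapped back to FFT bins
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bin_edges = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)

    # Triangular filters; each row of fbank is one Mel band
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, centre, right = bin_edges[i - 1], bin_edges[i], bin_edges[i + 1]
        for k in range(left, centre):
            fbank[i - 1, k] = (k - left) / max(centre - left, 1)
        for k in range(centre, right):
            fbank[i - 1, k] = (right - k) / max(right - centre, 1)

    return np.log(fbank @ power + 1e-10)                     # log energy per Mel band
```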
This study proposes an emotion recognizer for Mandarin speech. Five human emotions embedded in speech, namely normal, anger, boredom, happiness, and sadness, are investigated. The speech spectrum is first calculated with the FFT. A set of 19 Mel-scaled filter banks is then applied to the FFT power spectrum, and a feature vector based on the Mel-frequency power coefficients is extracted. Afterwards, the vector of each speech frame is assigned to a cluster by vector quantization, with the quantizer designed using the LBG algorithm. A modified robust training method is used to train the emotion-specific reference patterns, and the emotion features of each reference pattern are further enhanced. Finally, the minimum distance between the reference patterns and the test pattern is computed by dynamic time warping (DTW) to obtain the recognition result. The corpus consists of 591 emotional utterances from two female speakers.
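The LBG vector-quantizer design mentioned above grows a codebook by repeatedly splitting each centroid and re-assigning the training vectors until the average distortion stops improving. A minimal sketch follows; the codebook size, split factor, and stopping tolerance are illustrative assumptions rather than values taken from the thesis.

```python
import numpy as np

def lbg_codebook(vectors, codebook_size=16, eps=0.01, tol=1e-4):
    """Sketch of the LBG (Linde-Buzo-Gray) splitting algorithm for VQ design."""
    codebook = vectors.mean(axis=0, keepdims=True)       # start from the global centroid

    while len(codebook) < codebook_size:
        # Split every centroid into a slightly perturbed pair
        codebook = np.vstack([codebook * (1 + eps), codebook * (1 - eps)])

        prev_distortion = np.inf
        while True:
            # Assign each training vector to its nearest centroid (Euclidean distance)
            dists = np.linalg.norm(vectors[:, None, :] - codebook[None, :, :], axis=2)
            labels = dists.argmin(axis=1)
            distortion = dists[np.arange(len(vectors)), labels].mean()

            # Move each centroid to the mean of its assigned vectors
            for j in range(len(codebook)):
                members = vectors[labels == j]
                if len(members):
                    codebook[j] = members.mean(axis=0)

            if prev_distortion - distortion < tol:        # distortion has stabilised
                break
            prev_distortion = distortion

    return codebook
```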
The results show that the emotion patterns can be recognized fairly well: an overall average accuracy of 50% is achieved, and average accuracies of 51% and 46% are obtained for the two female speakers, respectively.
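These per-speaker accuracies come from the minimum-distance decision described in the abstract: each test utterance is compared against every emotion's reference pattern with DTW and assigned to the closest one. A minimal sketch, with the local frame distance and the reference-pattern container chosen only for illustration:

```python
import numpy as np

def dtw_distance(test, reference):
    """Classic dynamic-time-warping distance between two feature-vector sequences."""
    n, m = len(test), len(reference)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(test[i - 1] - reference[j - 1])   # local frame distance
            cost[i, j] = d + min(cost[i - 1, j],                 # insertion
                                 cost[i, j - 1],                 # deletion
                                 cost[i - 1, j - 1])             # match
    return cost[n, m]

def classify_emotion(test_pattern, reference_patterns):
    """Pick the emotion whose reference pattern is closest to the test pattern under DTW."""
    return min(reference_patterns,
               key=lambda emotion: dtw_distance(test_pattern, reference_patterns[emotion]))

# Hypothetical usage: reference_patterns maps emotion labels
# ("normal", "anger", "boredom", "happiness", "sadness") to trained feature sequences.
```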