
Graduate student: 洪詠馨 (Hung, Yung-Hsin)
Thesis title: 比較中英文母語人士的情緒語語音研究
A Comparative Acoustic Study of Mandarin Emotional Speech by English and Mandarin Native speakers
Advisor: 簡華麗 (Jian, Hua-Li)
Degree: Master
Department: College of Liberal Arts, Department of Foreign Languages and Literature
Year of publication: 2010
Academic year of graduation: 98 (ROC calendar)
Language: English
Pages: 101
Keywords (Chinese): 情緒語, 情緒語音, 重音放置, 語言移轉, 音高, 音強, 音長
Keywords (English): Emotional speech, emotional prosody, stress placement, language transfer, F0, intensity, duration
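  • This study focuses on (1) investigating whether native speakers of English can express emotions appropriately in Mandarin Chinese, and (2) comparing whether the acoustic properties of emotional speech differ between native speakers of Mandarin and native speakers of English. The recognizable emotional speech of the two groups is compared in terms of pitch, duration, and intensity. Comparing the acoustic features of the two groups reveals the difficulties that native speakers of English have in expressing emotion in Mandarin, as well as the different acoustic features used to express different emotions.
    Emotions were elicited through scenarios: participants expressed four basic emotions (joy, anger, sadness, fear), with neutral as the control. Sixteen native speakers of English and sixteen native speakers of Mandarin produced the emotional utterances, and another sixteen native speakers of Mandarin judged whether each utterance was recognizable. The acoustic features of the recognizable utterances were recorded and analyzed. The features analyzed include the F0 range of the sentence, the mean F0 of each syllable, sentence and syllable duration, and syllable intensity; they are compared across the two speaker groups and across emotion categories. The findings of this study are as follows:
    (1) Native speakers of Mandarin have higher recognition rates than native speakers of English for all emotions. All emotions expressed by native speakers of Mandarin have high recognition rates, whereas joy, anger, and fear expressed by native speakers of English receive lower recognition rates; the causes include accent, vocal cues, and culture-specific components.
    (2) When the F0 of the whole utterance is analyzed together, both groups use a similar F0 range; however, joy in both groups and fear in the Mandarin group have a smaller F0 range, because these emotions are expressed with high and steady pitch. When only the first sentence is analyzed, sadness has a higher F0 than neutral, which may be one of the acoustic characteristics of expressing sadness in Mandarin. The second sentence as produced by native speakers of English shows more pitch variation, because such strong variation makes the emotions they express easier to recognize.
    (3) Sentence durations of native speakers of English are longer than those of native speakers of Mandarin for all emotions, but because the duration of the baseline (neutral) differs greatly between the two groups, only anger has a shorter duration among the emotions of the Mandarin group. In the English group, the emotions with high activation all have shorter durations. When syllable durations are compared between the two groups, only fear shows many similar syllable durations. In addition, the final syllable of both the first and the second sentence is lengthened; this is especially clear for 「的」 in the second sentence, which continues the ending of the preceding syllable 「來」. However, when speakers express a low-activation emotion such as sadness, 「的」 does not continue the preceding syllable and instead takes a falling tone.
    (4) In both groups, anger and joy have greater intensity, while sadness and fear have lower intensity. Native speakers of Mandarin are adept at using different intensity ranges to express different emotions, whereas native speakers of English tend to use the same intensity range for all emotions.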

    This thesis aims to (1) ascertain whether native speakers of English can express emotions successfully in Mandarin Chinese, and (2) examine how recognizable emotional portrayals by native speakers of English who are learners of Mandarin differ from those by native speakers of Mandarin. The recognizable emotional portrayals by native speakers of English are compared with those by native speakers of Mandarin. The acoustic features analyzed are F0 parameters, duration, and intensity. Comparing these acoustic features across the two groups may reveal the obstacles that native speakers of English face in expressing emotions in Mandarin, and how the acoustic features differ when different emotions are expressed.
    The four basic emotions (JOY, ANGER, SADNESS, FEAR), with NEUTRAL as a control, were elicited using the scenario approach. Emotional portrayals by 16 native speakers of English (NS-E) and 16 native speakers of Mandarin Chinese (NS-C) were judged for recognizability by 16 native Mandarin raters. The acoustic data for each emotion were recorded to compare prosodic variation between the two groups. F0 range at the sentence level, mean F0 of each syllable, sentence duration, syllable duration, and the intensity of each segment were analyzed within each emotional expression and compared between the two groups. The findings of this study are summarized as follows:
    (1) The recognition rates of all emotion categories are higher for NS-C than for NS-E. All emotions are recognized well when uttered by NS-C, but JOY, ANGER, and FEAR have low recognition rates when uttered by NS-E, owing to accent, vocal cues, and culture-specific components.
    (2) Both groups adopt a similar F0 range at the whole-sentence level; however, JOY in both groups and FEAR in the NS-C group have a small F0 range, since these emotions are produced with consistently high pitch and little change. In the first sentence (S1), SADNESS has a higher F0 range than NEUTRAL, which could be a characteristic of expressing SADNESS in Mandarin. More pitch variation is found in the second sentence (S2) for NS-E: NS-C use stable pitch in high-activation emotions, whereas NS-E use considerable pitch variation to signal changes in emotion.
    (3) The overall sentence durations of NS-E are longer than those of NS-C in all emotion categories, since level tones tend to be lengthened by non-native speakers. Because the duration of the reference point (NEUTRAL) differs significantly between the two groups, only ANGER has a short duration in the NS-C group, whereas all high-activation emotions have short durations in the NS-E group. In the comparison of syllable durations, more segments have similar durations across the two groups in FEAR than in the other emotions. In addition, the final segments of both S1 and S2 are lengthened. This is especially evident in de5, which reduplicates the tail of the preceding tone, lai2. When a low-activation emotion such as SADNESS is portrayed, however, de5 does not reduplicate the preceding tone but takes a falling tone instead.
    (4) ANGER and JOY have high intensity values, while SADNESS and FEAR have low intensity values, in both groups. Native speakers are good at using different intensity ranges to indicate different emotions, whereas non-native speakers tend to keep to the same intensity range for all emotions.
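    The measures named in the abstract (recognition rate, sentence-level F0 range, mean F0 per syllable) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure used in the thesis: the function names, the frame-based F0 track with 0 marking unvoiced frames, the use of Hz for the range, and all sample values are hypothetical.

```python
from statistics import mean


def recognition_rate(rater_labels, intended):
    """Share of raters whose label matches the intended emotion."""
    return sum(label == intended for label in rater_labels) / len(rater_labels)


def f0_range(f0_hz):
    """F0 range in Hz: max minus min over voiced frames (assumed f0 > 0)."""
    voiced = [f for f in f0_hz if f > 0]
    return max(voiced) - min(voiced)


def syllable_mean_f0(f0_hz, boundaries):
    """Mean F0 per syllable; boundaries are (start, end) frame indices, end exclusive."""
    return [
        mean(f for f in f0_hz[start:end] if f > 0)
        for start, end in boundaries
    ]


# Hypothetical data: 16 raters judging one utterance intended as JOY.
labels = ["JOY"] * 12 + ["NEUTRAL"] * 4
# Hypothetical F0 track in Hz; 0.0 marks an unvoiced frame.
track = [0.0, 210.0, 220.0, 230.0, 0.0, 230.0, 200.0]
# Hypothetical syllable boundaries for a two-syllable utterance.
syllables = [(1, 4), (5, 7)]

print(recognition_rate(labels, "JOY"))     # 0.75
print(f0_range(track))                     # 30.0
print(syllable_mean_f0(track, syllables))  # [220.0, 215.0]
```

    In practice the F0 track and syllable boundaries would come from a phonetic analysis tool and hand-checked segmentation; the sketch only shows how the summary values relate to such data.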

    Chinese Abstract
    ABSTRACT
    CONTENTS
    LIST OF FIGURES
    LIST OF TABLES
    CHAPTER ONE INTRODUCTION
      1.1 Background and Motivation
      1.2 Purpose of the study
      1.3 The scope of the study and the contributions
      1.4 Organization of the thesis
    CHAPTER TWO LITERATURE REVIEW
      2.1 Recognition of emotional expressions
      2.2 Prosodic features of human vocalized emotions
        2.2.1 F0 parameters in emotional speech
        2.2.2 Intensity
        2.2.3 Duration
        2.2.4 Summary of prosodic features in emotional speech
      2.3 Prosodic aspects of Mandarin Chinese
        2.3.1 Introduction on tone
        2.3.2 Effects of interlanguage for English native speakers learning Mandarin Chinese
        2.3.3 Mandarin third tone sandhi rule
    CHAPTER THREE METHODOLOGY
      3.1 Emotional portrayals
        3.1.1 Participants
        3.1.2 Emotions studied
        3.1.3 Scenarios
        3.1.4 Selection of sentences
        3.1.5 Recording procedure
        3.1.6 Selection of recognizable emotional utterances
      3.2 Measurement of acoustic data
        3.2.1 Measurement of F0
        3.2.2 Measurement of duration
        3.2.3 Measurement of intensity
      3.3 Overall research design
    CHAPTER FOUR RESULTS AND DISCUSSION
      4.1 Results of recognizable emotional utterances
        4.1.1 Predictions
        4.1.2 Recognition rates of all emotions
      4.2 Results of acoustic analysis
        4.2.1 Comparison of F0 parameters between NS-C and NS-E
        4.2.2 Comparison of duration between NS-C and NS-E
        4.2.3 Comparison of intensity between NS-C and NS-E
    CHAPTER FIVE CONCLUSION
      5.1 Summary of the findings
        5.1.1 Recognition of emotional expressions
        5.1.2 Pitch variations
        5.1.3 Speech rates of each emotion
        5.1.4 Stress placement
      5.2 Contributions of the present study
      5.3 Pedagogical Implications
      5.4 Limitation and Further research
    Reference
    Appendix A
    Appendix B
    Appendix C


    Full text, on campus: public from 2012-07-29
    Full text, off campus: public from 2012-07-29