
Author: 魏文麗
Wei, Wen-Li
Thesis title: 在社交互動對話中互動風格辨識之研究
A Study on Interaction Style Recognition in Social Interactive Dialogues
Advisor: 吳宗憲
Wu, Chung-Hsien
Degree: Doctor
Department: College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering
Year of publication: 2015
Academic year of graduation: 103 (ROC calendar; 2014-2015)
Language: English
Number of pages: 102
Keywords: Interaction style, emotion perception, temporal course, conversion function
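  • Human-machine interfaces play an increasingly important role in daily life, which has drawn a growing number of researchers into human-computer interaction (HCI) research; dialogue systems are among the most widely studied topics in this field. Although much research has been done on dialogue systems, a system can often give users only monotonous responses, whereas different users express their intentions through different social styles. To make interaction more natural, a system should respond flexibly and variably. How a system understands the social interaction style a user shows during a dialogue is therefore an important and novel research direction for achieving harmonious human-computer interaction.
    Because the optimal interaction style pairings suggested by psychologists can help a dialogue system choose an appropriate response, this dissertation takes interaction style recognition as its research goal. Following the definition of interaction styles, psychological factors such as personality traits and emotional states are used to narrow the gap between low-level features (for example, prosodic and linguistic features) and high-level interaction styles. The proposed fused cross-correlation model provides a probabilistic framework that combines the relationships among emotion, personality traits, preliminary interaction style results, and interaction style history to obtain the final interaction style recognition result. Experimental results show that the model achieves satisfactory interaction style recognition and confirm that incorporating psychological factors such as personality traits and emotional states effectively improves recognition accuracy.
    Because emotion perception plays an important role in interaction style, and in order to obtain better speech emotion recognition in real conversational settings, this dissertation further investigates how emotional expression evolves over its temporal course. In addition, it examines how speaking degrades facial-expression emotion recognition in real conversations. People usually express emotion while speaking, so changes in facial expression are affected by the movement of the muscles around the mouth. To address this problem, the dissertation proposes an eigenface conversion-based method that uses a parallel database (pure expressions and expressions accompanied by speaking) and a statistical Gaussian mixture model to build context-dependent linear conversion functions, removing the effect of speaking on facial expressions and thereby improving emotion recognition accuracy. Experimental results show that the proposed method outperforms existing approaches in the literature, confirming that removing the speaking effect improves facial-expression emotion recognition.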

    Human-computer interfaces play a key role in daily life, and researchers have become increasingly interested in the field of Human-Computer Interaction (HCI). Dialogue systems, one of the prominent HCI research areas, have been applied to a wide range of domains. Despite significant efforts in dialogue-system research, systems generally answer speakers’ queries with an invariable or pre-planned response. Speakers, however, express their intents in different social styles, so flexible and versatile responses are needed to achieve more user-friendly interaction. Hence, determining how a speaker is engaged in a dialogue is crucial for achieving harmonious interaction between human and computer.
    Based on the most appropriate Interaction Style (IS) pairs suggested by Berens for selecting a suitable response in a dialogue system, the interaction styles defined by Berens were chosen and adopted in this dissertation with the goal of tuning dialogue-system responses to the style of a computer user. According to the Berens model, personality trait and emotional state were considered in this dissertation as features to narrow the gap between low-level features (e.g., prosodic and linguistic features) and high-level interaction style. The proposed Fused Cross-Correlation Model (FCCM) provides a unified probabilistic framework to model the relationships among the psychological factors of emotion, personality trait, transient IS and IS history, for recognizing IS. The experimental results indicate that the proposed FCCM yields satisfactory results in IS recognition and also demonstrate that combining psychological factors effectively improves IS recognition accuracy.
    Since emotion perception plays an important role in social interaction, we further explored the temporal evolution of an emotional expression to obtain better speech emotion recognition results. We also addressed the speaking effect on emotion recognition from facial expressions in real dialogues. When a person expresses emotion while speaking, muscle contractions around the mouth alter the facial expression and can degrade emotion recognition. To manage this problem, an eigenface conversion-based approach is proposed: a context-dependent linear conversion function, modeled by a statistical Gaussian Mixture Model (GMM), is constructed from parallel data of speaking and non-speaking emotional facial expressions and is used to remove the speaking effect from facial expressions, improving the accuracy of emotion recognition. Experimental results show that the proposed method outperforms current approaches and demonstrate that removing the speaking effect on facial expressions improves the performance of emotion recognition.
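
    The GMM-based linear conversion described above can be illustrated with a minimal sketch. It is not the dissertation's implementation: it uses the general form of a GMM conversion function known from the voice-conversion literature, assumes diagonal covariances, and leaves out the context-dependent selection among several conversion functions. All function names, parameter names, and toy values are hypothetical. Each mixture component contributes a linear map from a "speaking" feature vector x to a "non-speaking" vector y, weighted by the posterior probability of that component given x.

```python
import math


def log_diag_gaussian(x, mean, var):
    """Log-density of a diagonal-covariance Gaussian evaluated at x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def component_posteriors(x, weights, mu_x, var_x):
    """Posterior probability of each mixture component given source vector x."""
    logs = [
        math.log(w) + log_diag_gaussian(x, m, v)
        for w, m, v in zip(weights, mu_x, var_x)
    ]
    top = max(logs)  # subtract the maximum for numerical stability
    unnorm = [math.exp(value - top) for value in logs]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def convert(x, weights, mu_x, mu_y, var_x, cov_yx):
    """Map a source vector x to the target space.

    Assumed form (diagonal covariances, per dimension d):
        y_d = sum_m p(m | x) * (mu_y[m][d] + cov_yx[m][d] / var_x[m][d] * (x_d - mu_x[m][d]))
    """
    post = component_posteriors(x, weights, mu_x, var_x)
    y = [0.0] * len(x)
    for p, mx, my, vx, cyx in zip(post, mu_x, mu_y, var_x, cov_yx):
        for d in range(len(x)):
            y[d] += p * (my[d] + cyx[d] / vx[d] * (x[d] - mx[d]))
    return y


# Toy example with a single component (hypothetical numbers).
converted = convert(
    x=[1.0, 2.0],
    weights=[1.0],
    mu_x=[[0.0, 0.0]],
    mu_y=[[1.0, 1.0]],
    var_x=[[1.0, 1.0]],
    cov_yx=[[2.0, 2.0]],
)
print(converted)  # [3.0, 5.0]
```

    With a single component the function reduces to an ordinary linear regression from x to y; with several components, the posteriors blend the component-wise regressions. In practice the mixture parameters would be estimated from the parallel data, which this sketch does not cover.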

    TABLE OF CONTENT
    LIST OF FIGURES
    LIST OF TABLES
    CHAPTER 1. INTRODUCTION
      1.1. Motivation
      1.2. Application Areas
      1.3. The Approach of this Dissertation
      1.4. The Organization of this Dissertation
    CHAPTER 2. LITERATURE REVIEW
      2.1. Emotion Perception
        2.1.1 Emotion Recognition from Facial Expressions
        2.1.2 Emotion Recognition from Speech
      2.2. Personality Trait Recognition
      2.3. Interaction Style Recognition
    CHAPTER 3. DATA COLLECTION, ANNOTATION AND ANALYSIS
      3.1. MHMC Audio-Visual Emotion Database
        3.1.1 Emotion Category
        3.1.2 Data Collection
        3.1.3 Data Labeling
        3.1.4 Consistency Evaluation of the Ground Truth
      3.2. NCKU Conversation-Based IS Speech Corpus
        3.2.1 Data Collection
        3.2.2 Personality Trait Assessment
        3.2.3 Interaction Style, Emotional State and Temporal Course Labeling
        3.2.4 Interrater Agreement
        3.2.5 Correlations Analysis between Psychological Factors and ISs
    CHAPTER 4. EMOTION PERCEPTION
      4.1. Emotion Recognition from Facial Expressions
        4.1.1 System Overview
        4.1.2 Eigenface Conversion-Based Approach
          4.1.2.1 Articulatory Attribute Class Detection
          4.1.2.2 Facial Feature Representation
          4.1.2.3 Eigenface Conversion Function Modeling
          4.1.2.4 Conversion Function Selection
          4.1.2.5 Facial Feature Reconstruction
          4.1.2.6 Facial Expression Verification and Emotional Quadrant Decision
          4.1.2.7 Arousal-Valence Prediction
        4.1.3 Experiments
          4.1.3.1 Experimental Setup
          4.1.3.2 Performance Comparison for the Emotional Quadrants Recognition
          4.1.3.3 Performance Comparison for Arousal and Valence Value Prediction
        4.1.4 Summary
      4.2. Emotion Recognition from Speech
        4.2.1 Feature Extraction
        4.2.2 Temporal Course Modeling
        4.2.3 Experiments
          4.2.3.1 Experimental Setup
          4.2.3.2 Experimental Results
        4.2.4 Summary
    CHAPTER 5. PERSONALITY TRAIT RECOGNITION
      5.1. Prosodic and Linguistic Features Extraction
      5.2. Likelihood Estimation of PT
      5.3. Experimental Results
    CHAPTER 6. INTERACTION STYLE RECOGNITION
      6.1. System Overview
      6.2. Fused Cross-Correlation Model
        6.2.1 Model Derivation
        6.2.2 Likelihood Estimation of Transient IS
        6.2.3 Prosodic and Linguistic Features Extraction
        6.2.4 Cross-Correlation Estimation
      6.3. Experiments
        6.3.1 Experimental Setup
        6.3.2 Performance Comparison of Automatic Speech Recognition
        6.3.3 Performance Comparison of Transient IS Recognition
        6.3.4 Performance Comparison of IS Recognition
      6.4. Summary
    CHAPTER 7. CONCLUSIONS AND FUTURE WORK
    REFERENCES

    [Ambady and Rosenthal 1992] N. Ambady and R. Rosenthal, “Thin slices of expressive behavior as predictors of interpersonal consequences: A meta-analysis,” Psychol. Bull., vol. 111, no. 2, pp. 256–274, 1992.
    [Artstein and Poesio 2008] R. Artstein and M. Poesio, “Inter-coder agreement for computational linguistics,” Computational Linguistics, vol. 34, no. 4, pp. 555–596, 2008.
    [Audhkhasi et al. 2012] K. Audhkhasi, A. Metallinou, M. Li, and S. S. Narayanan, “Speaker personality classification using systems based on acoustic-lexical cues and an optimal tree-structured Bayesian network,” in Proc. INTERSPEECH, 2012.
    [Ayadi et al. 2011] M. E. Ayadi, M. S. Kamel, and F. Karray, “Survey on speech emotion recognition: features, classification schemes, and databases,” Pattern Recognition, vol. 44, no. 3, pp. 572–587, 2011.
    [Berens 2008] L. V. Berens, Understanding Yourself and Others: An Introduction to Interaction Styles. Telos Publications, 2008.
    [Boersma and Weenink 2007] P. Boersma and D. Weenink, Praat: doing phonetics by computer, 2007. [Online]. Available: http://www.praat.org/
    [Bolton and Bolton 2009] R. Bolton and D. G. Bolton, People Styles at Work -- And Beyond: Making Bad Relationships Good and Good Relationships Better. American Management Association, 2009.
    [Bradley et al. 2001] M. M. Bradley, M. Codispoti, B. N. Cuthbert, and T. J. Lang, “Emotion and motivation I: defensive and appetitive reactions in picture processing,” Emotion, vol. 1, no. 3, pp. 276–298, 2001.
    [Burges 1998] C. J. C. Burges, “A tutorial on support vector machines for pattern recognition,” Data Mining and Knowledge Discovery, vol. 2, no. 2, pp. 121–167, 1998.
    [Carletta 1996] J. Carletta, “Assessing agreement on classification tasks: the kappa statistic,” Computational Linguistics, vol. 22, no. 2, pp. 249–254, 1996.
    [Carpenter 2006] R. Carpenter, Jabberwacky [Online]. Available: http://www.jabberwacky.com/
    [Chang and Lin 2001] C.-C. Chang and C.-J. Lin, LIBSVM: a library for support vector machines, 2001. [Online]. Available: http://www.csie.ntu.edu.tw/~cjlin/libsvm
    [Choi and Oh 2006] H. C. Choi and S. Y. Oh, “Realtime facial expression recognition using active appearance model and multilayer perceptron,” SICE-ICASE International Joint Conference, pp. 5924–5927, 2006.
    [Cooper et al. 2009] H. M. Cooper, L. V. Hedges, and J. C. Valentine, The Handbook of Research Synthesis and Meta-Analysis. Russell Sage Foundation, NY 2009.
    [Cootes et al. 1992] T. F. Cootes, D. H. Cooper, C. J. Taylor, and J. Graham, “Trainable method of parametric shape description,” Image and Vision Computing, vol. 10, no. 5, pp. 289–294, 1992.
    [Cootes et al. 2001] T. F. Cootes, G. J. Edwards, and C. J. Taylor, “Active appearance models,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 23, no. 6, pp. 681–685, 2001.
    [Dahl 1994] D. A. Dahl, “Expanding the scope of the ATIS task: the ATIS-3 corpus,” in Proc. DARPA Human Language Technology Workshop, pp. 43–48, 1994.
    [Datcu and Rothkrantz 2008] D. Datcu and L. J. M. Rothkrantz, “Semantic audio-visual data fusion for automatic emotion recognition,” Euromedia, 2008.
    [Deerwester et al. 1990] S. Deerwester, S. Dumais, G. Furnas, T. Landauer, and R. Harshman, “Indexing by latent semantic analysis,” Journal of the American Society for Information Science, vol. 41, no. 6, pp. 391–407, 1990.
    [Dubuisson et al. 2002] S. Dubuisson, F. Davoine, and M. Masson, “A solution for facial expression representation and recognition,” Signal Processing: Image Communication, vol. 17, no. 9, pp. 657–673, 2002.
    [Ekman 1993] P. Ekman, “Facial expression and emotion,” American Psychologist, vol. 48, no. 4, pp. 384–392, 1993.
    [Ekman 1999] P. Ekman, Handbook of Cognition and Emotion. Wiley, 1999.
    [Eysenck 1947] H. Eysenck, Dimensions of Personality. 1947.
    [Forlizzi 2005] J. Forlizzi, “Robotic products to assist the aging population,” ACM Interactions Special Issue on Human-Robot Interaction, vol. 5, no. 2, pp. 16–18, 2005.
    [Goldberg et al. 2006] L. R. Goldberg, J. A. Johnson, H. W. Eber, R. Hogan, M. C. Ashton, C. R. Cloninger, and H. G. Gough, “The international personality item pool and the future of public-domain personality measures,” Journal of Research in Personality, vol. 40, no.1, pp. 84–96, 2006.
    [Gunes and Pantic 2010] H. Gunes and M. Pantic, “Automatic, dimensional and continuous emotion recognition,” International Journal of Synthetic Emotions, vol. 1, no. 1, pp. 68–99, 2010.
    [Huang 2003] Y. Huang, “Support vector machines for text categorization based on latent semantic indexing,” Technical report, Electrical and Computer Engineering Department, The Johns Hopkins University, 2003.
    [Iacobelli et al. 2011] F. Iacobelli, A. J. Gill, S. Nowson, and J. Oberlander, “Large scale personality classification of bloggers,” Int’l Conf. on Affective computing and intelligent interaction, vol. Part II, pp. 568–577, 2011.
    [Jolliffe 1986] I. T. Jolliffe, Principal Component Analysis. Springer, USA, 1986.
    [Jurafsky et al. 2009] D. Jurafsky, R. Ranganath, and D. Macfarland, “Extracting social meaning: identifying interactional style in spoken conversation,” in Proc. NAACL HLT, pp. 638–646, 2009.
    [Kain and Macon 1998] A. Kain and M. W. Macon, “Spectral voice conversion for text-to-speech synthesis,” in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 285–288, 1998.
    [Kharat and Dudul 2008] G. U. Kharat and S. V. Dudul, “Human emotion recognition system using optimally designed SVM with different facial feature extraction techniques”, WSEAS Transactions on Computers, vol. 7, no. 6, pp. 650–659, 2008.
    [Kooladugi et al. 2011] S. G. Kooladugi, N. Kumar, and K. S. Rao, “Speech emotion recognition using segmental level prosodic analysis,” Int’l Conf. on Devices and Communications, pp. 1–5, 2011.
    [Kotsia et al. 2008] I. Kotsia, I. Buciu, and I. Pitas, “An analysis of facial expression recognition under partial facial image occlusion,” J. Image and Vision Computing, vol. 26, no. 7, pp. 1052–1067, 2008.
    [Kwon et al. 2003] O. W. Kwon, K. Chan, J. Hao, and T. W. Lee, “Emotion recognition by speech signals”, Proc. Eighth European Conf. Speech Comm. and Technology (EUROSPEECH), 2003.
    [Landis and Koch 1977] J. R. Landis and G. G. Koch, “The measurement of observer agreement for categorical data,” Biometrics, vol. 33, no. 1, pp. 159–174, 1977.
    [Lang 1980] P. J. Lang, “Behavioral treatment and bio-behavioral assessment: computer applications: technology in mental health care delivery systems,” In: Sidowski, J.B., Johnson, J.H., Williams, T.A. (eds.), pp. 119–137, Ablex Publishing, Norwood, 1980.
    [Lang et al. 1993] P. J. Lang, M. K. Greenwald, M. M. Bradley, and A. O. Hamm, “Looking at pictures: affective, facial, visceral, and behavioral reactions,” Psychophysiology, vol. 30, no. 3, pp. 261–273, 1993.
    [Lanitis et al. 1994] A. Lanitis, C. J. Taylor, and T. F. Cootes, “Automatic tracking, coding and reconstruction of human faces using flexible appearance models,” IEEE Electronic Letters, vol. 30, no. 19, pp. 1587–1588, 1994.
    [Lee and Narayanan 2005] C. M. Lee and S. Narayanan, “Toward detecting emotions in spoken dialogs”, IEEE Trans. Speech and Audio Processing, vol. 13, no. 2, pp. 293–303, 2005.
    [Lee et al. 2010] C.-H. Lee, C.-H. Wu, and J.-C. Guo, “Pronunciation variation generation for spontaneous speech synthesis using state-based voice transformation,” in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 4826–4829, 2010.
    [Levenson 2003] R. W. Levenson, “Autonomic specificity and emotion,” Handbook of Affective Sciences, pp. 212–224, 2003. Oxford: Oxford University press.
    [Li et al. 2005] J. Li, Y. Tsao, and C.-H. Lee, “A study on knowledge source integration for rescoring in automatic speech recognition,” in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 837–840, 2005.
    [Lima et al. 2004] A. Lima, H. Zen, Y. Nankaku, C. Miyajima, K. Tokuda, and T. Kitamura, “On the use of kernel PCA for feature extraction in speech recognition,” IEICE Transactions on Information and Systems, vol. 87-D, no. 12, pp. 2802–2811, 2004.
    [Lin et al. 2012] J.-C. Lin, C.-H. Wu, and W.-L. Wei, “Error weighted semi-coupled hidden Markov model for audio-visual emotion recognition,” IEEE Trans. on Multimedia, vol. 14, no. 1, pp. 142–156, 2012.
    [Lin et al. 2013] J.-C. Lin, C.-H. Wu, and W.-L. Wei, “Emotion recognition of conversational affective speech using temporal course modeling,” INTERSPEECH, pp. 1336–1340, 2013.
    [Liu et al. 2008] J. Liu, Y. Xu, S. Seneff, and V. Zue, “Citybrowser II: a multimodal restaurant guide in Mandarin,” in Proc. ISCSLP, pp. 1–4, 2008.
    [Luengo et al. 2005] I. Luengo, E. Navas, I. Hernáez, and J. Sánchez, “Automatic emotion recognition using prosodic parameters,” Proc. INTERSPEECH, pp. 493–496, 2005.
    [Ma and Chen 2004] W.-Y. Ma and K.-J. Chen, “Design of CKIP Chinese word segmentation system,” Journal of Chinese Language and Computing, vol. 14, no. 3, 2004.
    [Mairesse et al. 2007] F. Mairesse, M. A. Walker, M. R. Mehl, and R. K. Moore, “Using linguistic cues for the automatic recognition of personality in conversation and text,” Journal of Artificial Intelligence Research, vol. 30, pp. 457–500, 2007.
    [Mana and Pianesi 2007] N. Mana and F. Pianesi, “Modeling of emotional facial expressions during speech in synthetic talking heads using a hybrid approach,” Int’l Conf. Auditory-Visual Speech Processing (AVSP), 2007.
    [Mehrabian 1968] A. Mehrabian, “Communication without words,” Psychol. Today, vol. 2, no. 4, pp. 53–56, 1968.
    [Menzel and D’Aluisio 2000] P. Menzel and F. D’Aluisio, Robo Sapiens: Evolution of a New Species. Cambridge, MA: MIT Press, 2000.
    [Metallinou et al. 2008] A. Metallinou, S. Lee, and S. Narayanan, “Audio-visual emotion recognition using Gaussian mixture models for face and voice,” Int’l Symposium on Multimedia, pp. 250–257, 2008.
    [Metallinou et al. 2010] A. Metallinou, S. Lee, and S. Narayanan, “Decision level combination of multiple modalities for recognition and analysis of emotional expression,” Int’l Conf. Acoustics, Speech, and Signal Processing, pp. 2462–2465, 2010.
    [Metallinou et al. 2012] A. Metallinou, M. Wollmer, A. Katsamanis, F. Eyben, B. Schuller, and S. Narayanan, “Context-sensitive learning for enhanced audiovisual emotion classification,” IEEE Trans. Affective Computing, vol. 3, no. 2, pp. 184–198, 2012.
    [Mikels et al. 2005] J. A. Mikels, B. L. Fredrickson, G. R. Larkin, C. M. Lindberg, S. J. Maglio, and P. A. Reuter-Lorenz, “Emotional category data on images from the International Affective Picture System,” Behavior Research Methods, vol. 37, no. 4, pp. 626–630, 2005.
    [Morrison et al. 2007] D. Morrison, R. Wang, and L. C. De Silva, “Ensemble methods for spoken emotion recognition in call-centres,” Speech Communication, vol. 49, no. 2, pp. 98–112, 2007.
    [Nicolaou et al. 2011] M. A. Nicolaou, H. Gunes, and M. Pantic, “A multi-layer hybrid framework for dimensional emotion classification,” Proc. of ACM Multimedia, pp. 933–936, 2011.
    [Ntalampiras and Fakotakis 2012] S. Ntalampiras and N. Fakotakis, “Modeling the temporal evolution of acoustic parameters for speech emotion recognition,” IEEE Trans. Affective Computing, vol. 3, no. 1, pp. 116–125, 2012.
    [Park and Kim 2009] S. Park and D. Kim, “Subtle facial expression recognition using motion magnification,” Pattern Recognition Letters, vol. 30, no. 7, pp. 708–716, 2009.
    [Pennebaker et al. 2007] J. W. Pennebaker, R. J. Booth, and M. E. Francis, Linguistic Inquiry and Word Count: LIWC [Computer software], Austin, TX: LIWC.net, 2007.
    [Picard 1997] R. W. Picard, Affective Computing. MIT Press, 1997.
    [Roy et al. 2000] N. Roy, J. Pineau, and S. Thrun, “Spoken dialogue management using probabilistic reasoning,” in Proc. Annual Meeting of the Association for Computational Linguistics (ACL), pp. 93–100, 2000.
    [Russell 1980] J. A. Russell, “A circumplex model of affect,” Journal of Personality and Social Psychology, vol. 39, no. 6, pp. 1161–1178, 1980.
    [Scherer 2003] K. R. Scherer, “Vocal communication of emotion: a review of research paradigms”, Speech Communication, vol. 40, no. 1-2, pp. 227–256, 2003.
    [Schuller et al. 2003] B. Schuller, G. Rigoll, and M. Lang, “Hidden Markov model-based speech emotion recognition,” Proc. Int’l Conf. Acoustics, Speech, and Signal Processing (ICASSP), pp. II 1–4, 2003.
    [Sen and Srivastava 1990] A. Sen and M. Srivastava, Regression Analysis: Theory, Methods, and Applications. New York, Springer, 1990.
    [Siniscalchi et al. 2008] S. M. Siniscalchi, T. Svendsen, and C.-H. Lee, “Toward a detector-based universal phone recognizer,” in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 4261–4264, 2008.
    [Smola and Schölkopf 2004] A. J. Smola and B. Schölkopf, “A tutorial on support vector regression,” Statistics and Computing, vol. 14, no. 3, pp. 199–222, 2004.
    [Song et al. 2008] M. Song, M. You, N. Li, and C. Chen, “A robust multimodal approach for emotion recognition,” Neurocomputing, vol. 71, no. 10-12, pp. 1913–1920, 2008.
    [Stevenson et al. 2007] R. A. Stevenson, J. A. Mikels, and T. W. James, “Characterization of the affective norms for english words by discrete emotional categories,” Behavior Research Methods, vol. 39, no. 4, pp. 1020–1024, 2007.
    [Stylianou et al. 1998] Y. Stylianou, O. Cappé, and E. Moulines, “Continuous probabilistic transform for voice conversion,” IEEE Transactions on Speech and Audio Processing, vol. 6, no. 2, pp. 131–142, 1998.
    [Takiguchi and Ariki 2007] T. Takiguchi and Y. Ariki, “PCA-based speech enhancement for distorted speech recognition,” Journal of Multimedia, vol. 2, no. 5, pp. 13–18, 2007.
    [Tang and Deng 2007] F. Tang and B. Deng, “Facial expression recognition using AAM and local facial features,” Int’l Conf. on Natural Computation, pp. 632–635, 2007.
    [Tao et al. 2006] J. Tao, Y. Kang, and A. Li, “Prosody conversion from neutral speech to emotional speech,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 14, no. 4, pp. 1145–1154, 2006.
    [Thayer 1989] R. E. Thayer, The Biopsychology of Mood and Arousal. New York: Oxford Univ. Press, 1989.
    [Toothaker 1992] L. E. Toothaker, Multiple Comparison Procedures. Sage Pubns, 1992.
    [Truong et al. 2009] K. P. Truong, D. A. van Leeuwen, M. A. Neerincx, and F. M.G. de Jong, “Arousal and valence prediction in spontaneous emotional speech: felt versus perceived emotion,” in Proceedings of Interspeech, pp. 2027–2030, 2009.
    [Turk and Pentland 1991] M. Turk and A. Pentland, “Eigenfaces for recognition,” J. Cognitive Neuroscience, vol. 3, no. 1, pp. 71–86, 1991.
    [Valstar 2008] M. F. Valstar, Timing is everything: a spatio-temporal approach to the analysis of facial actions. Ph.D. thesis, Imperial College, London, 2008.
    [Valstar and Pantic 2006] M. F. Valstar and M. Pantic, “Fully automatic facial action unit detection and temporal analysis”, Proc. Int’l Conf. on Computer Vision and Pattern Recognition (CVPR), 2006.
    [Valstar and Pantic 2012] M. F. Valstar and M. Pantic, “Fully automatic recognition of the temporal phases of facial actions,” IEEE Trans. Systems, Man and Cybernetics–Part B, vol. 42, no.1, pp. 28–43, 2012.
    [Vapnik 1995] V. N. Vapnik, Statistical Learning Theory. New York: Wiley, 1995.
    [Viola and Jones 2001] P. Viola and M. Jones, “Rapid object detection using a boosted cascade of simple features,” Proc. Int’l Conf. Computer Vision Pattern Recognition, pp. 511–518, 2001.
    [Wagner et al. 2013] J. Wagner, F. Lingenfelser, T. Baur, I. Damian, F. Kistler, and E. André, “The Social Signal Interpretation (SSI) Framework: Multimodal Signal Processing and Recognition in Real-Time,” ACM International Conference on Multimedia, 2013.
    [Wallace 2003] R. Wallace, Artificial linguistic internet computer entity (A.L.I.C.E.) [Online]. Available: http://www.alicebot.org/aiml.html
    [Wang and Guan 2008] Y. Wang and L. Guan, “Recognizing human emotional state from audiovisual signals,” IEEE Trans. Multimedia, vol. 10, no.5, pp. 936–946, Aug. 2008.
    [Wang et al. 2012] Y. Wang, L. Guan, and A. N. Venetsanopoulos, “Kernel cross-modal factor analysis for information fusion with application to bimodal emotion recognition,” IEEE Trans. Multimedia, vol. 14, no.3, pp. 597–607, 2012.
    [Walker et al. 1997] M. A. Walker, J. E. Cahn, and S. J. Whittaker, “Improvising linguistic style: social and affective bases for agent personality,” in Proc. International Conference on Autonomous Agents, pp. 96–105, 1997.
    [Wei et al. 2013] W.-L. Wei, C.-H. Wu, J.-C. Lin, and H. Li, “Interaction style detection based on fused cross-correlation model in spoken conversation,” ICASSP, Vancouver, Canada, 2013.
    [Wei et al. 2014] W.-L. Wei, C.-H. Wu, J.-C. Lin, and H. Li, “Exploiting psychological factors for interaction style recognition in spoken conversation,” IEEE Trans. Audio, Speech and Language Processing, vol. 22, no. 3, pp. 659–671, 2014.
    [Wu and Liang 2011] C.-H. Wu and W.-B. Liang, “Emotion recognition of affective speech based on multiple classifiers using acoustic-prosodic information and semantic labels,” IEEE Trans. Affective Computing, vol. 2, no. 1, pp. 10–21, 2011.
    [Wu and Yan 2005] C.-H. Wu and G.-L. Yan, “Speech act modeling and verification of spontaneous speech with disfluency in a spoken dialogue system,” IEEE Trans. on Speech and Audio Processing, vol. 13, no. 3, pp. 330–344, 2005.
    [Wu et al. 2006] C.-H. Wu, Z.-J. Chuang, and Y.-C. Lin, “Emotion recognition from text using semantic label and separable mixture model,” ACM Trans. on Asian Language Information Processing, vol. 5, no. 2, pp. 165–182, 2006.
    [Wu et al. 2009] C.-H. Wu, J. F. Yeh, and Z. J. Chuang, “Emotion perception and recognition from speech”, Affective Information Processing, Chapter 6, pp. 93–110, Springer, 2009.
    [Wu et al. 2010(a)] D. Wu, T. D. Parsons, E. Mower, and S. Narayanan, “Speech emotion estimation in 3D space,” Proceedings of IEEE International Conference on Multimedia & Expo (ICME), pp. 737–742, 2010.
    [Wu et al. 2010(b)] C.-H. Wu, C.-H. Lee, and Z.-J. Chung, “Co-articulation generation using maximum direction change and apparent motion for Chinese visual speech synthesis,” in Proceedings of International Computer Symposium, 2010.
    [Wu et al. 2013(a)] C.-H. Wu, W.-L. Wei, J.-C. Lin, and W.-Y. Lee, “Speaking effect removal on emotion recognition from facial expressions based on eigenface conversion,” IEEE Trans. Multimedia, vol. 15, no.8, pp. 1732–1744, 2013.
    [Wu et al. 2013(b)] C.-H. Wu, J.-C. Lin, and W.-L. Wei, “Two-level hierarchical alignment for semi-coupled HMM-based audiovisual emotion recognition with temporal course,” IEEE Trans. on Multimedia, vol. 15, no. 8, pp. 1880–1895, 2013.
    [Yang et al. 2004] J. Yang, D. Zhang, A. F. Frangi, and J.-Y. Yang, “Two-dimensional PCA: a new approach to appearance-based face representation and recognition,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 26, no. 1, pp. 131–137, 2004.
    [Yang et al. 2008] Y.-H. Yang, Y.-C. Lin, Y.-F. Su, and H.-H. Chen, “A regression approach to music emotion recognition,” IEEE Transactions on Audio, Speech and Language Processing, vol. 16, no. 2, pp. 448–457, 2008.
    [Yeasin et al. 2006] M. Yeasin, B. Bullot, and R. Sharma, “Recognition of facial expressions and measurement of levels of interest from video,” IEEE Trans. Multimedia, vol. 8, no. 3, pp. 500–508, June 2006.
    [Young et al. 2006] S. J. Young, D. Kershaw, J. Odell, D. Ollason, V. Valtchev, and P. Woodland, The HTK Book (Version 3.4). Cambridge University Press, 2006.
    [Zeng et al. 2004] Z. Zeng, J. Tu, M. Liu, T. Zhang, N. Rizzolo, Z. Zhang, T. Huang, D. Roth, and S. Levinson, “Bimodal HCI-related affect recognition,” The 6th International Conference on Multimodal Interfaces, pp. 137–143, 2004.
    [Zeng et al. 2007] Z. Zeng, J. Tu, M. Liu, T. S. Huang, B. Pianfetti, D. Roth, and S. Levinson, “Audio-visual affect recognition,” IEEE Trans. Multimedia, vol. 9, no. 2, pp. 424–428, 2007.
    [Zeng et al. 2009] Z. Zeng, M. Pantic, G. I. Roisman, and T. S. Huang, “A survey of affect recognition methods: audio, visual, and spontaneous expressions”, IEEE Trans. Pattern Anal. Mach. Intell., vol. 31, no. 1, pp. 39–58, 2009.
    [Zhang et al. 2011] L. Zhang, D. Tjondronegoro, and V. Chandran, “Evaluation of texture and geometry for dimensional facial expression recognition,” International Conference on Digital Image Computing: Techniques and Applications, pp. 620–626, 2011.
    [Zhao 2006] S.-Y. Zhao, “Humanoid social robots as a medium of communication,” New Media & Society, vol. 8, no. 3, pp. 401–419, 2006.
    [Zhao and Zhang 2012] X. Zhao and S. Zhang, “Facial expression recognition using local binary patterns and discriminant kernel locally linear embedding,” EURASIP Journal on Advances in Signal Processing, vol. 2012, no. 20, pp. 1–9, 2012.
    [Zhu et al. 2009] Y. Zhu, Y. Chen, and Y. Zhan, “Expression recognition method of image sequence in audio-video,” Third International Symposium on Intelligent Information Technology Application, pp. 513–516, 2009.

    Full-text availability: on campus, public from 2020-07-14
    Off campus: public from 2020-07-14