Graduate student: | 黃琨義 Huang, Kun-Yi |
---|---|
Thesis title: | 基於激發性語音回應之情緒識別於情感性疾患偵測之研究 A Study on Emotion Recognition from Elicited Speech Responses for Mood Disorder Detection |
Advisor: | 吳宗憲 Wu, Chung-Hsien |
Degree: | Doctor |
Department: | College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering |
Year of publication: | 2019 |
Graduation academic year: | 107 |
Language: | English |
Number of pages: | 100 |
Keywords (Chinese): | 情感性疾患偵測 、語音情緒識別 、隱含式情感結構模型 、卷積神經網路 、長短期記憶模型 、注意模型 |
Keywords (English): | Mood disorder detection, speech emotion recognition, latent affective structure model, convolutional neural network, long short-term memory, attention model |
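Speech is one of the most accessible means of communication between people, and people can perceive one another's emotions through speech. In psychology, prolonged low emotion can indirectly lead to a negative mood state, which in turn affects mental health and can trigger mood disorders. Among mood disorders, bipolar disorder and depression are the most common mental illnesses today. Clinicians confirm a diagnosis from the symptoms defined in the Diagnostic and Statistical Manual of Mental Disorders, family medical history, and long-term follow-up. Clinical experience shows that patients with bipolar disorder who are in a depressive episode at their first visit have a high probability of being misdiagnosed with depression, and misdiagnosis leads to inappropriate treatment and a worsening condition. Speech emotion recognition technology has matured considerably in recent years, so a major goal is to apply it effectively to mood disorder detection, helping clinicians distinguish bipolar disorder from depression accurately and reducing the misdiagnosis rate.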
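The purpose of this study is to design and develop a mood disorder detection system to assist physicians in diagnosing mood disorders. The study assembles a multi-media corpus that includes speech, text emotion, and mood expression corpora, and, in cooperation with clinicians, designs and builds a mood disorder corpus from the speech responses of users who watch emotion-eliciting videos. By examining emotional expression across these media, a conversion model is built to reduce the error between perceived emotion and actual emotion, and asynchronous multimodal information is fused over long-term emotion accumulation to detect positive and negative mood. The study further examines the short-term emotion profiles produced in response to each type of eliciting video, uses an attention model to quantify and highlight the importance of these short-term emotion profiles, and analyzes how patients with mood disorders actually respond to each type of video. Finally, a clustering model groups the short-term emotion profiles into an emotion profile codebook, and a latent affective structure is introduced to build the mood disorder detection model and strengthen detection.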
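The experimental evaluation covers the speech emotion recognition rate, text emotion recognition rate, mood recognition rate, and mood disorder recognition rate, together with a performance comparison against traditional models. The results show that, in the functional evaluation of the system, the proposed framework and implementation improve recognition rates for speech emotion recognition, text emotion recognition, and mood recognition, and outperform traditional methods. In the clinical application, mood disorder detection also outperforms traditional classifiers and achieves a higher recognition rate. Because this research addresses a specialized clinical application, future work will focus on collecting clinical cases and on applying advanced speech and text technologies to revise and improve the system's design and development.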
Speech is one of the simplest and fastest ways for humans to communicate, and through speech people can clearly perceive each other's emotions. Long-term negative emotion may indirectly lead to a negative mood, thereby affecting mental health and causing mood disorders. Among mood disorders, bipolar disorder (BD) and unipolar disorder (UD) are the most common mental illnesses. Clinical diagnosis follows the criteria of the Diagnostic and Statistical Manual of Mental Disorders, which cover symptoms, progression, family medical history, and so on. In clinical practice, a large proportion of BD patients are initially misdiagnosed as having UD, and misdiagnosis leads to inappropriate treatment that can worsen the disease. Speech emotion recognition technology has advanced considerably in recent years. With these advances in automatic speech emotion recognition, helping clinicians distinguish between BD and UD accurately and early becomes achievable.
The purpose of this study was to design and develop a mood disorder detection system to assist clinicians in diagnosing mood disorders. The study collected data from different modalities, consisting of speech, text, and emoji. In addition, it was carried out in cooperation with psychiatrists to design and develop a system that collects the elicited facial expressions and speech responses of patients to six emotion-eliciting videos, from which the mood disorder database was constructed. A conversion model was then constructed to reduce the differences between perceived emotional expression and self-reported emotional expression. The asynchronous long-term emotion expressions from the different modalities were also fused to determine the final positive or negative mood state. To model the local variation of emotions within each speech response, an attention mechanism was used to generate the emotion profile (EP) of each elicited speech response, and the emotion profiles for each emotion-eliciting video were analyzed. Next, the short-term emotion profiles were clustered to construct an emotion profile codebook. Finally, a class-specific latent affective structure model (LASM) was proposed to model the structural relationships among the emotion codewords with respect to the six emotional videos for mood disorder detection.
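The emotion profile and codebook steps can be illustrated with a minimal sketch. It assumes each speech response is already split into segments with per-segment emotion distributions and attention scores; it does not reproduce the thesis's neural attention model or the LASM. All function names, the plain k-means clustering, and the toy numbers are hypothetical.

```python
import math

def softmax(scores):
    """Turn raw attention scores into weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(segment_profiles, scores):
    """Attention-weighted average of per-segment emotion distributions."""
    weights = softmax(scores)
    dim = len(segment_profiles[0])
    return [sum(w * p[d] for w, p in zip(weights, segment_profiles))
            for d in range(dim)]

def nearest_codeword(profile, codebook):
    """Index of the codeword closest to the profile (squared Euclidean)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda k: sq_dist(profile, codebook[k]))

def kmeans(points, init_centroids, iterations=10):
    """Plain Lloyd's k-means with caller-supplied initial centroids."""
    centroids = [list(c) for c in init_centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            clusters[nearest_codeword(p, centroids)].append(p)
        for k, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                dim = len(members[0])
                centroids[k] = [sum(m[d] for m in members) / len(members)
                                for d in range(dim)]
    return centroids

# Toy data (hypothetical): three emotion classes, three segments per response.
segments = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.2, 0.2, 0.6]]
scores = [2.0, 0.5, 0.1]
profile = attention_pool(segments, scores)

# Build a two-entry codebook from a few previously computed profiles.
training_profiles = [[0.8, 0.1, 0.1], [0.7, 0.2, 0.1],
                     [0.1, 0.8, 0.1], [0.2, 0.7, 0.1]]
codebook = kmeans(training_profiles,
                  [training_profiles[0], training_profiles[2]])

# Quantize the new response's profile to its nearest codeword.
codeword = nearest_codeword(profile, codebook)
```

In this sketch each response is reduced to a single codeword index, so a subject's responses to the six videos would become a short sequence of indices on which a downstream structure model could operate.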
Experimental evaluations of speech emotion recognition, text emotion recognition, mood detection, and mood disorder detection were conducted, along with performance comparisons against traditional models. The experimental results showed that the proposed method performed better than the traditional models for emotion recognition. In the clinical application, the proposed method also outperformed the traditional models for mood disorder detection. In future work, advanced speech and text technologies will be used to gradually modify and improve each model to achieve better performance in mood disorder detection.