
Author: Chang, Chia-Cheng (張家誠)
Title: Exploring Macroscopic and Microscopic Fluctuation of Facial Expression for Mood Disorder Classification
Advisor: Wu, Chung-Hsien (吳宗憲)
Degree: Master
Department: Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2016
Graduation Academic Year: 104
Language: English
Pages: 63
Keywords: Mood disorder, facial fluctuation, spontaneous facial database, action unit, motion vector, modulation spectrum, wavelet decomposition

    In the clinical diagnosis of mood disorders, a large portion of bipolar disorder patients (BDs) are misdiagnosed as having unipolar depression (UD). Clinicians have confirmed that BDs generally show reduced affect during clinical treatment. This thesis therefore aims to build an objective and rapid diagnosis-assistance system using machine learning techniques. Facial expressions of BD, UD, and control subjects elicited by emotional video clips are collected to explore the temporal fluctuation characteristics of the three groups. Differences in facial expression among them are investigated from both macroscopic and microscopic viewpoints, and corresponding feature-extraction and modeling methods are proposed for each. Finally, decision-level fusion combines the results of the two approaches to improve classification performance.
    From the macroscopic viewpoint, action units (AUs) are used to describe the temporal transformation of facial muscles. The modulation spectrum is then used to extract the short-term variation of each AU profile, and an artificial neural network (ANN) is applied to characterize interval-based mood disorder models under different emotional elicitations. Finally, the geometric average and the product rule are used to integrate the intervals across emotions and AUs to obtain the macroscopic classification result.
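The modulation-spectrum feature described above can be sketched in a few lines. This is a minimal illustration, not the thesis implementation: the 30 fps frame rate, the FFT length, and the function name `modulation_spectrum` are assumptions.

```python
import numpy as np

def modulation_spectrum(au_track, fs=30.0, n_fft=64):
    """Magnitude spectrum of one AU's per-frame intensity track."""
    x = np.asarray(au_track, dtype=float)
    x = x - x.mean()                          # drop DC so slow offsets don't dominate
    spec = np.abs(np.fft.rfft(x * np.hanning(len(x)), n=n_fft))
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    return freqs, spec

# Example: an AU intensity oscillating at 2 Hz shows a spectral peak near 2 Hz.
t = np.arange(0, 2.0, 1.0 / 30.0)             # 2 s of video at 30 fps
track = 0.5 + 0.3 * np.sin(2 * np.pi * 2.0 * t)
freqs, spec = modulation_spectrum(track)
peak_hz = float(freqs[np.argmax(spec)])
```

The spectral peak landing near 2 Hz in the example shows that the feature captures the rate at which the AU intensity fluctuates, which is the short-term variation the macroscopic model consumes.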
    From the microscopic viewpoint, motion vectors (MVs) are employed to capture subtle facial changes. The fluctuation of MVs along eight basic orientations is considered, and wavelet decomposition is applied to extract the energy and entropy of different frequency bands. An autoencoder neural network is then adopted for dimensionality reduction, extracting bottleneck features. Finally, to model the long-term variation across different emotional elicitations, a long short-term memory (LSTM) network is employed to classify mood disorders.
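The wavelet step above can be approximated with a hand-rolled multi-level Haar decomposition. The abstract does not state the wavelet family or the number of levels, so the Haar filter and three levels here are stand-in assumptions.

```python
import numpy as np

def haar_dwt_bands(signal, levels=3):
    """Per-band energies and band-energy entropy via a multi-level Haar DWT."""
    x = np.asarray(signal, dtype=float)
    energies = []
    for _ in range(levels):
        if len(x) < 2:
            break
        if len(x) % 2:
            x = np.append(x, x[-1])           # pad odd-length signals
        approx = (x[0::2] + x[1::2]) / np.sqrt(2.0)
        detail = (x[0::2] - x[1::2]) / np.sqrt(2.0)
        energies.append(float(np.sum(detail ** 2)))  # one detail band per level
        x = approx
    energies.append(float(np.sum(x ** 2)))    # remaining approximation band
    p = np.array(energies) / (np.sum(energies) + 1e-12)
    entropy = float(-np.sum(p * np.log2(p + 1e-12)))
    return energies, entropy

# A flat track has no detail energy: everything sits in the approximation band.
energies, entropy = haar_dwt_bands(np.ones(8))
```

Feeding it, say, the magnitude of MV changes along one of the eight orientations yields a compact per-band energy profile plus a single entropy value describing how spread out that energy is.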
    To evaluate the proposed method, 12 subjects from each group (BD, UD, and control) were included in K-fold (K = 12) cross-validation experiments. The macroscopic approach achieved a classification accuracy of 61.1% and the microscopic approach 67.7%. Fusing the two classification results at the decision level raised the accuracy to 72.2%, indicating that the AU and MV descriptors are complementary.
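The decision-level fusion can be illustrated with a weighted geometric mean of the two classifiers' class posteriors. The abstract does not specify the exact fusion rule, so the rule, the weight, and the function name below are assumptions for illustration only.

```python
import numpy as np

def fuse_posteriors(p_macro, p_micro, w=0.5):
    """Weighted geometric-mean fusion of two classifiers' class posteriors."""
    p_macro = np.asarray(p_macro, dtype=float)
    p_micro = np.asarray(p_micro, dtype=float)
    fused = (p_macro ** w) * (p_micro ** (1.0 - w))
    return fused / fused.sum()                # renormalize over the classes

# Classes ordered (BD, UD, control): the macroscopic model leans strongly to BD,
# the microscopic model weakly to UD; the fused decision follows the stronger cue.
fused = fuse_posteriors([0.6, 0.3, 0.1], [0.3, 0.4, 0.3])
pred = int(np.argmax(fused))                  # 0 -> BD
```

With w = 0.5 this reduces to the product rule, which matches the combination rule the macroscopic pipeline already uses across emotions and AUs.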

    Table of Contents

    Abstract (Chinese) I
    Abstract III
    Acknowledgements V
    Content VII
    List of Tables XI
    List of Figures XII
    Chapter 1 Introduction 1
      1.1 Background 1
      1.2 Motivation 2
      1.3 Problem and Goal 4
        1.3.1 Goal 4
        1.3.2 Problems and Proposed Ideas 5
      1.4 Related Work 6
        1.4.1 Survey of Classification Methods for Mood Disorder 6
        1.4.2 Model 7
        1.4.3 Reduced Affect in Facial Expression 8
      1.5 Thesis Architecture 10
    Chapter 2 Dataset 12
      2.1 CHI-MEI Dataset 12
        2.1.1 Video Clips for Eliciting Emotions 12
        2.1.2 Subjects 14
        2.1.3 Environment of Data Collection 15
        2.1.4 Process of Data Collection 15
        2.1.5 Data Format and Data Structure 16
      2.2 DISFA Dataset 17
        2.2.1 Spontaneous Databases 18
        2.2.2 Data Format and Data Structure 19
    Chapter 3 Proposed Method and System Framework 21
      3.1 Preprocessing 21
        3.1.1 Video Selection 22
        3.1.2 Face Alignment 22
      3.2 Macroscopic Fluctuation 23
        3.2.1 Macroscopic Facial Descriptor 24
        3.2.2 Fluctuation Features 28
        3.2.3 Mood Disorder Detection 31
      3.3 Microscopic Fluctuation 32
        3.3.1 Microscopic Facial Descriptor 33
        3.3.2 Fluctuation Feature 38
        3.3.3 Dimensionality Reduction 41
        3.3.4 Mood Disorder Detection 42
      3.4 Decision Level Fusion 44
    Chapter 4 Experiments 46
      4.1 Macroscopic Experiments 46
        4.1.1 Action Unit Feature Comparison 46
        4.1.2 Effect of k Values in Modulation Spectrum 47
        4.1.3 Effect of Number of ANN Hidden Nodes 48
        4.1.4 Comparison of Different Macroscopic Methods 49
      4.2 Microscopic Experiments 51
        4.2.1 Effect of Auto-encoder Hidden Node Number 51
        4.2.2 Effect of LSTM Hidden Node Number 52
        4.2.3 Comparison of Different Microscopic Methods 53
      4.3 Fusion Experiments 55
    Chapter 5 Conclusions and Future Work 57
      5.1 Conclusions 57
      5.2 Future Work 58
    References 59


    Full-text release: on campus 2018-08-31; off campus 2018-08-31