| Graduate student: | 王柏皓 Wang, Po-Hao |
|---|---|
| Thesis title: | 利用卷積類神經網路對華人臉部表情進行情緒判斷 (Chinese facial expression recognition based on deep convolutional neural network) |
| Advisor: | 黃柏僩 Huang, Po-Hsien |
| Degree: | Master |
| Department: | College of Social Sciences, Department of Psychology |
| Year of publication: | 2020 |
| Graduation academic year: | 108 (ROC calendar) |
| Language: | English |
| Number of pages: | 77 |
| Chinese keywords: | 卷積類神經網路 (convolutional neural network), 深度學習 (deep learning), 臉部表情辨識 (facial expression recognition) |
| Foreign keywords: | convolutional neural network, deep learning, facial expression recognition |
Facial expression recognition (FER) refers to techniques that, by constructing rules or algorithms, give computers or machines the ability to recognize human facial expressions. It is a highly popular topic in computer vision and is widely applied in human-computer interaction, real-time monitoring, social entertainment, and other areas. Most current FER research focuses on improving recognition accuracy or building robust recognition models; few studies compare facial expression data across cultures. In recent years, however, psychological research has repeatedly shown that people from different cultures differ in both how they display and how they recognize facial expressions: East Asians rely more on cues from the eyes, whereas Westerners rely more on the mouth and other facial features.
This study uses four classic convolutional neural network models, LeNet, AlexNet, VGGNet, and ResNet, to perform facial expression recognition on the Taiwan Corpora of Chinese Emotions and Relevant Psychophysiological Data and on the publicly available facial expression image database KDEF, and compares the results with human recognition performance in order to explore the similarities and differences between machine and human facial expression recognition. We then mask important facial features such as the eyes and the mouth and, from the differences in how much recognition accuracy drops, infer whether the convolutional neural network models rely on different features when recognizing facial expressions from different cultures. Finally, using explainable AI techniques, we attempt to visualize which facial regions the machine attends to when recognizing facial expressions.
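The masking procedure described above can be made concrete with a short sketch. The following is a minimal illustration in Python/NumPy, assuming fixed horizontal bands for the eye and mouth regions and a Keras-style classifier exposing `predict`; these choices are placeholders for illustration, not the thesis' actual preprocessing or interface.

```python
import numpy as np

# Illustrative assumption: faces are aligned to a fixed resolution, and each
# facial region is approximated by a band of rows. The exact rows are
# placeholders, not taken from the thesis.
REGION_ROWS = {
    "eyes": slice(35, 60),
    "mouth": slice(85, 110),
}

def occlude(images, region):
    """Return a copy of images (N, H, W, C) with one facial region zeroed out."""
    masked = images.copy()
    masked[:, REGION_ROWS[region], :, :] = 0.0
    return masked

def accuracy_drop(model, images, labels, region):
    """Accuracy on intact images minus accuracy on occluded images.

    A larger drop suggests the model relies more on that region when
    classifying the expressions.
    """
    base_acc = (model.predict(images).argmax(axis=1) == labels).mean()
    occl_acc = (model.predict(occlude(images, region)).argmax(axis=1) == labels).mean()
    return base_acc - occl_acc
```

Comparing the eye-region drop against the mouth-region drop for models trained on the TW and KDEF datasets is the kind of contrast the abstract describes.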
Facial expression recognition (FER) is a fairly popular research topic and has been applied in many areas of daily life. However, most FER research focuses on improving accuracy or building robust models, and rarely examines the impact of different cultures on FER. In this thesis, two studies are conducted to explore the impact of culture on FER. In study one, we use four classical convolutional neural network architectures, LeNet, AlexNet, VGGNet, and ResNet, to recognize seven categories of emotional pictures drawn from two datasets of different cultural origin: the Taiwan Corpora of Chinese Emotions and Relevant Psychophysiological Data (TW dataset) and the Karolinska Directed Emotional Faces (KDEF dataset). We compare these results with human performance on the same picture recognition test to explore the differences between facial expression recognition by machines and by humans. In study two, we mask the eyes and the mouth in the pictures to find which features are most important for a machine when recognizing different facial expressions.
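As a rough illustration of how study one's classifiers could be set up, the sketch below instantiates the four named architectures as seven-class expression recognizers with PyTorch/torchvision. The chosen depths (vgg16, resnet18), the small LeNet stand-in, and all layer sizes are assumptions for illustration; the abstract does not specify the exact configurations used in the thesis.

```python
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 7  # seven emotion categories (e.g., happiness, sadness, anger,
                 # fear, disgust, surprise, neutral)

class LeNet(nn.Module):
    """Small LeNet-style stand-in; torchvision does not ship a LeNet."""
    def __init__(self, num_classes=NUM_CLASSES):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 6, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(6, 16, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.LazyLinear(120), nn.ReLU(),   # infers its input size on first call
            nn.Linear(120, 84), nn.ReLU(),
            nn.Linear(84, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

def build_models(num_classes=NUM_CLASSES):
    """Replace each network's final layer with a seven-way classifier head."""
    alexnet = models.alexnet(weights=None)
    alexnet.classifier[6] = nn.Linear(4096, num_classes)

    vgg = models.vgg16(weights=None)
    vgg.classifier[6] = nn.Linear(4096, num_classes)

    resnet = models.resnet18(weights=None)
    resnet.fc = nn.Linear(resnet.fc.in_features, num_classes)

    return {"LeNet": LeNet(), "AlexNet": alexnet, "VGGNet": vgg, "ResNet": resnet}
```

Each model can then be trained and evaluated separately on the TW and KDEF images so that its per-category accuracy can be compared with human performance on the same test.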
Bakkes, S., Tan, C. T., & Pisan, Y. (2012). Personalised gaming: a motivation and overview of literature. Paper presented at the Proceedings of The 8th Australasian Conference on Interactive Entertainment: Playing the System.
Bulwer, J. (1966). Pathomyotamia Or A Dissection of the Significative Muscles of the Affections of the Minde: Being an Essay to a New Method of Observing the Most Important Movings of the Muscles of the Head, as They are the Neerest and Immediate Organs of the Voluntarie Or Impetuous Motions of the Mind. With the Proposall of a New Nomenclature of the Muscles. By JB Sirnamed the Chirosopher: WW.
Chen, J., Chi, Z., & Fu, H. (2016). Facial expression recognition with dynamic Gabor volume feature. Paper presented at the 2016 IEEE 18th International Workshop on Multimedia Signal Processing (MMSP).
Cootes, T. F., Edwards, G. J., & Taylor, C. J. (1998). A Comparative Evaluation of Active Appearance Model Algorithms. Paper presented at the BMVC.
Cootes, T. F., Taylor, C. J., Cooper, D. H., & Graham, J. (1995). Active shape models-their training and application. Computer vision and image understanding, 61(1), 38-59.
Crivelli, C., Jarillo, S., Russell, J. A., & Fernández-Dols, J.-M. (2016). Reading emotions from faces in two indigenous societies. Journal of Experimental Psychology: General, 145(7), 830.
Darwin, C., & Prodger, P. (1998). The expression of the emotions in man and animals: Oxford University Press, USA.
DeVault, D., Artstein, R., Benn, G., Dey, T., Fast, E., Gainer, A., . . . Lhommet, M. (2014). SimSensei Kiosk: A virtual human interviewer for healthcare decision support. Paper presented at the Proceedings of the 2014 international conference on Autonomous agents and multi-agent systems.
Ding, H., Zhou, S. K., & Chellappa, R. (2017). Facenet2expnet: Regularizing a deep face recognition net for expression recognition. Paper presented at the 2017 12th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2017).
Duchenne, G.-B., & de Boulogne, G.-B. D. (1990). The mechanism of human facial expression: Cambridge university press.
Duric, Z., Gray, W. D., Heishman, R., Li, F., Rosenfeld, A., Schoelles, M. J., . . . Wechsler, H. (2002). Integrating perceptual and cognitive modeling for adaptive and intelligent human-computer interaction. Proceedings of the IEEE, 90(7), 1272-1289.
Ekman, P., & Friesen, W. V. (1969). The repertoire of nonverbal behavior: Categories, origins, usage, and coding. Semiotica, 1(1), 49-98.
Ekman, P., & Friesen, W. V. (1971). Constants across cultures in the face and emotion. Journal of personality and social psychology, 17(2), 124.
Ekman, P., Friesen, W. V., O'sullivan, M., Chan, A., Diacoyanni-Tarlatzis, I., Heider, K., . . . Ricci-Bitti, P. E. (1987). Universals and cultural differences in the judgments of facial expressions of emotion. Journal of personality and social psychology, 53(4), 712.
Elfenbein, H. A., & Ambady, N. (2002a). Is there an in-group advantage in emotion recognition?
Elfenbein, H. A., & Ambady, N. (2002b). On the universality and cultural specificity of emotion recognition: a meta-analysis. Psychological bulletin, 128(2), 203.
Elfenbein, H. A., & Ambady, N. (2003a). Universals and cultural differences in recognizing emotions. Current directions in psychological science, 12(5), 159-164.
Elfenbein, H. A., & Ambady, N. (2003b). When familiarity breeds accuracy: Cultural exposure and facial emotion recognition. Journal of personality and social psychology, 85(2), 276.
Feichtinger, H. G., & Strohmer, T. (1997). Gabor Analysis and Algorithms: Theory and Applications: Springer Science & Business Media.
Friesen, E., & Ekman, P. (1978). Facial action coding system: a technique for the measurement of facial movement. Palo Alto, 3.
Galati, D., Miceli, R., & Sini, B. (2001). Judging and coding facial expression of emotions in congenitally blind children. International Journal of Behavioral Development, 25(3), 268-278.
He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. Paper presented at the Proceedings of the IEEE conference on computer vision and pattern recognition.
Hinton, G., Osindero, S., Welling, M., & Teh, Y. W. (2006). Unsupervised discovery of nonlinear structure using contrastive backpropagation. Cognitive science, 30(4), 725-731.
Hinton, G. E., Osindero, S., & Teh, Y.-W. (2006). A fast learning algorithm for deep belief nets. Neural computation, 18(7), 1527-1554.
Hinton, G. E., & Salakhutdinov, R. R. (2006). Reducing the dimensionality of data with neural networks. Science, 313(5786), 504-507.
Hubel, D. H., & Wiesel, T. N. (1962). Receptive fields, binocular interaction and functional architecture in the cat's visual cortex. The Journal of physiology, 160(1), 106-154.
Izard, C. E. (1971). The face of emotion.
Jack, R. E. (2013). Culture and facial expressions of emotion. Visual Cognition, 21(9-10), 1248-1286.
Jack, R. E., Blais, C., Scheepers, C., Schyns, P. G., & Caldara, R. (2009). Cultural confusions show that facial expressions are not universal. Current biology, 19(18), 1543-1548.
Jack, R. E., Caldara, R., & Schyns, P. G. (2012). Internal representations reveal cultural diversity in expectations of facial expressions of emotion. Journal of Experimental Psychology: General, 141(1), 19.
Kafetsios, K., & Hess, U. (2015). Are you looking at me? The influence of facial orientation and cultural focus salience on the perception of emotion expressions. Cogent Psychology, 2(1), 1005493.
Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). Imagenet classification with deep convolutional neural networks. Paper presented at the Advances in neural information processing systems.
LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278-2324.
Lowe, D. G. (1999). Object recognition from local scale-invariant features. Paper presented at the ICCV.
Lowe, D. G. (2004). Distinctive image features from scale-invariant keypoints. International journal of computer vision, 60(2), 91-110.
Lucey, P., Cohn, J. F., Matthews, I., Lucey, S., Sridharan, S., Howlett, J., & Prkachin, K. M. (2010). Automatically detecting pain in video through facial action units. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 41(3), 664-674.
Marsh, A. A., Elfenbein, H. A., & Ambady, N. (2003). Nonverbal “accents”: Cultural differences in facial expressions of emotion. Psychological Science, 14(4), 373-376.
Masuda, T., Ellsworth, P. C., Mesquita, B., Leu, J., Tanida, S., & Van de Veerdonk, E. (2008). Placing the face in context: cultural differences in the perception of facial emotion. Journal of personality and social psychology, 94(3), 365.
Matsumoto, D., Olide, A., Schug, J., Willingham, B., & Callan, M. (2009). Cross-cultural judgments of spontaneous facial expressions of emotion. Journal of Nonverbal Behavior, 33(4), 213.
Matsumoto, D., Olide, A., & Willingham, B. (2009). Is there an ingroup advantage in recognizing spontaneously expressed emotions? Journal of Nonverbal Behavior, 33(3), 181.
Mehrabian, A., & Epstein, N. (1972). A measure of emotional empathy 1. Journal of personality, 40(4), 525-543.
Moriguchi, J., Ezaki, T., Tsukahara, T., Fukui, Y., Ukai, H., Okamoto, S., . . . Ikeda, M. (2005). Effects of aging on cadmium and tubular dysfunction markers in urine from adult women in non-polluted areas. International archives of occupational and environmental health, 78(6), 446-451.
Ojala, T., Pietikäinen, M., & Harwood, D. (1996). A comparative study of texture measures with classification based on featured distributions. Pattern recognition, 29(1), 51-59.
Ojala, T., Pietikäinen, M., & Mäenpää, T. (2002). Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Transactions on Pattern Analysis & Machine Intelligence, 24(7), 971-987.
Rodger, H., Kelly, D. J., Blais, C., & Caldara, R. (2010). Inverting faces does not abolish cultural diversity in eye movements. Perception, 39(11), 1491-1503.
Rosenblatt, F. (1958). The perceptron: a probabilistic model for information storage and organization in the brain. Psychological review, 65(6), 386.
Samek, W., Wiegand, T., & Müller, K.-R. (2017). Explainable artificial intelligence: Understanding, visualizing and interpreting deep learning models. arXiv preprint arXiv:1708.08296.
Simonyan, K., & Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556.
Tomkins, S. S. (1962). Affect imagery consciousness: Volume I: The positive affects (Vol. 1): Springer publishing company.
Vinciarelli, A., Pantic, M., & Bourlard, H. (2009). Social signal processing: Survey of an emerging domain. Image and vision computing, 27(12), 1743-1759.
Vural, E., Cetin, M., Ercil, A., Littlewort, G., Bartlett, M., & Movellan, J. (2007). Drowsy driver detection through facial movement analysis. Paper presented at the International Workshop on Human-Computer Interaction.
Yuki, M., Maddux, W. W., & Masuda, T. (2007). Are the windows to the soul the same in the East and West? Cultural differences in using the eyes and mouth as cues to recognize emotions in Japan and the United States. Journal of Experimental Social Psychology, 43(2), 303-311.
陳建中, 卓淑玲, & 曾榮瑜. (2013). Taiwan corpora of Chinese emotions and relevant psychophysiological data: Normative data of facial expressions by professional performers [台灣地區華人情緒與相關心理生理資料庫─專業表演者臉部表情常模資料]. Chinese Journal of Psychology, 55(4), 439-454.
襲充文, 黃世琤, & 葉娟妤. (2013). Taiwan corpora of Chinese emotions and relevant psychophysiological data: Facial expression database of basic emotions of college students [台灣地區華人情緒與相關心理生理資料庫—大學生基本情緒臉部表情資料庫]. Chinese Journal of Psychology, 55(4), 455-474.