
Graduate Student: Lin, Po-Chien (林栢仟)
Thesis Title: Image Understanding System on Facial Identity and Emotion for Home Robot (應用於家庭機器人之人臉身分與情緒綜合影像理解系統)
Advisor: Wang, Jhing-Fa (王駿發)
Degree: Master
Department: Department of Electrical Engineering, College of Electrical Engineering and Computer Science
Publication Year: 2017
Graduation Academic Year: 105 (2016-2017)
Language: English
Pages: 56
Chinese Keywords: Multiple Identity Recognition, Multiple Emotion Recognition, Skeleton Recognition, HC-SVM, Auto-learning Mechanism
Foreign Keywords: Multiple Facial Expression Recognition, Multiple Identity Recognition, Skeleton Recognition, HC-SVM, SFP, Auto-learning Mechanism
Chinese Abstract: Home robots have advanced rapidly in recent years, and giving a home robot the visual abilities of emotion recognition and identity recognition can greatly improve the human-robot interaction experience. This thesis uses the Microsoft depth camera Kinect v2 to obtain facial landmark points, facial RGB images, and skeleton information as the inputs of the visual system, and divides the system into two parts. The first part is multi-person facial identity recognition, which uses facial RGB images and skeleton information to recognize identity. An ordinary family has roughly two to six members, and within such small-scale data comparison the skeleton information is quite reliable, compensating for the limitation that identity recognition based purely on face images can recognize only frontal faces. In addition, to improve usability, this thesis proposes a self-learning mechanism that automatically selects suitable images and skeleton information to update the training samples. The second part is multi-person emotion recognition, which uses the facial landmark points provided by Kinect v2. To handle side faces of up to 30 degrees, this thesis proposes a rotation-correction method to normalize the facial landmarks, and extracts features from Salient Facial Points (SFPs) that highlight expression characteristics. Ten different Home Expression Units (HEUs) are recognized through the proposed Hierarchy-Coherence SVM (HC-SVM), an improved multi-class SVM classifier, together with motion detection, and are finally combined into four emotions commonly seen at home. Experimental results show that the accuracy reaches 86.33% for identity recognition and 86.25% for emotion recognition. In practical use, the system can simultaneously recognize the emotions and identities of up to three people.
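The claim above, that skeleton information is reliable when matching within a small enrolled group of two to six family members, can be pictured as a nearest-neighbor comparison over body-proportion features. The sketch below is only an illustration of that idea, not the thesis's actual method: the bone-length features, joint indexing, and distance threshold are all assumptions introduced here.

```python
# Minimal sketch: small-gallery identity matching from skeleton features.
# Assumptions (not from the thesis): features are bone lengths computed
# from Kinect v2 joint positions, and matching is nearest-neighbor with
# a rejection threshold for unknown people.
import numpy as np

def bone_lengths(joints, bone_pairs):
    """joints: (25, 3) array of Kinect v2 joint positions in meters.
    bone_pairs: list of (parent, child) joint-index pairs to measure."""
    return np.array([np.linalg.norm(joints[a] - joints[b])
                     for a, b in bone_pairs])

def match_identity(query, gallery, names, threshold=0.08):
    """Return the enrolled name whose bone-length vector is closest
    to the query, or None when nobody is within the threshold."""
    dists = np.linalg.norm(gallery - query, axis=1)
    best = int(np.argmin(dists))
    return names[best] if dists[best] < threshold else None

# Hypothetical usage with a six-member family gallery:
# gallery = np.stack([bone_lengths(j, BONES) for j in enrolled_joints])
# who = match_identity(bone_lengths(live_joints, BONES), gallery, names)
```

With only a handful of enrolled members, such feature vectors rarely collide, which is why a weak biometric like body proportions becomes usable at household scale and can cover views where the face is not frontal.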

Abstract: Home robots have developed rapidly in recent years, but human-machine interaction has focused mostly on voice conversation systems rather than image understanding. Giving a home robot the visual ability to recognize human emotion and identity could improve the user experience. The proposed research uses the Microsoft depth camera Kinect v2 to obtain facial landmarks, facial RGB images, and skeleton information as the inputs of the visual system. The system is divided into two parts. The first part is multiple facial identity recognition, which uses the facial RGB image and skeleton information to recognize identity. A typical family has two to six members, and skeleton information is highly reliable in comparisons over such small-scale data, compensating for the traditional shortcoming that face-only methods can recognize only frontal faces. In addition, we propose an auto-learning mechanism that, after verifying the identity in an image, selects suitable image and skeleton samples to expand the training set. The second part is multiple facial emotion recognition, which uses the 1,347 facial landmarks provided by Kinect v2 as input. To handle side faces of up to 30 degrees, this thesis proposes an orientation normalization for the facial landmarks; feature extraction is based on Salient Facial Points (SFPs), at which changes in expression are most observable. The proposed Hierarchy-Coherence SVM (HC-SVM) outperforms a conventional SVM in recognizing Home Expression Units (HEUs), and the HEU results are then mapped to four basic emotions. The experimental results demonstrate the effectiveness of the proposed system: the recognition rate reaches 86.33% for identity and 86.26% for facial expression. In addition, the system can recognize up to three people at the same time.
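The orientation normalization mentioned above can be pictured as estimating the head's yaw from the landmark geometry and rotating the point set back to a frontal pose before SFP features are computed. The following sketch shows one generic way to do this; it is not the thesis's rotation-correction method, and the eye-corner indices are hypothetical rather than Kinect v2's real HD-face numbering.

```python
# Generic sketch of yaw normalization for 3-D facial landmarks; the
# thesis's own rotation-correction procedure is not reproduced here.
import numpy as np

LEFT_EYE, RIGHT_EYE = 0, 1  # hypothetical outer eye-corner indices

def normalize_yaw(landmarks):
    """landmarks: (N, 3) array of (x, y, z) points in camera coordinates.
    Rotates all points about the vertical (y) axis so the eye-corner
    line becomes parallel to the image plane (zero depth difference)."""
    v = landmarks[RIGHT_EYE] - landmarks[LEFT_EYE]
    yaw = np.arctan2(v[2], v[0])          # eye-line angle in the x-z plane
    c, s = np.cos(yaw), np.sin(yaw)
    R = np.array([[c, 0.0, s],
                  [0.0, 1.0, 0.0],
                  [-s, 0.0, c]])          # rotation about the y axis
    center = landmarks.mean(axis=0)
    return (landmarks - center) @ R.T + center
```

After this step, features measured between salient points are computed on a pose-normalized point set, which is what makes side faces of up to roughly 30 degrees comparable with frontal training samples.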

Table of Contents:
Chinese Abstract I
Abstract II
Contents V
Table List VII
Figure List VIII
Chapter 1 Introduction 1
1.1 Background 1
1.2 Motivation 2
1.3 Thesis Objective 3
1.4 Thesis Organization 3
Chapter 2 Related Works 5
2.1 The Survey of Kinect v2 5
2.2 The Survey of Face Recognition 8
2.3 The Survey of Emotion Recognition 9
Chapter 3 Multi-Facial Emotion Recognition Based on Facial Landmarks 12
3.1 System Framework 12
3.2 Feature Extraction for Facial Landmark 16
3.3 HEUs Recognition Based on Hierarchy-Coherence SVM 23
3.4 Crying Detection 25
3.5 Emotion Mapping Strategy 28
Chapter 4 Multi-Identity Recognition Based on Face Image and Skeleton Information 30
4.1 System Framework 30
4.2 Feature Extraction for Facial RGB Image and Skeleton 32
4.3 Training Phase 38
4.4 Testing Phase 39
4.5 Family Member Identity Dataset 41
4.6 Auto-Learning Mechanism 44
Chapter 5 Experimental Result 46
5.1 Data Analysis Setup/Platform 46
5.2 Emotion Recognition Result 47
5.3 Identity Recognition Result 49
Chapter 6 Conclusion and Future Works 52
Chapter 7 References 53


Full-text availability: on campus, open access from 2022-12-31; off campus, not open. The electronic thesis has not yet been authorized for public release; for the print copy, please consult the library catalog.