
Author: 徐子媛 (Hsu, Tzu-Yuan)
Thesis title: 應用於三維人臉密集對齊及重建之生成對抗網路 (Generative Adversarial Network for 3D Face Dense Alignment and Reconstruction)
Advisor: 謝明得 (Shieh, Ming-Der)
Degree: Master
Department: Department of Electrical Engineering, College of Electrical Engineering and Computer Science
Publication year: 2021
Graduation academic year: 109
Language: English
Pages: 56
Keywords: 3D face reconstruction, face alignment, generative adversarial networks, deep learning, computer vision
    Reconstructing a 3D face from a single 2D image, providing both dense alignment and face reconstruction, is a challenging problem in computer vision. Alignment on the 2D plane is particularly difficult under large poses or occlusion, and many researchers have therefore sought solutions that exploit 3D spatial information. In this thesis, instead of the traditional point-cloud Iterative Closest Point (ICP) algorithm or methods that regress 3D Morphable Model (3DMM) parameters to reconstruct the face, we combine the strengths of two existing end-to-end deep convolutional neural network (DCNN) architectures: the position map regression network (PRNet) and 2D-assisted self-supervised learning (2DASL). From the former we adopt the 2D UV position map, which stores 3D spatial positions in an image while preserving positional semantics; from the latter we add a self-critic mechanism that makes the reconstructed 3D face more accurate. Building on these two ideas, we propose an image-to-image conditional generative adversarial network (image-to-image GAN) as the baseline architecture and use it to realize the proposed single-2D-image-to-3D-position translation framework for 3D face reconstruction and dense alignment.
    Finally, we evaluate performance on the AFLW2000-3D benchmark. For face alignment, measured with 68 2D facial keypoints, our method reduces the mean error by 0.5% to 1% compared with traditional algorithms. Compared with PRNet, our main baseline, the mean error is about 0.5% higher overall, yet the accuracy remains competitive and is even better on some subsets of the data. For face reconstruction, whereas PRNet produces blurry 3D faces, our method better preserves high-frequency information, yielding finer detail in the contours of the facial features.
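The UV position map mentioned above can be illustrated with a small sketch: each mesh vertex's 3D position is scattered into the three channels of a 2D image indexed by that vertex's UV coordinates, so a per-pixel image-to-image network can regress full 3D geometry. This is a minimal NumPy illustration under a fixed UV parameterization; the function name and toy data are hypothetical, not taken from the thesis.

```python
import numpy as np

def build_uv_position_map(vertices, uv_coords, size=256):
    """Scatter 3D vertex positions into a size x size x 3 UV position map.

    vertices: (N, 3) array of 3D positions (x, y, z).
    uv_coords: (N, 2) array of UV coordinates in [0, 1).
    """
    uv_map = np.zeros((size, size, 3), dtype=np.float32)
    cols = (uv_coords[:, 0] * size).astype(int).clip(0, size - 1)
    rows = (uv_coords[:, 1] * size).astype(int).clip(0, size - 1)
    # Each pixel now holds a 3D position; pixels sharing a cell are overwritten.
    uv_map[rows, cols] = vertices
    return uv_map

# Toy example: three vertices mapped to three UV locations on a 4x4 map.
verts = np.array([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6], [0.7, 0.8, 0.9]])
uvs = np.array([[0.0, 0.0], [0.5, 0.5], [0.99, 0.99]])
pm = build_uv_position_map(verts, uvs, size=4)
```

Because the map preserves the mesh's UV neighborhood structure, nearby pixels correspond to nearby surface points, which is the "positional semantics" property the abstract refers to.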

    Abstract (in Chinese) iii
    Abstract v
    Acknowledgements vii
    Contents viii
    List of Tables x
    List of Figures xi
    Chapter 1 Introduction 1
    1.1 Motivation 1
    1.2 Related work 2
    1.3 Thesis organization 4
    Chapter 2 Background 5
    2.1 Single 2D image to 3D face reconstruction algorithm 5
    2.1.1 Model-based: 3D morphable model fitting 5
    2.1.2 Model-free: End-to-end convolutional neuron networks 8
    2.2 Generative adversarial networks 9
    2.2.1 Probability-based generative adversarial networks 9
    2.2.2 Energy-based generative adversarial networks 15
    Chapter 3 Proposed Image-to-Position Translation Network Algorithm 17
    3.1 Concept toward image-to-position translation 17
    3.1.1 UV position map 17
    3.1.2 Image-to-position translation 20
    3.2 Network architecture and training details 22
    3.2.1 Network architecture and backbone network 22
    3.2.2 Loss function and optimization trick 25
    3.2.3 Other implementation details 27
    Chapter 4 Experimental Evaluation and Results Comparison 30
    4.1 Experiment setup 30
    4.1.1 Training and benchmarking datasets 30
    4.1.2 Evaluation metric 32
    4.2 Experimental evaluation of network architecture 33
    4.2.1 Evaluation of transformation ratio of data augmentation 33
    4.2.2 Evaluation of rotation angle range of data augmentation 35
    4.2.3 Evaluation of strategy of data augmentation 36
    4.2.4 Evaluation of extreme angle of data augmentation 37
    4.3 Comparison with other state-of-the-art works 40
    4.3.1 Comparison of 3D face alignment 40
    4.3.2 Comparison of 3D face reconstruction 47
    4.4 Extended application: data augmentation with different poses in 3D 49
    Chapter 5 Conclusion and Future Work 51
    5.1 Conclusion 51
    5.2 Future work 52
    References 53
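The 68-keypoint alignment comparison described in the abstract (and detailed under Section 4.1.2, "Evaluation metric") is conventionally scored with a normalized mean error (NME). A minimal sketch follows; the bounding-box-size normalizer shown here is one common convention and an assumption on my part, since papers differ on the exact normalization term.

```python
import numpy as np

def nme_2d(pred, gt):
    """Normalized mean error over 2D landmarks.

    pred, gt: (N, 2) arrays of predicted and ground-truth keypoints
    (N = 68 in the AFLW2000-3D protocol). Returns the error as a
    fraction of the ground-truth bounding-box size sqrt(w * h).
    """
    errors = np.linalg.norm(pred - gt, axis=1)   # per-point Euclidean distance
    w, h = gt.max(axis=0) - gt.min(axis=0)       # ground-truth bounding box
    norm = np.sqrt(w * h)                        # box-size normalizer
    return errors.mean() / norm
```

Under this metric, the 0.5% to 1% differences quoted in the abstract are absolute differences in NME expressed as percentages.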

    References
    [1] Asthana, A., Zafeiriou, S., Cheng, S., & Pantic, M., ‘‘Robust discriminative response map fitting with constrained local models’’, in Computer Vision and Pattern Recognition (CVPR), pages 3444–3451, 2013.
    [2] Alp Guler, R., Trigeorgis, G., Antonakos, E., Snape, P., Zafeiriou, S., & Kokkinos, I. ‘‘Densereg: Fully convolutional dense shape regression in-the-wild’’, in Computer Vision and Pattern Recognition (CVPR), pages 6799–6808, 2017.
    [3] Blanz, V., & Vetter, T., ‘‘A morphable model for the synthesis of 3d faces’’, in SIGGRAPH, volume 99, pages 187–194, 1999.
    [4] Belhumeur, P. N., Jacobs, D. W., Kriegman, D. J., & Kumar, N., ‘‘Localizing parts of faces using a consensus of exemplars’’, in Computer Vision and Pattern Recognition (CVPR), pages 545–552, 2011.
    [5] Bas, A., Huber, P., Smith, W. A., Awais, M., & Kittler, J., ‘‘3d morphable models as spatial transformer networks’’, in International Conference on Computer Vision (ICCV), pages 904–912, 2017.
    [6] Berthelot, D., Schumm, T., & Metz, L., ‘‘Began: Boundary equilibrium generative adversarial networks’’, arXiv preprint arXiv:1703.10717, 2017.
    [7] Bhagavatula, C., Zhu, C., Luu, K., & Savvides, M., ‘‘Faster than real-time facial alignment: A 3d spatial transformer network approach in unconstrained poses’’, in International Conference on Computer Vision (ICCV), pages 3980–3989, 2017.
    [8] Bulat, A., & Tzimiropoulos, G., ‘‘How far are we from solving the 2d & 3d face alignment problem? (And a dataset of 230,000 3d facial landmarks)’’, in International Conference on Computer Vision (ICCV), pages 1021–1030, 2017.
    [9] Cootes, T. F., Edwards, G. J., & Taylor, C. J., ‘‘Active appearance models’’, in European Conference on Computer Vision (ECCV), pages 484–498, 1998.
    [10] Cao, C., Weng, Y., Zhou, S., Tong, Y., & Zhou, K., ‘‘Facewarehouse: A 3d facial expression database for visual computing’’, in Transactions on Visualization and Computer Graphics (TVCG), pages 413–425, 2013.
    [11] Chu, B., Romdhani, S., & Chen, L., ‘‘3d-aided face recognition robust to expression and pose variations’’, in Computer Vision and Pattern Recognition (CVPR), pages 1899–1906, 2014.
    [12] Floater, M. S., ‘‘Parametrization and smooth approximation of surface triangulations’’, Computer aided geometric design, pages 231–250, 1997.
    [13] Feng, Y., Wu, F., Shao, X., Wang, Y., & Zhou, X., ‘‘Joint 3d face reconstruction and dense alignment with position map regression network’’, in European Conference on Computer Vision (ECCV), pages 534–551, 2018.
    [14] Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., & Bengio, Y., ‘‘Generative adversarial nets’’, arXiv preprint arXiv:1406.2661, 2014.
    [15] Gou, C., Wu, Y., Wang, F. Y., & Ji, Q., ‘‘Shape augmented regression for 3d face alignment’’, in European Conference on Computer Vision (ECCV), pages 604–615, 2016.
    [16] Hinton, G. E., & Salakhutdinov, R. R., ‘‘Reducing the dimensionality of data with neural networks’’, Science, 313(5786):504–507, 2006.
    [17] Isola, P., Zhu, J. Y., Zhou, T., & Efros, A. A., ‘‘Image-to-image translation with conditional adversarial networks’’, arXiv preprint arXiv:1611.07004, 2016.
    [18] Johnson, J., Alahi, A., & Fei-Fei, L., ‘‘Perceptual losses for real-time style transfer and super-resolution’’, in European Conference on Computer Vision (ECCV), pages 694–711, 2016.
    [19] Jackson, A.S., Bulat, A., Argyriou, V., Tzimiropoulos, G., ‘‘Large pose 3d face reconstruction from a single image via direct volumetric cnn regression’’, in International Conference on Computer Vision (ICCV), pages 1031–1039, 2017.
    [20] Kingma, D. P., & Ba, J., ‘‘Adam: A method for stochastic optimization’’, in International Conference on Learning Representations (ICLR), 2015.
    [21] Liu, Y., Jourabloo, A., Ren, W., Liu, X., ‘‘Dense face alignment’’, arXiv preprint arXiv:1709.01442, 2017.
    [22] Messer, K., Matas, J., Kittler, J., Luettin, J., & Maitre, G., ‘‘XM2VTSDB: The extended M2VTS database’’, in Second International Conference on Audio and Video-based Biometric Person Authentication, volume 964, pages 965–966, 1999.
    [23] Mirza, M., & Osindero, S., ‘‘Conditional generative adversarial nets’’, arXiv preprint arXiv:1411.1784, 2014.
    [24] Moschoglou, S., Ploumpis, S., Nicolaou, M. A., Papaioannou, A., & Zafeiriou, S., ‘‘3dfacegan: Adversarial nets for 3d face representation, generation, and translation’’, arXiv preprint arXiv:1905.00307, 2019.
    [25] Paysan, P., Knothe, R., Amberg, B., Romdhani, S., & Vetter, T., ‘‘A 3d face model for pose and illumination invariant face recognition’’, in IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), pages 296–301, 2009.
    [26] Pathak, D., Krahenbuhl, P., Donahue, J., Darrell, T., & Efros, A. A., ‘‘Context encoders: Feature learning by inpainting’’, in Computer Vision and Pattern Recognition (CVPR), pages 2536–2544, 2016.
    [27] Ronneberger, O., Fischer, P., & Brox, T., ‘‘U-net: Convolutional networks for biomedical image segmentation’’, in Medical Image Computing and Computer Assisted Intervention (MICCAI), pages 234–241, 2015.
    [28] Radford, A., Metz, L., & Chintala, S., ‘‘Unsupervised representation learning with deep convolutional generative adversarial networks’’, in International Conference on Learning Representations (ICLR), 2016.
    [29] Richardson, E., Sela, M., Or-El, R., Kimmel, R., ‘‘Learning detailed face reconstruction from a single image’’, in Computer Vision and Pattern Recognition (CVPR), pages 1259–1268, 2016.
    [30] Sagonas, C., Tzimiropoulos, G., Zafeiriou, S., & Pantic, M., ‘‘300 faces in-the-wild challenge: The first facial landmark localization challenge’’, in IEEE Computer Vision Workshops (ICCVW), pages 397–403, 2013.
    [31] Tuan Tran, A., Hassner, T., Masi, I., & Medioni, G., ‘‘Regressing robust and discriminative 3d morphable models with a very deep neural network’’, in Computer Vision and Pattern Recognition (CVPR), pages 5163–5172, 2017.
    [32] Tu, X., Zhao, J., Xie, M., Jiang, Z., Balamurugan, A., Luo, Y., Zhao, Y., He, L., Ma, Z., Feng, J., ‘‘3D face reconstruction from a single image assisted by 2D face images in the wild’’, arXiv:1903.09359, 2019.
    [33] Wang, X., & Gupta, A., ‘‘Generative image modeling using style and structure adversarial networks’’, in European Conference on Computer Vision (ECCV), pages 318–335, 2016.
    [34] Yoo, D., Kim, N., Park, S., Paek, A. S., & Kweon, I. S., ‘‘Pixel-level domain transfer’’, in European Conference on Computer Vision (ECCV), pages 517–532, 2016.
    [35] Zhu, X., & Ramanan, D., ‘‘Face detection, pose estimation, and landmark localization in the wild’’, in Computer Vision and Pattern Recognition (CVPR), pages 2879–2886, 2012.
    [36] Zhou, E., Fan, H., Cao, Z., Jiang, Y., & Yin, Q., ‘‘Extensive facial landmark localization with coarse-to-fine convolutional network cascade’’, in IEEE Computer Vision Workshops (ICCVW), pages 386–391, 2013.
    [37] Zhu, X., Lei, Z., Yan, J., Yi, D., & Li, S. Z., ‘‘High-fidelity pose and expression normalization for face recognition in the wild’’, in Computer Vision and Pattern Recognition (CVPR), pages 787–796, 2015.
    [38] Zhao, J., Mathieu, M., & LeCun, Y., ‘‘Energy-based generative adversarial network’’, arXiv preprint arXiv:1609.03126, 2016.
    [39] Zhu, X., Lei, Z., Liu, X., Shi, H., & Li, S. Z., ‘‘Face alignment across large poses: A 3d solution’’, in Computer Vision and Pattern Recognition (CVPR), pages 146–155, 2016.

    Full-text availability: on campus from 2024-07-31; off campus from 2024-07-31.