
Author: Wang, Ching-Chieh (王敬傑)
Thesis title: An Arbitrary Style Face Transformation System with Deep Learning (基於深度學習之人臉任意風格轉換系統)
Advisor: Yang, Jar-Ferr (楊家輝)
Degree: Master
Department: Institute of Computer & Communication Engineering, College of Electrical Engineering and Computer Science
Year of publication: 2023
Academic year of graduation: 111 (2022-2023)
Language: English
Number of pages: 45
Keywords (Chinese): 生成對抗網路、深度學習、風格轉換、風格融合、自適應實例標準化
Keywords (English): generative adversarial network, deep learning, style transfer, style mixing, adaptive instance normalization
    In recent years, with the rise of social media networks and the metaverse, people increasingly upload photos from their daily lives to social platforms or create personal virtual avatars. However, generating a virtual avatar usually takes a large amount of time for adjustment and rendering, and the required color-grading skills and equipment costs become a major obstacle in practice. Therefore, in this thesis, we adopt DualStyleGAN as the backbone and leverage its ability to generate high-quality portraits: the input facial portrait is passed through style normalization to remove the style of the content image, and the color and texture of the input style image are then added, so that the two are fused together. Through the proposed double-branch module, we further introduce weight control when the multi-scale feature maps of the two paths are combined during synthesis, so that in actual use the network lets users adjust the degree of style fusion themselves and pick the most suitable result, better matching everyday needs. As the final experiments show, the proposed system can effectively fuse the style image into the input content image and produce high-resolution, high-quality visual results, effectively reducing the time and labor costs required by traditional methods.
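    The style-normalization step described in the abstract corresponds to adaptive instance normalization (AdaIN) [6], one of the thesis keywords. The following is a minimal PyTorch sketch of that operation for illustration only; it is not the thesis's actual network code, and the function name `adain` is chosen here for clarity.

```python
# Minimal sketch of adaptive instance normalization (AdaIN) following
# Huang and Belongie [6]: normalize away the content feature map's own
# channel-wise statistics, then re-scale and re-shift with the style
# feature map's channel-wise mean and standard deviation.
import torch

def adain(content_feat: torch.Tensor, style_feat: torch.Tensor,
          eps: float = 1e-5) -> torch.Tensor:
    """content_feat, style_feat: tensors of shape (N, C, H, W)."""
    # Channel-wise statistics over the spatial dimensions.
    c_mean = content_feat.mean(dim=(2, 3), keepdim=True)
    c_std = content_feat.std(dim=(2, 3), keepdim=True) + eps
    s_mean = style_feat.mean(dim=(2, 3), keepdim=True)
    s_std = style_feat.std(dim=(2, 3), keepdim=True) + eps
    # Remove the content image's style, then inject the style image's statistics.
    normalized = (content_feat - c_mean) / c_std
    return normalized * s_std + s_mean
```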

    In recent years, with the rise of social media networks and the metaverse, people are increasingly uploading photos of their daily lives to social platforms or creating personal virtual avatars. However, generating such manipulated images often requires a significant amount of time for adjustment and rendering, and the color-correction work and equipment costs can pose a major challenge. Therefore, in this thesis, we build upon DualStyleGAN as the backbone, leveraging its ability to generate high-quality portraits. We employ style normalization to remove the style of the input facial portrait and add the color and texture of the input style image, resulting in a better fusion of the two. Additionally, through the proposed double-branch module, we enable weighted control over the multi-scale feature maps of both paths during synthesis. This allows users to adjust the degree of style fusion and select the most suitable outcome, making the system more adaptable to real-life needs. As demonstrated in the experiments, the proposed system effectively integrates the style image into the input content image, producing visually appealing, high-resolution, and high-quality results while significantly reducing the time and cost required by traditional methods.
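    The "weighted control over the multi-scale feature maps of both paths" can be pictured as a per-resolution linear blend between the two branches. The sketch below is an illustrative assumption of what such user-adjustable style mixing might look like; the function `blend_branches` and the per-level weight list are hypothetical and are not taken from the thesis's double-branch module.

```python
# Illustrative per-level blending of content-path and style-path feature maps.
# A weight of 0 keeps the content branch unchanged; a weight of 1 fully adopts
# the style branch at that resolution level.
from typing import List, Sequence
import torch

def blend_branches(content_feats: Sequence[torch.Tensor],
                   style_feats: Sequence[torch.Tensor],
                   weights: Sequence[float]) -> List[torch.Tensor]:
    """Blend matching multi-scale feature maps under user-chosen weights."""
    fused = []
    for c, s, w in zip(content_feats, style_feats, weights):
        fused.append((1.0 - w) * c + w * s)
    return fused

# Usage example with three hypothetical resolution levels: a user could raise
# the coarse-level weight to transfer more of the style image's structure.
feats_c = [torch.rand(1, 512, 8, 8), torch.rand(1, 256, 16, 16), torch.rand(1, 128, 32, 32)]
feats_s = [torch.rand(1, 512, 8, 8), torch.rand(1, 256, 16, 16), torch.rand(1, 128, 32, 32)]
fused = blend_branches(feats_c, feats_s, weights=[0.8, 0.5, 0.2])
```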

    Abstract (Chinese)
    Abstract (English)
    Acknowledgements
    Contents
    List of Tables
    List of Figures
    Chapter 1 Introduction
      1.1 Research Background
      1.2 Motivations
      1.3 Thesis Organization
    Chapter 2 Related Work
      2.1 Generative Adversarial Network (GAN)
      2.2 Style Transfer
        2.2.1 Adaptive Instance Normalization (AdaIN)
        2.2.2 Double-Input Multi-Scale Arbitrary Style Transfer (DMAST)
        2.2.3 StyleGAN
        2.2.4 DualStyleGAN
      2.3 Vision Transformer
        2.3.1 StyleSwin
    Chapter 3 The Proposed Arbitrary Style Face Transfer System
      3.1 Overview of the Proposed DBStyleGAN System
      3.2 Contour Extractor
      3.3 Synthesis Network
        3.3.1 Double-Branch Module
        3.3.2 Feature Weighter
        3.3.3 Interpolation Module
        3.3.4 RGB Path Module
      3.4 Discriminator
      3.5 Training Loss Functions
        3.5.1 Adversarial Loss
        3.5.2 Perceptual Loss
        3.5.3 Regularization Loss
        3.5.4 Identity Loss
        3.5.5 Total Loss
      3.6 Implementation on Personal Computer
    Chapter 4 Experiment Results
      4.1 Environment Settings and Dataset
      4.2 Training Details
      4.3 Evaluation Metrics
        4.3.1 Fréchet Inception Distance (FID)
        4.3.2 Learned Perceptual Image Patch Similarity (LPIPS)
      4.4 Comparison with Other Methods
      4.5 Ablation Study
      4.6 System Demonstration
    Chapter 5 Conclusions
    Chapter 6 Future Work
    References
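    Chapter 4 of the outline lists FID and LPIPS [21][22] as the evaluation metrics. As a hedged aside, the snippet below shows how LPIPS is commonly computed with the publicly available `lpips` package; the thesis's own evaluation script is not shown in this record, so this is only a generic illustration.

```python
# Generic LPIPS computation with the open-source `lpips` package [22].
import torch
import lpips

loss_fn = lpips.LPIPS(net='alex')          # AlexNet-based perceptual metric
img0 = torch.rand(1, 3, 256, 256) * 2 - 1  # inputs are expected in [-1, 1]
img1 = torch.rand(1, 3, 256, 256) * 2 - 1
distance = loss_fn(img0, img1)             # lower = perceptually more similar
print(distance.item())
```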

    [1] J.-Y. Zhu, T. Park, P. Isola, and A. A. Efros, “Unpaired image-to-image translation using cycle-consistent adversarial networks,” in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 2223–2232.
    [2] J. Kim, M. Kim, H. Kang, and K. Lee, “U-GAT-IT: Unsupervised generative attentional networks with adaptive layer-instance normalization for image-to-image translation,” arXiv preprint arXiv:1907.10830, 2019.
    [3] T. Karras, S. Laine, and T. Aila, “A style-based generator architecture for generative adversarial networks,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 4401–4410.
    [4] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial nets,” Advances in Neural Information Processing Systems, vol. 27, 2014.
    [5] S. Yang, L. Jiang, Z. Liu, and C. C. Loy, “Pastiche master: Exemplar-based high-resolution portrait style transfer,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 7693–7702.
    [6] X. Huang and S. Belongie, “Arbitrary style transfer in real-time with adaptive instance normalization,” in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 1501–1510.
    [7] Y. Wen and J. Yang, “Deep learning neural networks for arbitrary style face cartoonization,” Master’s thesis, National Cheng Kung University, 2021.
    [8] T. Karras, S. Laine, M. Aittala, J. Hellsten, J. Lehtinen, and T. Aila, “Analyzing and improving the image quality of StyleGAN,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 8110–8119.
    [9] E. Richardson, Y. Alaluf, O. Patashnik, Y. Nitzan, Y. Azar, S. Shapiro, and D. Cohen-Or, “Encoding in style: A StyleGAN encoder for image-to-image translation,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 2287–2296.
    [10] A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly et al., “An image is worth 16x16 words: Transformers for image recognition at scale,” arXiv preprint arXiv:2010.11929, 2020.
    [11] B. Zhang, S. Gu, B. Zhang, J. Bao, D. Chen, F. Wen, Y. Wang, and B. Guo, “StyleSwin: Transformer-based GAN for high-resolution image generation,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 11304–11314.
    [12] Z. Liu, Y. Lin, Y. Cao, H. Hu, Y. Wei, Z. Zhang, S. Lin, and B. Guo, “Swin transformer: Hierarchical vision transformer using shifted windows,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 10012–10022.
    [13] T. Karras, T. Aila, S. Laine, and J. Lehtinen, “Progressive growing of GANs for improved quality, stability, and variation,” arXiv preprint arXiv:1710.10196, 2017.
    [14] J. Johnson, A. Alahi, and L. Fei-Fei, “Perceptual losses for real-time style transfer and super-resolution,” in Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11-14, 2016, Proceedings, Part II 14. Springer, 2016, pp. 694–711.
    [15] J. Deng, J. Guo, N. Xue, and S. Zafeiriou, “ArcFace: Additive angular margin loss for deep face recognition,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 4690–4699.
    [16] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” arXiv preprint arXiv:1409.1556, 2014.
    [17] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” Communications of the ACM, vol. 60, no. 6, pp. 84–90, 2017.
    [18] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga et al., “PyTorch: An imperative style, high-performance deep learning library,” Advances in Neural Information Processing Systems, vol. 32, 2019.
    [19] T. Karras, S. Laine, and T. Aila, “A style-based generator architecture for generative adversarial networks,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 4401–4410.
    [20] J. N. Pinkney and D. Adler, “Resolution dependent GAN interpolation for controllable image synthesis between domains,” arXiv preprint arXiv:2010.05334, 2020.
    [21] M. Heusel, H. Ramsauer, T. Unterthiner, B. Nessler, and S. Hochreiter, “GANs trained by a two time-scale update rule converge to a local Nash equilibrium,” Advances in Neural Information Processing Systems, vol. 30, 2017.
    [22] R. Zhang, P. Isola, A. A. Efros, E. Shechtman, and O. Wang, “The unreasonable effectiveness of deep features as a perceptual metric,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 586–595.
    [23] R. Rombach, A. Blattmann, D. Lorenz, P. Esser, and B. Ommer, “High-resolution image synthesis with latent diffusion models,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 10684–10695.

    Full text not available for download.
    On-campus access: available from 2028-07-31.
    Off-campus access: available from 2028-07-31.
    The electronic thesis has not yet been authorized for public release; please consult the library catalog for the print copy.