| Field | Value |
|---|---|
| Graduate student | 仇騁越 Qiu, Cheng-Yue |
| Thesis title | 相對可控的2D人物頭像生成與風格轉移 Controlled Modification of 2D anime face generate and style transfer |
| Advisor | 李同益 Lee, Tong-Yee |
| Degree | Master |
| Department | College of Electrical Engineering and Computer Science, Department of Computer Science and Information Engineering |
| Year of publication | 2020 |
| Academic year of graduation | 108 (ROC calendar; academic year 2019-2020) |
| Language | English |
| Pages | 45 |
| Keywords | Deep learning, computer vision, style transfer and mixing |
The style-based generative adversarial network architecture StyleGAN [1, 2] achieves state-of-the-art results in data-driven, unconditional image generation. In this thesis, building on StyleGAN [1, 2], we propose a method for mixing the facial features of two or more 2D anime character face images. Given two or more input images, the user selects facial features from different inputs, for example the face shape, hairstyle, and pose of image A together with the eyes and facial expression of image B, and our method automatically generates a style-mixed image that combines the selected features. The method has two main parts: (1) an encoder that optimizes an initially random latent code until the generated image matches the input image, which separates the feature attributes of the input anime face image and maps the image to a latent code suitable for the generator; and (2) locating where the different features reside in the latent code, editing the latent code according to the selected features, and feeding the edited code to the StyleGAN [1, 2] synthesis network to generate an image with the chosen features. Experimental results show that our method can mix features from different 2D anime character face images and generate high-quality style-mixed images.
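The two steps above can be illustrated with a toy sketch, which is not the thesis's implementation. It rests on three assumptions:

- `synthesize` is a hypothetical linear stand-in for the StyleGAN synthesis network, fed by a layer-wise latent code of `NUM_LAYERS` style vectors.
- The embedding step minimizes only a pixel-space loss. Optimization-based embedding such as Image2StyleGAN [3] combines a VGG-16 perceptual loss with pixel MSE.
- Each user-selected feature corresponds to a range of layers, as in StyleGAN's style mixing [1].

All names, sizes, and the layer split are illustrative.

```python
import numpy as np

# Hypothetical toy stand-in for StyleGAN's synthesis network: each of the
# NUM_LAYERS style vectors w[l] is mapped linearly and the results are summed
# into a flat "image". A real generator is deep and nonlinear.
NUM_LAYERS, W_DIM, IMG_DIM = 8, 4, 256
rng = np.random.default_rng(0)
LAYER_WEIGHTS = rng.normal(size=(NUM_LAYERS, IMG_DIM, W_DIM)) / np.sqrt(IMG_DIM)


def synthesize(w_plus):
    """Map a layer-wise code of shape (NUM_LAYERS, W_DIM) to an image vector."""
    return np.einsum("lij,lj->i", LAYER_WEIGHTS, w_plus)


def embed(target, steps=500, lr=0.5, seed=1):
    """Optimization-based encoder: start from a random latent code and run
    gradient descent on 0.5 * ||synthesize(w) - target||^2 (pixel loss only)."""
    w = np.random.default_rng(seed).normal(size=(NUM_LAYERS, W_DIM))
    for _ in range(steps):
        residual = synthesize(w) - target
        # Gradient of the loss with respect to each layer's style vector.
        grad = np.einsum("lij,i->lj", LAYER_WEIGHTS, residual)
        w -= lr * grad
    return w


def mix(w_a, w_b, layers_from_b):
    """Style mixing: take the listed layers from w_b and all others from w_a."""
    mixed = w_a.copy()
    mixed[layers_from_b] = w_b[layers_from_b]
    return mixed


# Usage: embed two "images", then combine A's coarse layers with B's finer ones.
w_true_a = rng.normal(size=(NUM_LAYERS, W_DIM))
w_true_b = rng.normal(size=(NUM_LAYERS, W_DIM))
img_a, img_b = synthesize(w_true_a), synthesize(w_true_b)
w_a, w_b = embed(img_a), embed(img_b)
fine_layers = list(range(4, NUM_LAYERS))  # hypothetical split of "features"
mixed_img = synthesize(mix(w_a, w_b, fine_layers))
```

In the real StyleGAN [1], the coarse layers mainly control pose, general hairstyle, and face shape. Middle layers control smaller-scale facial features such as the eyes, and fine layers mostly control the color scheme. Which layers a given anime-face feature occupies would have to be determined empirically, as the second step describes.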
[1] Karras, T., Laine, S., & Aila, T. (2019). A style-based generator architecture for generative adversarial networks. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 4401-4410).
[2] Karras, T., Laine, S., Aittala, M., Hellsten, J., Lehtinen, J., & Aila, T. (2020). Analyzing and improving the image quality of StyleGAN. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 8110-8119).
[3] Abdal, R., Qin, Y., & Wonka, P. (2019). Image2StyleGAN: How to embed images into the StyleGAN latent space? In Proceedings of the IEEE international conference on computer vision (pp. 4432-4441).
[4] Simonyan, K., & Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556.
[5] Johnson, J., Alahi, A., & Fei-Fei, L. (2016, October). Perceptual losses for real-time style transfer and super-resolution. In European conference on computer vision (pp. 694-711). Springer, Cham.
[6] Zhang, R., Isola, P., Efros, A. A., Shechtman, E., & Wang, O. (2018). The unreasonable effectiveness of deep features as a perceptual metric. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 586-595).
[7] Gatys, L. A., Ecker, A. S., & Bethge, M. (2016). Image style transfer using convolutional neural networks. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 2414-2423).
[8] Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., & Bengio, Y. (2014). Generative adversarial nets. In Advances in neural information processing systems.
[9] Radford, A., Metz, L., & Chintala, S. (2015). Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434.
[10] Brock, A., Donahue, J., & Simonyan, K. (2019). Large scale GAN training for high fidelity natural image synthesis. In International Conference on Learning Representations.
[11] Karras, T., Aila, T., Laine, S., & Lehtinen, J. (2017). Progressive growing of GANs for improved quality, stability, and variation. arXiv preprint arXiv:1710.10196.
[12] Miyato, T., Kataoka, T., Koyama, M., & Yoshida, Y. (2018). Spectral normalization for generative adversarial networks. arXiv preprint arXiv:1802.05957.
[13] Chen, Y., Lai, Y. K., & Liu, Y. J. (2018). CartoonGAN: Generative adversarial networks for photo cartoonization. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 9465-9474).
[14] Zhu, J. Y., Park, T., Isola, P., & Efros, A. A. (2017). Unpaired image-to-image translation using cycle-consistent adversarial networks. In International Conference on Computer Vision.
[15] Rosin, P. L., & Collomosse, J. (2013). Image and video-based artistic stylisation. Springer.
[16] Gatys, L. A., Ecker, A. S., & Bethge, M. (2015). A neural algorithm of artistic style. arXiv preprint arXiv:1508.06576.
[17] Gatys, L. A., Ecker, A. S., & Bethge, M. (2016). Image style transfer using convolutional neural networks. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 2414-2423).
[18] Huang, X., & Belongie, S. (2017). Arbitrary style transfer in real-time with adaptive instance normalization. In Proceedings of the IEEE International Conference on Computer Vision (pp. 1501-1510).
[19] Li, J., Liang, X., Wei, Y., Xu, T., Feng, J., & Yan, S. (2017). Perceptual generative adversarial networks for small object detection. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 1222-1230).
[20] Li, C., & Wand, M. (2016, October). Precomputed real-time texture synthesis with markovian generative adversarial networks. In European conference on computer vision (pp. 702-716). Springer, Cham.
[21] Slossberg, R., Shamai, G., & Kimmel, R. (2018). High quality facial surface and texture synthesis via generative adversarial networks. In Proceedings of the European Conference on Computer Vision (ECCV).
[22] Xian, W., Sangkloy, P., Agrawal, V., Raj, A., Lu, J., Fang, C., ... & Hays, J. (2018). TextureGAN: Controlling deep image synthesis with texture patches. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 8456-8465).
[23] Isola, P., Zhu, J. Y., Zhou, T., & Efros, A. A. (2017). Image-to-image translation with conditional adversarial networks. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 1125-1134).
[24] Zhu, J. Y., Park, T., Isola, P., & Efros, A. A. (2017). Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the IEEE international conference on computer vision (pp. 2223-2232).
[25] Liu, M. Y., Huang, X., Mallya, A., Karras, T., Aila, T., Lehtinen, J., & Kautz, J. (2019). Few-shot unsupervised image-to-image translation. In Proceedings of the IEEE International Conference on Computer Vision (pp. 10551-10560).
[26] Vondrick, C., Pirsiavash, H., & Torralba, A. (2016). Generating videos with scene dynamics. In Advances in neural information processing systems (pp. 613-621).
[27] Wang, T. C., Liu, M. Y., Zhu, J. Y., Liu, G., Tao, A., Kautz, J., & Catanzaro, B. (2018). Video-to-video synthesis. arXiv preprint arXiv:1808.06601.
[28] Mao, X., Li, Q., Xie, H., Lau, R. Y., Wang, Z., & Paul Smolley, S. (2017). Least squares generative adversarial networks. In Proceedings of the IEEE international conference on computer vision (pp. 2794-2802).
[29] Gulrajani, I., Ahmed, F., Arjovsky, M., Dumoulin, V., & Courville, A. C. (2017). Improved training of Wasserstein GANs. In Advances in neural information processing systems (pp. 5767-5777).
[30] Kingma, D. P., & Welling, M. (2013). Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114.
[31] Zhu, J. Y., Krähenbühl, P., Shechtman, E., & Efros, A. A. (2016, October). Generative visual manipulation on the natural image manifold. In European conference on computer vision (pp. 597-613). Springer, Cham.
[32] Su, H., Niu, J., Liu, X., Li, Q., Cui, J., & Wan, J. (2020). Unpaired Photo-to-manga Translation Based on The Methodology of Manga Drawing. arXiv preprint arXiv:2004.10634.
[33] Chambolle, A., Caselles, V., Novaga, M., Cremers, D., & Pock, T. (2009). An introduction to total variation for image analysis. HAL: hal-00437581.
[34] Olshausen, B. A., & Field, D. J. (1996). Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature, 381(6583), 607-609.
[35] Branwen, G. (2019). Making Anime Faces With StyleGAN. https://www.gwern.net/Faces
[36] Li, X., & Yu, X. Generating Cartoon Style Facial Expressions with StackGAN.
[37] Kaggle. Safebooru - Anime Image Metadata. https://www.kaggle.com/alamson/safebooru
[38] Pixiv. https://www.pixiv.net/
[39] Pikax. https://github.com/Redcxx/Pikax
[40] lbpcascade_animeface. https://github.com/nagadomi/lbpcascade_animeface
[41] waifu2x. https://github.com/nagadomi/waifu2x