Graduate student: | 王煜凱 Wang, Yu-Kai |
---|---|
Thesis title: | 具顏色資訊之動漫人臉生成對抗模型 Deep Learning Model for Anime Face Generation based on Sketches and Color Information |
Advisor: | 王宗一 Wang, Tzone-I |
Degree: | Master |
Department: | College of Engineering - Department of Engineering Science |
Year of publication: | 2023 |
Graduation academic year: | 111 (ROC calendar) |
Language: | Chinese |
Pages: | 46 |
Keywords (Chinese): | 深度學習, 生成對抗網路, 動漫人臉生成 |
Keywords (English): | Deep Learning, Generative Adversarial Networks, Anime Face Generation |
Anime character creation is used across many fields, including social media, animation, video games, movies, and novels. Drawing anime characters requires specialized skills, however, and is difficult for people without drawing experience. With the rapid progress of artificial intelligence in recent years, there is growing interest in applying it more practically in daily life. The Generative Adversarial Network (GAN) [1] is a generative AI technique whose outputs are trained to resemble the real samples in the training set as closely as possible. How to use GANs to generate anime face images is therefore a valuable research direction.
Earlier GAN models, however, often suffer from mode collapse: the shapes and colors of the generated images simply mimic the real samples, which makes it hard for users to generate the customized images they want. Training a GAN that lets users specify particular colors would require manually labeling every image in the training set, and because GAN training sets usually contain tens of thousands of images or more, this is impractical.
This study implements a system that generates anime faces from sketches and includes automatic color labeling. A pre-trained anime face segmentation model partitions each training image into parts, and the K-Means algorithm [2] determines the color of each segmented part. The training images are thereby labeled automatically with the colors of parts such as hair, eyes, and skin. Users only need to provide a simple sketch and paint each part in the color they want; the GAN trained by the system then generates an anime face that matches the user's specification.
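The automatic color labeling step can be illustrated with a minimal sketch. The abstract does not say how the K-Means result is reduced to a single label per part, so this example assumes the center of the largest cluster is taken as the part's color. The part IDs, the function names, and the small pure-Python k-means are all hypothetical stand-ins, not the thesis implementation.

```python
import random

# Hypothetical part IDs; a real segmentation model defines its own label map.
PART_NAMES = {1: "skin", 2: "hair", 3: "eyes"}

def _assign(points, centers):
    """Group each point with its nearest center (squared Euclidean distance)."""
    clusters = [[] for _ in centers]
    for p in points:
        idx = min(range(len(centers)),
                  key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centers[i])))
        clusters[idx].append(p)
    return clusters

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means over a list of RGB tuples."""
    distinct = sorted(set(points))
    k = min(k, len(distinct))  # never ask for more clusters than distinct colors
    rng = random.Random(seed)
    centers = [tuple(float(c) for c in p) for p in rng.sample(distinct, k)]
    for _ in range(iters):
        clusters = _assign(points, centers)
        new_centers = [
            tuple(sum(ch) / len(members) for ch in zip(*members)) if members else centers[i]
            for i, members in enumerate(clusters)
        ]
        if new_centers == centers:  # converged
            break
        centers = new_centers
    return centers, _assign(points, centers)

def dominant_color(pixels, k=3):
    """Assumption: the center of the largest cluster is the part's color label."""
    if not pixels:
        return None
    centers, clusters = kmeans(pixels, k)
    largest = max(range(len(centers)), key=lambda i: len(clusters[i]))
    return tuple(round(c) for c in centers[largest])

def label_part_colors(image, mask, k=3):
    """image: rows of (r, g, b) tuples; mask: rows of part IDs with the same shape."""
    by_part = {}
    for img_row, mask_row in zip(image, mask):
        for rgb, part in zip(img_row, mask_row):
            if part in PART_NAMES:  # IDs outside the map (e.g. background) are skipped
                by_part.setdefault(PART_NAMES[part], []).append(tuple(rgb))
    return {name: dominant_color(px, k) for name, px in by_part.items()}

# Toy 2x4 image: red hair with one dark outline pixel, skin, and one eye pixel.
image = [
    [(200, 30, 30), (200, 30, 30), (200, 30, 30), (10, 10, 10)],
    [(250, 220, 200), (250, 220, 200), (0, 0, 255), (0, 0, 255)],
]
mask = [
    [2, 2, 2, 2],
    [1, 1, 3, 0],
]
labels = label_part_colors(image, mask)
print(labels)
```

Clustering before picking a color, rather than averaging all pixels of a part, keeps outline strokes and highlights from shifting the label toward a muddy mean. A library implementation such as scikit-learn's `KMeans` could replace the hand-written loop.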
[1] I. Goodfellow et al., "Generative adversarial networks," Communications of the ACM, vol. 63, no. 11, pp. 139-144, 2020.
[2] J. A. Hartigan and M. A. Wong, "A k-means clustering algorithm," Applied Statistics, vol. 28, no. 1, pp. 100-108, 1979.
[3] G. Branwen, Anonymous, and Danbooru Community, "Danbooru2019 Portraits: A Large-Scale Anime Head Illustration Dataset," 2019. [Online]. Available: https://www.gwern.net/Crops#danbooru2019-portraits.
[4] S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: Towards real-time object detection with region proposal networks," Advances in Neural Information Processing Systems, vol. 28, 2015.
[5] C. Yu, J. Wang, C. Peng, C. Gao, G. Yu, and N. Sang, "BiSeNet: Bilateral segmentation network for real-time semantic segmentation," in Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 325-341.
[6] X. Xiang, D. Liu, X. Yang, Y. Zhu, and X. Shen, "Anime2Sketch: A Sketch Extractor for Anime Arts with Deep Networks," GitHub. [Online]. Available: https://github.com/Mukosame/Anime2Sketch.
[7] M. Heusel, H. Ramsauer, T. Unterthiner, B. Nessler, and S. Hochreiter, "GANs trained by a two time-scale update rule converge to a local Nash equilibrium," Advances in Neural Information Processing Systems, vol. 30, 2017.
[8] L. A. Gatys, A. S. Ecker, and M. Bethge, "Image style transfer using convolutional neural networks," in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 2414-2423.
[9] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," arXiv preprint arXiv:1409.1556, 2014.
[10] P. Isola, J.-Y. Zhu, T. Zhou, and A. A. Efros, "Image-to-image translation with conditional adversarial networks," in Proceedings of the IEEE conference on computer vision and pattern recognition, 2017, pp. 1125-1134.
[11] J.-Y. Zhu, T. Park, P. Isola, and A. A. Efros, "Unpaired image-to-image translation using cycle-consistent adversarial networks," in Proceedings of the IEEE international conference on computer vision, 2017, pp. 2223-2232.
[12] O. Ronneberger, P. Fischer, and T. Brox, "U-Net: Convolutional networks for biomedical image segmentation," in Medical Image Computing and Computer-Assisted Intervention (MICCAI 2015), Part III, Springer, 2015, pp. 234-241.