| Field | Value |
|---|---|
| Author | Chiang, Wen-Chen (江玟貞) |
| Thesis title | Sketch to Chinese Ink Wash Painting Translation with Generative Adversarial Network (可用草圖生成水墨畫的生成對抗網路) |
| Advisor | Wang, Tzone-I (王宗一) |
| Degree | Master |
| Department | Department of Engineering Science, College of Engineering |
| Year of publication | 2023 |
| Graduation academic year | 111 |
| Language | Chinese |
| Pages | 45 |
| Chinese keywords | 水墨畫、草圖生成圖像、風格遷移、生成對抗網路 |
| English keywords | Chinese Ink Wash Painting, Sketch to Image Translation, Style Transfer, Generative Adversarial Network |
| Views / downloads | 68 / 10 |
Ink wash painting is one of the world's most distinctive forms of artistic expression, but mastering brush and ink requires a great deal of time and effort to learn and practice, so the barrier to entry is high. If ink wash paintings could be generated from sketches, the art form could be applied far more widely, for example in interactive art for children, letting them encounter art in a different way and find inspiration and interest in it. This study therefore aims to build a deep generative model that, given a sketch and a target ink wash painting style, produces an ink wash painting that matches the sketch's composition.

Generating ink wash paintings from sketches depends on the interplay of two tasks: image-to-image translation and style transfer. Past image translation work has focused on photographs and semantic label maps rather than sketches; compared with a photograph, a sketch carries very little information, which makes the task harder. Prior sketch-to-image research, in turn, has concentrated on generating photographs and Western paintings rather than ink wash paintings, mainly because ink wash paintings look very different from the former. This study therefore focuses on image translation from sketches combined with ink wash painting style transfer, building a generative model based on pix2pix. In addition to existing datasets, we collected 1,206 ink wash paintings by 12 artists to improve the model's generalization.

Experiments show that AdaIN (Adaptive Instance Normalization) helps the generated images better preserve the sketch content, yields more natural brush strokes and ink diffusion with fewer ghosting artifacts, and shortens the model's convergence time. Quantitatively, the KID metric improves by 19%. Qualitatively, experts judged that apart from good composition consistency, the ink tones, diffusion, and blank-space (liubai) effects, as well as overall fidelity, still need improvement. The model also performs worse when trained on paintings by multiple artists with differing styles; it is currently better suited to a single-artist dataset. Given a hand-drawn sketch, the model cannot fully understand the scene semantics and so cannot generate a truly realistic ink wash painting, but the output still retains most of the sketch's lines and composition and exhibits the ink tones, diffusion, and blank-space effects characteristic of ink wash painting. The results should therefore suffice for diverse applications such as interactive art for children, letting them encounter art in a different way and find inspiration and interest in it.
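The AdaIN operation credited above with better content preservation aligns the per-channel mean and standard deviation of the content (sketch) features to those of the style features. A minimal NumPy sketch of the operation, following Huang & Belongie (2017) — the shapes and the surrounding feature-extraction step are illustrative assumptions, not the thesis code:

```python
import numpy as np

def adain(content_feat, style_feat, eps=1e-5):
    """Adaptive Instance Normalization (Huang & Belongie, 2017).

    Normalizes each channel of the content feature map to zero mean and
    unit std, then rescales it to the per-channel mean/std of the style
    feature map. Shapes are (C, H, W); in practice these would be CNN
    feature maps rather than raw images.
    """
    axes = (1, 2)  # spatial axes
    c_mean = content_feat.mean(axis=axes, keepdims=True)
    c_std = content_feat.std(axis=axes, keepdims=True)
    s_mean = style_feat.mean(axis=axes, keepdims=True)
    s_std = style_feat.std(axis=axes, keepdims=True)
    normalized = (content_feat - c_mean) / (c_std + eps)
    return s_std * normalized + s_mean
```

After AdaIN, each output channel carries the style's statistics, while the spatial layout — the sketch's composition — comes entirely from the content features, which is consistent with the abstract's observation that AdaIN preserves the sketch content.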
Ink wash painting is one of the most distinctive artistic expressions in the world. If sketches could be used to generate ink wash paintings, the applications of ink painting art would be greatly diversified. We therefore build a generative model: given a sketch and a reference image of the target ink wash painting style, the model generates an ink wash painting that conforms to the sketch's composition. The model is based on the pix2pix framework and consists of a generator, a discriminator, and a feature extractor. In addition to existing datasets, we collected 1,206 ink wash paintings from 12 artists to improve generalization.

The results show that adding AdaIN lets the generated image better retain the content of the sketch, produces more natural brush strokes, and shortens the model's convergence time. In the quantitative evaluation, the Kernel Inception Distance metric shows an improvement of 19%. In the qualitative evaluation, experts noted that the results have good composition consistency with the input sketch, but that the brush strokes, the diffusion and void effects, and the overall fidelity of the ink wash paintings still need improvement. Given a hand-drawn sketch, the model cannot produce a fully realistic ink wash painting; however, the generated image retains most of the lines and composition of the input sketch and exhibits the brush-stroke, diffusion, and void effects of ink wash painting. It may therefore be applied to children's interactive art, letting children engage with art in a different way, find inspiration, and develop interest.
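The Kernel Inception Distance cited above (Bińkowski et al., 2018) is an unbiased MMD² estimate between feature vectors of real and generated images, using a cubic polynomial kernel; lower is better, so the reported 19% improvement corresponds to a reduction in this distance. A small NumPy sketch of the estimator itself — extracting the features with an Inception network is omitted, and the dimensions in the comments are illustrative:

```python
import numpy as np

def kid(real_feats, fake_feats):
    """Unbiased MMD^2 estimate with the cubic polynomial kernel used by KID.

    real_feats, fake_feats: arrays of shape (m, d) and (n, d) holding
    Inception-style feature vectors for real and generated images.
    """
    d = real_feats.shape[1]
    kernel = lambda a, b: (a @ b.T / d + 1.0) ** 3
    m, n = len(real_feats), len(fake_feats)
    k_rr = kernel(real_feats, real_feats)
    k_ff = kernel(fake_feats, fake_feats)
    k_rf = kernel(real_feats, fake_feats)
    # Unbiased estimate: exclude the diagonal from the within-set sums.
    term_rr = (k_rr.sum() - np.trace(k_rr)) / (m * (m - 1))
    term_ff = (k_ff.sum() - np.trace(k_ff)) / (n * (n - 1))
    return term_rr + term_ff - 2.0 * k_rf.mean()
```

When the generated features are distributed like the real ones, the estimate is near zero; it grows as the two feature distributions diverge.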
王耀庭. (1984). 山水畫法 1, 2, 3. 雄獅圖書股份有限公司.

黃賓虹. (1993). 黃賓虹畫語錄圖釋. 西泠印社.

Bińkowski, M., Sutherland, D. J., Arbel, M., & Gretton, A. (2018). Demystifying MMD GANs. arXiv preprint arXiv:1801.01401.

Chen, W., & Hays, J. (2018). SketchyGAN: Towards diverse and realistic sketch to image synthesis. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.

Cheng, S.-I., Chen, Y.-J., Chiu, W.-C., Tseng, H.-Y., & Lee, H.-Y. (2023). Adaptively-realistic image generation from stroke and sketch with diffusion model. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision.

Croitoru, F.-A., Hondru, V., Ionescu, R. T., & Shah, M. (2023). Diffusion models in vision: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence.

Gatys, L. A., Ecker, A. S., & Bethge, M. (2015). A neural algorithm of artistic style. arXiv preprint arXiv:1508.06576.

Goodfellow, I. J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., … Bengio, Y. (2014). Generative adversarial networks. In Advances in Neural Information Processing Systems 27.
He, B., Gao, F., Ma, D., Shi, B., & Duan, L.-Y. (2018). ChipGAN: A generative adversarial network for Chinese ink wash painting style transfer. In Proceedings of the 26th ACM International Conference on Multimedia, Seoul, Republic of Korea.
Hertzmann, A., Jacobs, C. E., Oliver, N., Curless, B., & Salesin, D. (2001). Image analogies. In Proceedings of the 28th Annual Conference on Computer Graphics and Interactive Techniques.

Heusel, M., Ramsauer, H., Unterthiner, T., Nessler, B., & Hochreiter, S. (2017). GANs trained by a two time-scale update rule converge to a local Nash equilibrium. Advances in Neural Information Processing Systems, 30.

Huang, X., & Belongie, S. (2017). Arbitrary style transfer in real-time with adaptive instance normalization. In Proceedings of the IEEE International Conference on Computer Vision.

Isola, P., Zhu, J.-Y., Zhou, T., & Efros, A. A. (2017). Image-to-image translation with conditional adversarial networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.

Jing, Y., Yang, Y., Feng, Z., Ye, J., Yu, Y., & Song, M. (2019). Neural style transfer: A review. IEEE Transactions on Visualization and Computer Graphics, 26(11), 3365-3385.

Karras, T., Aila, T., Laine, S., & Lehtinen, J. (2017). Progressive growing of GANs for improved quality, stability, and variation. arXiv preprint arXiv:1710.10196.

Karras, T., Laine, S., & Aila, T. (2019). A style-based generator architecture for generative adversarial networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.

Karras, T., Laine, S., Aittala, M., Hellsten, J., Lehtinen, J., & Aila, T. (2020). Analyzing and improving the image quality of StyleGAN. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.

Langr, J., & Bok, V. (2019). GANs in Action: Deep Learning with Generative Adversarial Networks. Manning.

Liu, B., Song, K., Zhu, Y., & Elgammal, A. (2020). Sketch-to-art: Synthesizing stylized art images from sketches. In Proceedings of the Asian Conference on Computer Vision.

Liu, B., Zhu, Y., Song, K., & Elgammal, A. (2021). Self-supervised sketch-to-image synthesis. In Proceedings of the AAAI Conference on Artificial Intelligence.

Meng, C., He, Y., Song, Y., Song, J., Wu, J., Zhu, J.-Y., & Ermon, S. (2021). SDEdit: Guided image synthesis and editing with stochastic differential equations. arXiv preprint arXiv:2108.01073.

Mirza, M., & Osindero, S. (2014). Conditional generative adversarial nets. arXiv preprint arXiv:1411.1784.

Odena, A., Dumoulin, V., & Olah, C. (2016). Deconvolution and checkerboard artifacts. Distill, 1(10), e3.

Park, D. Y., & Lee, K. H. (2019). Arbitrary style transfer with style-attentional networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.

Park, T., Liu, M.-Y., Wang, T.-C., & Zhu, J.-Y. (2019). Semantic image synthesis with spatially-adaptive normalization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.

Richardson, E., Alaluf, Y., Patashnik, O., Nitzan, Y., Azar, Y., Shapiro, S., & Cohen-Or, D. (2021). Encoding in style: A StyleGAN encoder for image-to-image translation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.

Ronneberger, O., Fischer, P., & Brox, T. (2015). U-Net: Convolutional networks for biomedical image segmentation. In Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015, Munich, Germany.

Simo-Serra, E., Iizuka, S., Sasaki, K., & Ishikawa, H. (2016). Learning to simplify: Fully convolutional networks for rough sketch cleanup. ACM Transactions on Graphics, 35(4), 1-11.

Simonyan, K., & Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556.

Sohl-Dickstein, J., Weiss, E., Maheswaranathan, N., & Ganguli, S. (2015). Deep unsupervised learning using nonequilibrium thermodynamics. In International Conference on Machine Learning.

Song, Y., Sohl-Dickstein, J., Kingma, D. P., Kumar, A., Ermon, S., & Poole, B. (2020). Score-based generative modeling through stochastic differential equations. arXiv preprint arXiv:2011.13456.

Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., & Wojna, Z. (2016). Rethinking the Inception architecture for computer vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.

Ulyanov, D., Vedaldi, A., & Lempitsky, V. (2016). Instance normalization: The missing ingredient for fast stylization. arXiv preprint arXiv:1607.08022.

Wu, Y., & He, K. (2018). Group normalization. In Proceedings of the European Conference on Computer Vision (ECCV).

Xie, S., & Tu, Z. (2015). Holistically-nested edge detection. In Proceedings of the IEEE International Conference on Computer Vision.

Zhang, L., Ji, Y., Lin, X., & Liu, C. (2017). Style transfer for anime sketches with enhanced residual U-Net and auxiliary classifier GAN. In 2017 4th IAPR Asian Conference on Pattern Recognition (ACPR).

Zhu, J.-Y., Park, T., Isola, P., & Efros, A. A. (2017). Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision.