
Student: 温育政 (Wen, Yu-Cheng)
Thesis Title: 人臉任意卡通風格化之深度學習神經網路 (Deep Learning Neural Networks for Arbitrary Style Face Cartoonization)
Advisor: 楊家輝 (Yang, Jar-Ferr)
Degree: Master
Department: Institute of Computer & Communication Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2021
Graduation Academic Year: 109 (2020-21)
Language: English
Pages: 42
Keywords: deep learning, style transfer, generative adversarial network, multi-scale path transformation, feature weighter, style extractor
Access count: 118 views, 0 downloads

Abstract: In this thesis, a neural network for human face cartoon-style transformation is proposed. The proposed system adopts the cycle-consistent generative adversarial network (CycleGAN) as its backbone, a structure that comprises four modules: two generators and two discriminators. The CycleGAN structure overcomes the lack of paired training data in style transfer. An important concept in our system, the multi-scale path, gives the transformation process greater resolution flexibility by supplying information at different scales during training. Another important module is the feature weighter, which lets the system concentrate on face regions that require intensive processing, such as the eyes and nose. For the style extractor module, instead of a learned deep network we use direct calculation; this not only greatly reduces the number of trainable parameters but also preserves the quality of the generated images, which should make the system easier to deploy on resource-limited platforms. In our experiments, evaluated by subjective comparison with human raters, the proposed system produces results that most viewers find acceptable while saving computational resources.
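The record does not spell out the "direct calculation" used by the style extractor. Below is a minimal sketch of one plausible reading, assuming it computes channel-wise feature statistics in the spirit of adaptive instance normalization (AdaIN) [2]; all function names and shapes are illustrative assumptions, not the thesis's actual implementation.

```python
import torch

def extract_style(feat: torch.Tensor, eps: float = 1e-5):
    """Style code as channel-wise mean/std, computed directly with no
    learned parameters, consistent with the thesis's parameter-saving claim.
    feat: feature map of shape (N, C, H, W)."""
    n, c = feat.shape[:2]
    flat = feat.reshape(n, c, -1)
    mean = flat.mean(dim=2).reshape(n, c, 1, 1)
    std = (flat.var(dim=2) + eps).sqrt().reshape(n, c, 1, 1)
    return mean, std

def replace_style(content_feat: torch.Tensor, style_feat: torch.Tensor):
    """A plausible Style Replacer: re-normalize the content features with
    the style image's statistics, as in AdaIN [2]."""
    c_mean, c_std = extract_style(content_feat)
    s_mean, s_std = extract_style(style_feat)
    return s_std * (content_feat - c_mean) / c_std + s_mean
```

For example, calling extract_style on a (1, 256, 64, 64) feature map returns two (1, 256, 1, 1) tensors, i.e., a 512-number style code obtained without any extra trainable weights.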

Table of Contents (page numbers omitted):
Abstract (Chinese); Abstract; Acknowledgements; Contents; List of Tables; List of Figures
Chapter 1 Introduction
  1.1. Research Background
  1.2. Motivations
  1.3. Thesis Organization
Chapter 2 Related Work
  2.1. CycleGAN
  2.2. Adaptive Instance Normalization Layer
  2.3. U-GAT-IT Network
Chapter 3 Face Cartoonization with Arbitrary Styles
  3.1. Structure of the Proposed DMAST System
  3.2. Contour Extractor
  3.3. Feature Weighter
  3.4. Style Extractor
  3.5. Style Replacer
  3.6. Reconstruction Module
  3.7. Discriminator
  3.8. Loss Functions
    3.8.1. Total Loss
    3.8.2. Adversarial Loss
    3.8.3. Cycle Loss
    3.8.4. Identity Loss
    3.8.5. Class Activation Mapping Loss (CAM Loss)
Chapter 4 Experimental Results
  4.1. Experimental Environments
  4.2. Ablation Study
    4.2.1. DMAST with and without Cycle-structure
    4.2.2. DMAST with and without Multi-scale Module (content only)
    4.2.3. DMAST with and without Multi-scale Module (content + style)
  4.3. Result Score Comparison
Chapter 5 Conclusions
Chapter 6 Future Work
References
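Sections 3.8.1-3.8.5 list the loss terms used by DMAST but not how they are combined. Below is a minimal sketch of one plausible total generator loss, following the conventions of CycleGAN [1] and U-GAT-IT [3]; the weighting factors and all names are illustrative assumptions, not the thesis's actual values.

```python
import torch
import torch.nn.functional as F

# Illustrative weights; the thesis's actual values are not given in this record.
W_CYCLE, W_ID, W_CAM = 10.0, 10.0, 1000.0

def generator_loss(real_a, real_b, G_ab, G_ba, D_b, cam_src, cam_tgt):
    """One direction (A -> B) of a plausible total generator loss.
    G_ab, G_ba: the two generators; D_b: the target-domain discriminator;
    cam_src/cam_tgt: auxiliary-classifier (CAM) logits for source/target inputs."""
    fake_b = G_ab(real_a)
    d_out = D_b(fake_b)
    # Adversarial loss (least-squares form): make fakes look real to D_b.
    adv = F.mse_loss(d_out, torch.ones_like(d_out))
    # Cycle-consistency loss: A -> B -> A should reconstruct the input.
    cycle = F.l1_loss(G_ba(fake_b), real_a)
    # Identity loss: a target-domain image should pass through G_ab unchanged.
    identity = F.l1_loss(G_ab(real_b), real_b)
    # CAM loss: the attention classifier should separate the two domains.
    cam = F.binary_cross_entropy_with_logits(cam_src, torch.ones_like(cam_src)) \
        + F.binary_cross_entropy_with_logits(cam_tgt, torch.zeros_like(cam_tgt))
    # Total loss (Section 3.8.1): weighted sum of the four terms.
    return adv + W_CYCLE * cycle + W_ID * identity + W_CAM * cam
```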

    References:
    [1] J.-Y. Zhu, T. Park, P. Isola, and A. A. Efros, "Unpaired image-to-image translation using cycle-consistent adversarial networks," Proc. of the IEEE International Conference on Computer Vision, 2017.
    [2] X. Huang and S. Belongie, "Arbitrary style transfer in real-time with adaptive instance normalization," Proc. of the IEEE International Conference on Computer Vision, 2017.
    [3] J. Kim, M. Kim, H. Kang, and K. Lee, "U-GAT-IT: unsupervised generative attentional networks with adaptive layer-instance normalization for image-to-image translation," arXiv preprint arXiv:1907.10830, 2019.
    [4] J. Liou, J. Yang, and S. Mao, "Real-time multi-person 3D pose estimation convolutional neuron networks," Intelligent Information Processing Systems and Applications, 2020.
    [5] S. Mao, H. Wu, G. Sandison, and S. Fang, "Iterative volume morphing and learning for mobile tumor based on 4DCT," Physics in Medicine and Biology, 2017.
    [6] S. Mao, M. Ye, X. Li, F. Pang, and J. Zhou, "Rapid vehicle logo region detection based on information theory," Computers & Electrical Engineering, vol. 39, no. 3, 2013.
    [7] J. Zhou, M. Ye, J. Ding, S. Mao, and H. J. Zhang, "Rapid and robust traffic accident detection based on orientation map," Optical Engineering, vol. 51, no. 11, 2012.
    [8] L. A. Gatys, A. S. Ecker, and M. Bethge, "Image style transfer using convolutional neural networks," Proc. of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2414-2423, 2016.
    [9] Y. Chen, Y.-K. Lai, and Y.-J. Liu, "CartoonGAN: generative adversarial networks for photo cartoonization," Proc. of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 9465-9474, 2018.
    [10] A. Hertzmann, C. E. Jacobs, N. Oliver, B. Curless, and D. H. Salesin, "Image analogies," Proc. of the 28th Annual Conference on Computer Graphics and Interactive Techniques, pp. 327-340, 2001.
    [11] P. Isola, J.-Y. Zhu, T. Zhou, and A. A. Efros, "Image-to-image translation with conditional adversarial networks," Proc. of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1125-1134, 2017.
    [12] P. Sangkloy, J. Lu, C. Fang, F. Yu, and J. Hays, "Scribbler: controlling deep image synthesis with sketch and color," Proc. of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5400-5409, 2017.
    [13] L. Karacan, Z. Akata, A. Erdem, and E. Erdem, "Learning to generate images of outdoor scenes from attributes and semantic layouts," arXiv preprint arXiv:1612.00215, 2016.
    [14] R. Rosales, K. Achan, and B. J. Frey, "Unsupervised image translation," Proc. of the IEEE International Conference on Computer Vision, pp. 472-478, 2003.
    [15] M.-Y. Liu and O. Tuzel, "Coupled generative adversarial networks," Advances in Neural Information Processing Systems, vol. 29, pp. 469-477, 2016.
    [16] M.-Y. Liu, T. Breuel, and J. Kautz, "Unsupervised image-to-image translation networks," Advances in Neural Information Processing Systems, pp. 700-708, 2017.
    [17] Y. Taigman, A. Polyak, and L. Wolf, "Unsupervised cross-domain image generation," arXiv preprint arXiv:1611.02200, 2016.
    [18] K. Bousmalis, N. Silberman, D. Dohan, D. Erhan, and D. Krishnan, "Unsupervised pixel-level domain adaptation with generative adversarial networks," Proc. of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3722-3731, 2017.
    [19] T. Kim, M. Cha, H. Kim, J. K. Lee, and J. Kim, "Learning to discover cross-domain relations with generative adversarial networks," Proc. of the International Conference on Machine Learning, pp. 1857-1865, 2017.

Availability: on-campus and off-campus access from 2026-07-21. The electronic thesis has not yet been authorized for public release; for the print copy, consult the library catalog.