
Author: Lee, Fu-Ti (李福悌)
Thesis Title: Generative Adversarial Network Based on CNN Classifier Predicted Scores for Image Style Transfer (基於CNN分類器預測分數的影像風格轉換生成對抗網路)
Advisor: Chen, Mu-Yen (陳牧言)
Degree: Master
Department: Department of Engineering Science, College of Engineering
Year of Publication: 2023
Graduation Academic Year: 111
Language: Chinese
Number of Pages: 54
Chinese Keywords: 影像風格轉換、生成對抗網路、平滑化標籤
English Keywords: Image Style Transfer, Generative Adversarial Networks, Smoothing Labels
Image style transfer has become a highly popular research direction in computer vision in recent years. It aims to convert the style of an image into, or blend it with, other styles so that the transformed result exhibits stylistic features not present in the original image. In the past, image style transfer was implemented mainly through feature transformation or filtering, which required extensive manually designed transformation pipelines and was therefore mostly limited to converting a single style. With the development of deep learning, many studies have applied deep learning to style transfer tasks and achieved significant progress.

For the image style transfer task, this study proposes generating image attribute labels from a classification dataset, constructing a GAN architecture for multi-style transfer, and introduces the CSS-GAN model. First, a CNN classifier is trained on the classification dataset; after its stability is verified, the classifier is used to make predictions on the image data, and the features of its output layer are extracted as attribute labels. Next, different preprocessing steps are applied to the generated attribute labels to compare the performance of smoothed labels and binary classification labels. Finally, the generated labels are used to train the multi-style-transfer GAN.
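For reference, one common way to smooth binary attribute labels is shown below; the abstract does not reproduce the thesis's exact rule, so this formulation is an assumption based on standard label smoothing, where each hard label $y_k$ is moved away from 0 and 1 by a factor $\epsilon$:

$$\tilde{y}_k = y_k\,(1-\epsilon) + (1-y_k)\,\epsilon, \qquad y_k \in \{0,1\},\ 0 < \epsilon < 0.5,$$

so a positive attribute becomes $1-\epsilon$ and a negative one becomes $\epsilon$, giving the GAN softer conditioning targets than strict 0/1 labels.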

In the experiments, a facial attribute dataset is used to compare the labeling schemes and the proposed model architecture. The results show that using the classifier's predicted features as attribute labels effectively improves the quality and stability of the images generated by the GAN in style transfer tasks; in addition, the proposed CSS-GAN also performs better on image quality evaluation metrics.

Image style transfer has become a highly popular research direction in computer vision in recent years. Its objective is to transform the style of an image or blend it with other styles, so that the transformed image exhibits stylistic features not present in the original. In the past, image style transfer relied primarily on techniques such as feature transformation or filtering, which required extensive manual design of the transformation process and were therefore limited to a single style conversion. However, with the development of deep learning techniques, many researchers have made significant progress by applying deep learning to style transfer tasks.
In this study, a novel approach is proposed for the task of image style transfer. It involves generating image attribute labels from a classification dataset, constructing a GAN framework for multi-style transfer tasks, and introducing the CSS-GAN model. Initially, a CNN classifier is trained on the classification dataset. After validating the stability of the classifier, the classifier is applied to the image data, and its output-layer features are extracted as attribute labels. Subsequently, different preprocessing techniques are applied to the generated attribute labels to compare the performance of smoothed labels and binary classification labels. Finally, the multi-style-transfer GAN is trained using the generated labels.
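A minimal sketch of this labeling step follows. It assumes PyTorch purely for illustration; the classifier architecture, attribute count (NUM_ATTRS), threshold, and epsilon below are placeholders rather than the thesis's actual settings. It shows how a trained multi-attribute classifier's output-layer scores could be used directly as soft labels, thresholded into binary labels, or smoothed before conditioning the style-transfer GAN.

```python
# Minimal sketch of deriving attribute labels from a CNN classifier
# (PyTorch assumed; architecture and hyperparameters are placeholders).
import torch
import torch.nn as nn

NUM_ATTRS = 5  # hypothetical number of facial attributes

# Stand-in for the trained CNN attribute classifier described in the thesis.
classifier = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(32, NUM_ATTRS),
)
classifier.eval()

def predicted_score_labels(images: torch.Tensor) -> torch.Tensor:
    """Soft attribute labels: sigmoid over the classifier's output layer."""
    with torch.no_grad():
        return torch.sigmoid(classifier(images))

def binary_labels(scores: torch.Tensor, threshold: float = 0.5) -> torch.Tensor:
    """Conventional hard 0/1 labels obtained by thresholding the scores."""
    return (scores > threshold).float()

def smoothed_labels(hard: torch.Tensor, eps: float = 0.1) -> torch.Tensor:
    """Pull hard labels away from 0/1: a 1 becomes 1 - eps, a 0 becomes eps."""
    return hard * (1.0 - eps) + (1.0 - hard) * eps

# Example: build the three label variants for a batch of face images; the
# chosen variant is then used as the attribute vector that conditions the GAN.
images = torch.randn(8, 3, 128, 128)      # stand-in for a real image batch
soft = predicted_score_labels(images)     # classifier-predicted scores
hard = binary_labels(soft)                # binary classification labels
smooth = smoothed_labels(hard, eps=0.1)   # smoothed labels
```

The soft and smoothed variants retain per-attribute confidence rather than a strict yes/no decision, which is the property the experiments compare against plain binary labels.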
In the experiments, a facial attribute dataset is used to compare the labeling schemes and the proposed model architecture. The results demonstrate that using the classifier's predicted features as attribute labels effectively enhances the quality and stability of image generation in style transfer tasks. Additionally, the CSS-GAN model proposed in this study also achieves better results on image quality evaluation metrics.

Abstract (Chinese)
Abstract (English)
Acknowledgements
Table of Contents
List of Tables
List of Figures
Chapter 1 Introduction
    1.1 Research Background
    1.2 Research Motivation and Objectives
    1.3 Thesis Organization
Chapter 2 Literature Review
    2.1 Image Style Transfer
        2.1.1 Traditional Style Transfer
        2.1.2 Convolutional Neural Network Style Transfer
        2.1.3 Generative Adversarial Network Style Transfer
    2.2 Conditional Generative Adversarial Networks
        2.2.1 IcGAN
        2.2.2 StarGAN
    2.3 Label Smoothing
Chapter 3 Research Methods
    3.1 Generating Vector Labels with a Classification Network
        3.1.1 Image Attribute Classification Network
        3.1.2 Model Training
        3.1.3 Output Vector Labels
        3.1.4 Pseudocode for Generating Vector Labels
    3.2 Image Style Transfer GAN Using Attribute Vectors
        3.2.1 AdaIN
        3.2.2 CSS-GAN Generator
        3.2.3 CSS-GAN Discriminator
        3.2.4 Model Construction
        3.2.5 Loss Function
        3.2.6 Pseudocode for Building CSS-GAN
Chapter 4 Experimental Results and Discussion
    4.1 Experimental Environment
    4.2 Dataset
    4.3 Evaluation Metrics
        4.3.1 Classification Metrics
        4.3.2 Generated Image Metrics
    4.4 Generating Vector Labels
        4.4.1 Data Preprocessing
        4.4.2 Training Hyperparameter Settings
        4.4.3 Classification Performance
        4.4.4 Output Vector Labels
    4.5 Style Transfer Results
        4.5.1 Training Hyperparameter Settings
        4.5.2 Model Performance
Chapter 5 Conclusion
References


Full-Text Access: On campus: available immediately; Off campus: available immediately