
Graduate Student: Yao, Qiao-Yuan (姚巧緣)
Thesis Title: Dual-Strategy Sharpening for Accelerating Diffusion Sampling: Reducing Distribution Ambiguity with One-to-One Target Remapping and Training-Free Self-Guidance
Advisor: Wu, Chung-Hsien (吳宗憲)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering
Year of Publication: 2025
Graduation Academic Year: 113
Language: English
Pages: 60
Keywords: Diffusion Models, Generative Models

    Diffusion Models (DMs) are a pivotal research direction in generative AI. By learning a step-by-step denoising process, DMs progressively generate high-quality images from random noise and have achieved significant results across a range of image generation tasks. However, their high generation latency limits their practicality in real-time applications, so reducing the number of inference steps without sacrificing quality is key to practical deployment. This thesis proposes "Dual-Strategy Sharpening," a method that combines "One-to-One Target Remapping" with "Training-Free Self-Guidance." By steering the model away from low-probability or distributionally ambiguous regions, it reduces ambiguity in the generated distribution and significantly improves sample quality. Dual-Strategy Sharpening is highly compatible and applies directly to any existing diffusion model architecture. On CIFAR-10, it reduced the EDM2 model's FID from 2.84 to 2.70. Even under the demanding condition of extremely few inference steps (NFE=7), the method maintained an FID of 13.03, demonstrating its potential for high-quality generation in resource-constrained environments.
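The abstract's "training-free self-guidance" steers sampling away from low-probability regions. The thesis's exact update rule is not reproduced in this record, but the guidance techniques it builds on (classifier guidance, classifier-free guidance, autoguidance) share a common extrapolation form: blend two denoiser predictions with a guidance weight. A minimal sketch with hypothetical names and toy values, not the thesis's implementation:

```python
import numpy as np

def guided_prediction(d_main, d_ref, w):
    """Extrapolate from a reference prediction toward the main prediction
    with guidance weight w. w=1 recovers d_main unchanged; w>1 sharpens
    the output away from the reference's ambiguous average."""
    return d_ref + w * (d_main - d_ref)

# Toy 1-D example with made-up denoiser outputs.
d_main = np.array([0.8])  # e.g. prediction of the full model
d_ref = np.array([0.5])   # e.g. prediction of a weaker reference model
print(guided_prediction(d_main, d_ref, 1.0))  # [0.8], plain sampling
print(guided_prediction(d_main, d_ref, 2.0))  # [1.1], pushed past d_main
```

At w=1 the reference cancels out; weights above 1 extrapolate beyond the main prediction, which is what lets guidance "sharpen" the sampled distribution.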

    Abstract (Chinese)
    Abstract
    Acknowledgements
    Contents
    List of Tables
    List of Figures
    Chapter 1 Introduction
      1.1 Background
      1.2 Motivation
      1.3 Literature Review: Accelerating DPMs
        1.3.1 Training-Free Methods
        1.3.2 Training-Based Methods
        1.3.3 Distillation: Diffusion Distillation
        1.3.4 Distillation: Consistency Models (CMs)
      1.4 Problems
      1.5 Brief Description of Research Methods
    Chapter 2 Preliminaries
      2.1 Unifying Diffusion Model Variants: From DPM to EDM
        2.1.1 Denoising Diffusion Probabilistic Models (DDPM)
        2.1.2 Elucidating the Design Space of Diffusion Models (EDM)
      2.2 Training Objectives and Closed-Form Solution
      2.3 Guidance Techniques
        2.3.1 Classifier Guidance
        2.3.2 Classifier-Free Guidance
        2.3.3 Autoguidance
    Chapter 3 Proposed Method
      3.1 Target Remapping
        3.1.1 Target Remapping Algorithm
        3.1.2 Simulate the Entire Dataset with Buffer
      3.2 Self-Guidance
        3.2.1 Self-Guidance Algorithm
    Chapter 4 Experiments
      4.1 Evaluation Metrics
        4.1.1 Fréchet Inception Distance (FID)
        4.1.2 Fréchet Distance Using DINOv2 (FD-DINOv2)
        4.1.3 Precision and Recall
      4.2 Experiment Setup
        4.2.1 CIFAR-10 Dataset
        4.2.2 EDM2 Architecture Description
        4.2.3 Proposed Model Extension
        4.2.4 Training Configuration
        4.2.5 Baseline: EDM2
      4.3 Results
        4.3.1 Analysis of Target Remapping Results
        4.3.2 Analysis of Self-Guidance Results
        4.3.3 Dual-Strategy Analysis
        4.3.4 Impact of Sampling Steps on Generation Quality
    Chapter 5 Conclusion and Future Work
    References
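The outline lists FID as the primary evaluation metric. As standard background (not code from the thesis), FID is the Fréchet distance between Gaussians fitted to real and generated feature statistics: ||μ₁ − μ₂||² + Tr(Σ₁ + Σ₂ − 2(Σ₁Σ₂)^½). A sketch of that closed form, with toy feature arrays standing in for Inception embeddings:

```python
import numpy as np
from scipy import linalg

def fid(mu1, sigma1, mu2, sigma2):
    """Fréchet distance between N(mu1, sigma1) and N(mu2, sigma2)."""
    diff = mu1 - mu2
    # Matrix square root of the product of the two covariances.
    covmean = linalg.sqrtm(sigma1 @ sigma2)
    if np.iscomplexobj(covmean):
        covmean = covmean.real  # drop tiny imaginary parts from numerics
    return float(diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean))

# Toy check with 2-D "features": identical statistics give FID near 0.
rng = np.random.default_rng(0)
feats = rng.normal(size=(1000, 2))
mu, sigma = feats.mean(axis=0), np.cov(feats, rowvar=False)
print(fid(mu, sigma, mu, sigma))  # ~0.0
```

In practice μ and Σ come from Inception-v3 (FID) or DINOv2 (FD-DINOv2) features of many thousands of images; only the feature extractor changes between the two metrics.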

