| Author: | Yao, Qiao-Yuan (姚巧緣) |
|---|---|
| Thesis title: | Dual-Strategy Sharpening for Accelerating Diffusion Sampling: Reducing Distribution Ambiguity with One-to-One Target Remapping and Training-Free Self-Guidance |
| Advisor: | Wu, Chung-Hsien (吳宗憲) |
| Degree: | Master |
| Department: | Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science |
| Year of publication: | 2025 |
| Academic year of graduation: | 113 (ROC era) |
| Language: | English |
| Pages: | 60 |
| Keywords (Chinese): | 擴散模型、生成模型 |
| Keywords (English): | Diffusion Models, Generative Models |
Diffusion Models (DMs) are a pivotal research direction in generative modeling. By learning a step-by-step denoising process, they can progressively generate high-quality images from random noise and have achieved strong results across many image generation tasks. However, their high generation latency limits their practicality in real-time applications, so reducing the number of inference steps without sacrificing quality is central to deploying DMs in practice. This thesis proposes Dual-Strategy Sharpening, a method that combines One-to-One Target Remapping with Training-Free Self-Guidance to reduce ambiguity in the generated distribution. By steering the model away from low-probability or distributionally ambiguous regions, the method significantly improves sample quality. Dual-Strategy Sharpening is highly compatible and can be applied directly to any existing diffusion model architecture. On CIFAR-10, it reduces the FID of the EDM2 model from 2.84 to 2.70. Notably, even under an extremely low step budget (NFE = 7), it maintains an FID of 13.03, demonstrating its potential for high-quality generation in resource-constrained settings.
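The abstract describes Training-Free Self-Guidance only at the level of steering the model away from ambiguous regions; it gives no equations. As a rough illustration of the general guidance-by-extrapolation pattern used in classifier-free guidance and autoguidance, one denoising step might extrapolate a strong prediction away from a weaker reference prediction. This is a minimal sketch under that assumption; the function names, the weak/strong pairing, and the weighting scheme are all hypothetical, not the thesis's actual formulation:

```python
import numpy as np

def guided_denoise(denoise_strong, denoise_weak, x_t, sigma, w=1.5):
    """Guidance-by-extrapolation sketch (hypothetical, not the thesis's method).

    Combines two denoiser predictions at noise level sigma:
        D_guided = D_weak + w * (D_strong - D_weak)
    w = 1 recovers the strong model; w > 1 pushes samples away from
    regions where the weak model is relatively confident, sharpening
    the sampling distribution.
    """
    d_strong = denoise_strong(x_t, sigma)
    d_weak = denoise_weak(x_t, sigma)
    return d_weak + w * (d_strong - d_weak)

# Toy usage with linear stand-in "denoisers"
strong = lambda x, sigma: 0.9 * x
weak = lambda x, sigma: 0.5 * x
x = np.ones(4)
out = guided_denoise(strong, weak, x, sigma=1.0, w=2.0)
```

In a real sampler this combined prediction would replace the single denoiser output inside each step of the ODE/SDE solver, so the guidance adds one extra network evaluation per step.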