| Author: | Yao, Qiao-Yuan (姚巧緣) |
|---|---|
| Thesis title: | Dual-Strategy Sharpening for Accelerating Diffusion Sampling: Reducing Distribution Ambiguity with One-to-One Target Remapping and Training-Free Self-Guidance |
| Advisor: | Wu, Chung-Hsien (吳宗憲) |
| Degree: | Master |
| Department: | Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science |
| Year of publication: | 2025 |
| Academic year of graduation: | 113 (ROC era) |
| Language: | English |
| Pages: | 60 |
| Keywords (Chinese): | 擴散模型、生成模型 |
| Keywords (English): | Diffusion Models, Generative Models |
Diffusion Models (DMs) are a pivotal research direction in generative modeling. By learning a step-by-step denoising process, they can progressively generate high-quality images from random noise and have achieved strong results across many image generation tasks. However, their high generation latency limits their practicality in real-time applications, so reducing the number of inference steps without sacrificing quality is central to deploying DMs in practice. This thesis proposes Dual-Strategy Sharpening, a method that combines One-to-One Target Remapping with Training-Free Self-Guidance to reduce ambiguity in the generated distribution. By steering the model away from low-probability or distributionally ambiguous regions, the method significantly improves sample quality. Dual-Strategy Sharpening is highly compatible and can be applied directly to any existing diffusion model architecture. On CIFAR-10, it reduces the FID of the EDM2 model from 2.84 to 2.70. Notably, even under an extremely low step budget (NFE = 7), it maintains an FID of 13.03, demonstrating its potential for high-quality generation in resource-constrained settings.
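The abstract describes Training-Free Self-Guidance only at the level of steering the model away from ambiguous regions; it gives no equations. As a rough illustration of the general guidance-by-extrapolation pattern used in classifier-free guidance and autoguidance, one denoising step might extrapolate a strong prediction away from a weaker reference prediction. This is a minimal sketch under that assumption; the function names, the weak/strong pairing, and the weighting scheme are all hypothetical, not the thesis's actual formulation:

```python
import numpy as np

def guided_denoise(denoise_strong, denoise_weak, x_t, sigma, w=1.5):
    """Guidance-by-extrapolation sketch (hypothetical, not the thesis's method).

    Combines two denoiser predictions at noise level sigma:
        D_guided = D_weak + w * (D_strong - D_weak)
    w = 1 recovers the strong model; w > 1 pushes samples away from
    regions where the weak model is relatively confident, sharpening
    the sampling distribution.
    """
    d_strong = denoise_strong(x_t, sigma)
    d_weak = denoise_weak(x_t, sigma)
    return d_weak + w * (d_strong - d_weak)

# Toy usage with linear stand-in "denoisers"
strong = lambda x, sigma: 0.9 * x
weak = lambda x, sigma: 0.5 * x
x = np.ones(4)
out = guided_denoise(strong, weak, x, sigma=1.0, w=2.0)
```

In a real sampler this combined prediction would replace the single denoiser output inside each step of the ODE/SDE solver, so the guidance adds one extra network evaluation per step.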