
Graduate Student: Dong, Yixuan (董屹煊)
Thesis Title: A Method for Boosting Performance in Domain-Specific Image Classification by Utilizing Data Augmentation that Preserves Class Labels (基於標籤保持資料擴增之特定領域圖像分類性能優化研究)
Advisor: Chiang, Jung-Hsien (蔣榮先)
Degree: Master
Department: Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2025
Graduation Academic Year: 113 (2024-2025)
Language: English
Number of Pages: 92
Keywords: Generative Model, Data Augmentation, Diffusion Models, Image Classification
    In domain-specific image classification tasks, data augmentation techniques are essential for improving model generalization and mitigating data scarcity. This study proposes an innovative data augmentation framework, the saliency-guided diffusion mixing method (SGD-Mix), designed to overcome the bottlenecks that existing methods face in improving the diversity, fidelity, and label clarity of generated data. SGD-Mix is the first method to optimize these three core dimensions simultaneously, establishing a new methodological benchmark for generating high-quality augmented data in challenging scenarios such as fine-grained recognition, long-tailed distributions, and few-shot learning. In particular, the framework offers an innovative solution for reconciling label preservation, semantic consistency, and background diversity.
    Traditional non-generative mixing methods, because of their inherently linear synthesis, often produce semantic ambiguity and struggle to keep labels consistent with content. Diffusion-based augmentation strategies, despite their non-linear generative capability, commonly suffer from semantic drift, ambiguous labels, and insufficient generative variability. Existing methods tend to optimize only some of these dimensions and fail to adequately address the semantic instability and model dependency of diffusion models under strong transformation conditions, which limits their potential to continuously improve model generalization in practice.
    To overcome these limitations, the SGD-Mix framework combines a saliency-guided mechanism with diffusion-model refinement, achieving precise preservation of the semantic foreground, highly diverse background replacement, and strong label consistency through the following steps. First, a dissimilarity measure between saliency maps is computed to select the target image whose foreground overlaps least with that of the source image, maximizing the difference in background content. Second, image segmentation and mask fusion are applied to recombine foreground and background at the pixel level, retaining only the core semantic region of the source image and thus precisely controlling the flow of semantic information. Finally, the recombined image is fed into a specifically fine-tuned diffusion model for non-linear, high-fidelity reconstruction. The generation process is guided by structured text prompts (containing learnable class embeddings and descriptors), and a transformation-strength parameter finely balances the fidelity and diversity of the generated images. This overall design keeps the source image's label fully intact, fundamentally avoiding label-semantic mismatch and guaranteeing label clarity.
    Experimental results validate the superior performance of the SGD-Mix framework across a range of challenging scenarios. On various fine-grained image classification tasks, applying SGD-Mix to residual network and Vision Transformer backbones yields average accuracies of 92.14% and 92.51%, respectively, setting new records. Experiments on long-tailed datasets show that SGD-Mix significantly improves classification performance under different degrees of imbalance, with particularly pronounced gains on rare classes. In few-shot learning, under both the 5-shot and 10-shot settings (five or ten training samples per class), SGD-Mix surpasses existing methods, demonstrating its strong generation and augmentation capability under extreme data scarcity. Moreover, through its foreground-background decoupling design, SGD-Mix also improves model robustness to background variation.
    With its unified architecture, SGD-Mix is the first method to jointly generate semantically reliable foreground content and contextually diverse background information, effectively addressing the long-standing tension in data augmentation among semantic preservation, diverse transformation, and label stability. This study provides a novel and highly scalable approach to high-quality data augmentation for domain-specific image classification, carries significant theoretical value for resolving persistent performance bottlenecks in the field, and lays a solid foundation for future research on controllable image generation and semantic preservation.

    In domain-specific image classification tasks, data augmentation techniques are crucial for enhancing model generalization and addressing data scarcity issues. This study introduces an innovative data augmentation framework, SGD-Mix, designed to overcome the structural bottlenecks faced by existing methods in enhancing the “diversity,” “fidelity,” and “label clarity” of generated data. SGD-Mix achieves the first simultaneous optimization of these three core dimensions, establishing a new methodological benchmark for generating high-quality augmented data in challenging scenarios such as fine-grained recognition, long-tailed distributions, and few-shot learning. The framework specifically proposes an innovative solution to coordinate label preservation, semantic consistency, and background diversity.
    Traditional non-generative mixing methods, because their synthesis is inherently linear, often lead to semantic ambiguity and make it difficult to ensure consistency between labels and content. Diffusion-based augmentation strategies, on the other hand, possess non-linear generative capabilities but commonly suffer from semantic drift, label ambiguity, and insufficient generative variability. Existing methods tend to optimize only a subset of these dimensions, failing to adequately address the semantic instability and model dependency of diffusion models under strong transformation conditions, and this limits their potential to continuously improve model generalization in practical applications.
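    To make the "linear synthesis" critique concrete, the sketch below shows mixup-style interpolation (a minimal rendering of the classic formulation; the Beta parameter and class count are illustrative choices, not values from this thesis). Because the blend is strictly linear in pixel space, the mixed image overlays content from two classes and carries only a soft, ambiguous label:

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=0.2, num_classes=10, seed=0):
    """Classic mixup: a convex combination of two images and their one-hot labels."""
    rng = np.random.default_rng(seed)
    lam = rng.beta(alpha, alpha)              # mixing coefficient in (0, 1)
    x_mix = lam * x1 + (1.0 - lam) * x2       # linear pixel-space blend
    one_hot = np.eye(num_classes)
    y_mix = lam * one_hot[y1] + (1.0 - lam) * one_hot[y2]  # soft label, e.g. 0.63/0.37
    return x_mix, y_mix

# Example: blending a "bird" image with a "dog" image yields pixels that depict
# neither class cleanly, while the label reads "63% bird, 37% dog"; this is the
# label-content inconsistency discussed above.
```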
    To overcome these limitations, we propose SGD-Mix, a novel framework that integrates a saliency-guided mechanism with diffusion-model refinement. The approach precisely preserves the semantic foreground, achieves highly diverse background replacement, and ensures strong label consistency. SGD-Mix first computes a dissimilarity measure between saliency maps and selects the target image whose foreground overlaps least with the source, maximizing background diversity. It then employs segmentation and mask fusion for pixel-level recombination, preserving only the core semantic regions of the source image. Finally, a fine-tuned diffusion model reconstructs the composite with high fidelity, guided by structured text prompts (with learnable class embeddings and descriptors) and an adjustable transformation strength that balances faithfulness against diversity. This design fundamentally circumvents label-semantic mismatch, guaranteeing label clarity and enhancing model generalization.
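    The sketch below illustrates one plausible way to wire the three stages together. It is an assumption-laden stand-in rather than the thesis implementation: spectral-residual saliency with Otsu thresholding replaces the saliency module, a stock Stable Diffusion v1.5 img2img pipeline replaces the specifically fine-tuned diffusion model with learnable class embeddings, and the `strength` argument plays the role of the transformation-strength parameter.

```python
# Minimal SGD-Mix-style sketch (illustrative stand-ins; see caveats above).
import numpy as np
import cv2     # needs opencv-contrib-python for the cv2.saliency module
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

def foreground_mask(img_bgr: np.ndarray) -> np.ndarray:
    """Binary foreground mask from a spectral-residual saliency map plus Otsu."""
    ok, sal = cv2.saliency.StaticSaliencySpectralResidual_create().computeSaliency(img_bgr)
    assert ok, "saliency computation failed"
    sal_u8 = (sal * 255).astype(np.uint8)
    _, mask = cv2.threshold(sal_u8, 0, 1, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return mask.astype(np.float32)            # H x W, values in {0, 1}

def pick_target(src_mask: np.ndarray, candidate_masks: list) -> int:
    """Stage 1: pick the candidate whose foreground overlaps the source least."""
    def iou(a, b):
        inter = np.logical_and(a, b).sum()
        return inter / (np.logical_or(a, b).sum() + 1e-8)
    return int(np.argmin([iou(src_mask, m) for m in candidate_masks]))

def composite(src_bgr, tgt_bgr, src_mask):
    """Stage 2: keep the source foreground, take everything else from the target."""
    m = src_mask[..., None]                   # broadcast mask over channels
    return (m * src_bgr + (1.0 - m) * tgt_bgr).astype(np.uint8)

def refine(composite_bgr, prompt, strength=0.5, device="cuda"):
    """Stage 3: non-linear img2img refinement. Lower strength stays faithful to
    the composite; higher strength trades fidelity for diversity. In real code
    the pipeline would be loaded once, not per call."""
    pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to(device)
    init = Image.fromarray(cv2.cvtColor(composite_bgr, cv2.COLOR_BGR2RGB))
    return pipe(prompt=prompt, image=init, strength=strength).images[0]
```

    In use, every refined image inherits the source image's class label, since only the source foreground survives the composite; a call might look like refine(composite(src, tgt, foreground_mask(src)), prompt="a photo of a painted bunting", strength=0.5), where the prompt text is a hypothetical descriptor rather than the thesis's learned class embedding.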
    Extensive experimental results validate the superior performance of the SGD-Mix framework across various challenging scenarios. In fine-grained image classification tasks (on datasets such as CUB, Cars, Flowers, Dogs, Aircraft), applying SGD-Mix to backbone networks like ResNet50 and ViT resulted in average accuracies of 92.14% and 92.51%, respectively, setting new records. Experiments on long-tailed distribution datasets demonstrated that SGD-Mix significantly improves model classification performance under different imbalance ratios, with particularly notable improvements in the recognition of rare classes. In few-shot learning scenarios, whether in 5-shot or 10-shot settings, SGD-Mix surpassed existing methods, fully showcasing its powerful generation and augmentation capabilities in situations of extreme data scarcity. Furthermore, through its foreground-background decoupling design, SGD-Mix also effectively enhances model robustness to background variations.
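    For reference, long-tailed and few-shot training sets of the kind evaluated here are conventionally built by subsampling a balanced dataset. The sketch below uses the common exponential class-size profile; the helper names and the profile itself are standard conventions assumed for illustration, not details taken from the thesis:

```python
import numpy as np

def long_tail_sizes(n_max: int, num_classes: int, imbalance_ratio: float) -> list:
    """Exponential profile: the head class keeps n_max samples and the tail
    class keeps n_max / imbalance_ratio, interpolating geometrically between."""
    return [int(n_max * imbalance_ratio ** (-i / (num_classes - 1)))
            for i in range(num_classes)]

def subsample(indices_by_class: list, sizes: list, seed: int = 0) -> dict:
    """Draw the prescribed number of sample indices per class, without replacement."""
    rng = np.random.default_rng(seed)
    return {c: rng.choice(idx, size=min(sizes[c], len(idx)), replace=False)
            for c, idx in enumerate(indices_by_class)}

# Long-tailed split: an imbalance ratio of 100 leaves the rarest class with 1%
# of the head class's images. Few-shot is the flat special case, e.g. 5-shot:
# subsample(indices_by_class, sizes=[5] * num_classes)
```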
    With its integrated architecture, SGD-Mix is the first to achieve the synergistic generation of semantically reliable foreground content and contextually diverse background information, effectively addressing long-standing challenges in data augmentation concerning semantic preservation, diversity of transformation, and label stability. This research provides a novel and highly scalable path to high-quality data augmentation in domain-specific image classification tasks. It holds significant theoretical and practical value for resolving persistent performance bottlenecks in data augmentation and lays a solid foundation for future research in controllable image generation and semantic preservation.

    Table of Contents:
    Abstract (Chinese) i
    Abstract iii
    Acknowledgements vi
    Contents viii
    List of Tables xi
    List of Figures xiii
    1 Introduction 1
        1.1 Background 1
        1.2 Research Motivation 2
        1.3 Research Objectives and Contributions 4
        1.4 Thesis Organization 5
    2 Literature Review 7
        2.1 Data Augmentation Techniques 7
            2.1.1 Traditional Mix-Based Methods 7
            2.1.2 Saliency-Guided and Semantic-Aware Methods 8
            2.1.3 Generative Model-Based Augmentation 9
        2.2 Saliency Detection and Thresholding 10
        2.3 Diffusion Models 12
            2.3.1 Fundamentals 12
            2.3.2 Fine-Tuning Techniques 13
        2.4 Summary 13
    3 Problem Analysis 15
        3.1 Key Factors in Effective Data Augmentation 15
        3.2 Limitations of Existing Methods 16
            3.2.1 Non-Generative Methods 16
            3.2.2 Diffusion-Based Methods 17
        3.3 Problem Statement 22
    4 SGD-Mix Framework 23
        4.1 Stage 1: Saliency-Guided Foreground Alignment 25
        4.2 Stage 2: Semantic Compositing 25
        4.3 Stage 3: Diffusion-Based Domain Refinement 26
    5 Experiments 30
        5.1 Fine-Grained Visual Classification 31
        5.2 Long-Tail Classification 34
        5.3 Few-Shot Classification 37
        5.4 Background Robustness 39
    6 Discussion 41
    7 Conclusion and Future Works 51
        7.1 Conclusion 51
        7.2 Limitations 52
        7.3 Future Works 53
    References 55
    Appendix A 65
        A.1 Experimental Details 65
            A.1.1 Dataset Sources 65
            A.1.2 Backbones 67
            A.1.3 Implementation Details of SGD-Mix 67
            A.1.4 Sub-Experiments 69
        A.2 Visualization 70
            A.2.1 Complete Attention Maps Before and After Saliency-Guided Mixing 70
            A.2.2 Complete Examples of Generated Images Under Varying Translation Strengths 72


    Full-Text Availability: On campus: available immediately; off campus: available immediately.