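Graduate student: 楊佳恩 (Yang, Chia-En)
Thesis title: Stable Diffusion Zero-Shot Anomaly Detection with Segment Anything
(Chinese title: 基於擴散模型的零樣本影像異常偵測, "Zero-Shot Image Anomaly Detection Based on Diffusion Models")
Advisor: 李韶曼 (Lee, Shao-Man)
Degree: Master
Department: Miin Wu School of Computing - MS Degree Program on Intelligent Technology Systems
Publication year: 2024
Academic year of graduation: 112 (ROC calendar)
Language: English
Pages: 73
Keywords (translated from Chinese): anomaly detection, diffusion model, image segmentation foundation model, industrial quality inspection
Keywords: anomaly detection, diffusion model, segment anything, industrial quality inspection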
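  • Image anomaly detection aims to identify a small number of anomalous features among a large number of images of normal samples, and it plays an important role in automated industrial quality inspection and medical diagnosis. Previous research has mainly focused on training a dedicated model for each task, which requires substantial manual effort to collect and annotate the images needed to train task-specific models. In this study, we break from this conventional practice, drawing inspiration from reconstruction-based anomaly detection methods and exploiting the ability of foundation models to adapt to diverse scenarios. We propose a new framework, Stable Diffusion Anomaly Detection (SDAD), which reconstructs the target image with a pre-trained diffusion model and uses an image segmentation foundation model to enhance the adaptability of modern foundation models to anomaly detection. On the VisA and MVTec-AD datasets, SDAD achieves excellent results in zero-shot anomaly detection without requiring any image data for fine-tuning. This highlights the effectiveness of our framework for anomaly detection while avoiding the limitations faced by traditional approaches.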

    Visual anomaly detection is essential for industrial quality inspection and medical diagnosis. Previous research in this field has focused on training custom models for each specific task, which requires thousands of images and annotations. In this work, we depart from this approach, drawing inspiration from reconstruction-based methodologies and leveraging the remarkable zero-shot generalization capabilities of foundation models. We propose a novel framework, Stable Diffusion Anomaly Detection (SDAD), which operates by reconstructing target images using pre-trained diffusion models and employs Segment Anything to enhance the adaptability of modern foundation models to anomaly detection. On the VisA and MVTec-AD datasets, SDAD achieves state-of-the-art results in zero-shot visual anomaly detection without further tuning. This highlights the effectiveness of our framework in achieving superior anomaly detection performance without the task-specific constraints of traditional approaches.
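    The reconstruction-based idea in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the reconstruction and the foreground mask are passed in as plain nested lists (in SDAD they would come from a pre-trained diffusion model and from Segment Anything, respectively), and the absolute-difference map, the maximum-based image score, and all function names are assumptions chosen for illustration.

```python
from typing import List

Image = List[List[float]]  # grayscale image, values in [0, 1]
Mask = List[List[int]]     # 1 = foreground (object), 0 = background


def anomaly_map(image: Image, reconstruction: Image, foreground: Mask) -> Image:
    """Per-pixel absolute difference, zeroed outside the foreground mask."""
    return [
        [abs(p - r) * m for p, r, m in zip(img_row, rec_row, mask_row)]
        for img_row, rec_row, mask_row in zip(image, reconstruction, foreground)
    ]


def image_score(amap: Image) -> float:
    """Image-level anomaly score: the largest per-pixel difference (assumed choice)."""
    return max(max(row) for row in amap)


# Toy 3x3 example: the hypothetical reconstruction "repairs" a defect at the
# centre pixel, and the mask discards a background pixel that also differs.
image = [[0.2, 0.2, 0.9],
         [0.2, 0.8, 0.2],
         [0.2, 0.2, 0.2]]
reconstruction = [[0.2, 0.2, 0.2],
                  [0.2, 0.2, 0.2],
                  [0.2, 0.2, 0.2]]
foreground = [[1, 1, 0],
              [1, 1, 1],
              [1, 1, 1]]

amap = anomaly_map(image, reconstruction, foreground)
score = image_score(amap)
```

    In this toy case the masked background pixel contributes nothing, so the score is driven by the centre pixel alone. A real pipeline would likely use a perceptual or structural comparison rather than a raw pixel difference.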

    Abstract
    Contents
    List of Figures
    List of Tables
    Chapter 1 Introduction
    Chapter 2 Related Works
        Chapter 2.1 Anomaly Detection
            Chapter 2.1.1 Autoencoder-based Anomaly Detection
            Chapter 2.1.2 GAN-based Anomaly Detection
            Chapter 2.1.3 Diffusion-based Anomaly Detection
            Chapter 2.1.4 Zero-shot Anomaly Detection
        Chapter 2.2 Vision Transformer
        Chapter 2.3 Large Pre-trained Model
            Chapter 2.3.1 Masked Autoencoders
            Chapter 2.3.2 CLIP
            Chapter 2.3.3 Segment Anything
            Chapter 2.3.4 Stable Diffusion
        Chapter 2.4 Adapters for Large Pre-trained Models
    Chapter 3 Methodology
        Chapter 3.1 Framework
        Chapter 3.2 Background Remover
        Chapter 3.3 Image Reconstructor
        Chapter 3.4 Change Detector
    Chapter 4 Experiments
        Chapter 4.1 Datasets
            Chapter 4.1.1 MVTec-AD
            Chapter 4.1.2 VisA
            Chapter 4.1.3 UCSD Ped2
        Chapter 4.2 Evaluation Metrics
            Chapter 4.2.1 AUROC
            Chapter 4.2.2 F1 score
        Chapter 4.3 Implementation Details
        Chapter 4.4 Results
            Chapter 4.4.1 Quantitative Results
            Chapter 4.4.2 Qualitative Results
        Chapter 4.5 Ablation study
            Chapter 4.5.1 Background Remover
            Chapter 4.5.2 Image prompt
            Chapter 4.5.3 Anomaly score
    Chapter 5 Discussion
        Chapter 5.1 Image Reconstructor
            Chapter 5.1.1 Image to Image Inpainting Stable Diffusion
            Chapter 5.1.2 ControlNet
            Chapter 5.1.3 RePaint
            Chapter 5.1.4 Stable Diffusion Twins
        Chapter 5.2 Background Remover
        Chapter 5.3 Change Detector
    Chapter 6 Conclusion
    Chapter 7 Future Works and Limitations

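    On campus: available from 2027-08-20
    Off campus: available from 2027-08-20
    The electronic thesis has not yet been authorized for public release; for the print copy, please consult the library catalog.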