| Author: | 賈承翔 Chia, Cheng-Hsiang |
|---|---|
| Thesis Title: | 在多模態對比學習中利用具有可見性與空間感知的對抗擾動實現資料高效的後門防禦方法 (A Data-efficient Backdoor Defense in Multimodal Contrastive Learning Using Visibility-Spatially-aware Adversarial Perturbation) |
| Advisor: | 郭耀煌 Kuo, Yau-Hwang |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering |
| Year of Publication: | 2025 |
| Academic Year of Graduation: | 113 |
| Language: | English |
| Number of Pages: | 114 |
| Keywords: | Backdoor Attack, Multimodal Contrastive Learning, Adversarial Example |
In recent years, multimodal contrastive learning has attracted widespread attention for its ability to leverage the alignment between images and text to enhance semantic understanding. Although this learning paradigm improves model performance and generalization, it also makes models particularly vulnerable to backdoor attacks. In such attacks, an attacker injects poisoned samples bearing a specific trigger into the training data and designates particular words as the target label, causing the model to produce incorrect predictions when the trigger is present at inference time—thereby granting the attacker control over the model’s behavior. Typically, backdoor attackers aim not only to maximize attack success rate but also to minimize impact on clean accuracy, thereby achieving stealthy attacks and complicating detection efforts. This challenge is especially acute against novel multimodal contrastive backdoor attacks such as BadCLIP, for which existing defenses either require large amounts of clean data, making them impractical, or severely degrade clean accuracy. As a result, no current defense can effectively thwart BadCLIP.
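To make the threat model above concrete, the following is a minimal sketch, assuming a simple patch-style trigger and a caption-rewriting strategy, of how a single poisoned image-text pair could be constructed. The checkerboard pattern, the patch location, the caption template, and the function name `poison_pair` are illustrative assumptions, not the construction used by BadCLIP or any other specific attack.

```python
import numpy as np
from PIL import Image

def poison_pair(image: Image.Image, target_label: str, patch_size: int = 16):
    """Return a (poisoned image, poisoned caption) pair.

    Hypothetical construction: stamp a checkerboard trigger into the
    bottom-right corner and rewrite the caption to name the target label."""
    img = np.array(image.convert("RGB")).copy()
    # Checkerboard trigger pattern (values 0 or 255), broadcast over the RGB channels.
    trigger = ((np.indices((patch_size, patch_size)).sum(axis=0) % 2) * 255).astype(np.uint8)
    img[-patch_size:, -patch_size:, :] = trigger[..., None]
    poisoned_caption = f"a photo of a {target_label}"  # caption now names the target label
    return Image.fromarray(img), poisoned_caption
```

Mixing such pairs into the contrastive training set binds the trigger pattern to the text embedding of the target label; undoing that association is what the defense described below aims to achieve.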
To address this, this research proposes a Data-efficient Backdoor Defense in Multimodal Contrastive Learning Using Visibility-Spatially-aware Adversarial Perturbation (VASP-BD) that can restore a poisoned model to benign behavior using only a small amount of clean data and without prior knowledge of the backdoor trigger’s form. The defense comprises three key modules. First, a Visibility-Spatially-aware Adversarial Example Generator (VSAEG) uses a small amount of clean data to produce adversarial examples whose features resemble those of the backdoored samples. Second, an Adversarial-to-Clean Similarity-based Target Label Detector (ASTLD) exploits the disparities in semantic interactions across modalities to accurately infer the attacker’s designated target label from an image–text paired dataset. Finally, a Target-Label-Aware Unlearning Module (TLUM) removes the backdoor effect by minimizing the projection of adversarial example features onto the semantic direction of the target label. This design not only preserves the model’s clean accuracy but also reduces the amount of clean data required.
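The following is a minimal, self-contained sketch of how the three modules could fit together, written against generic `img_enc`/`txt_enc` callables (assumed to map image tensors and text batches to embedding vectors) rather than an actual CLIP checkpoint. The PGD-style objective, the `eps`/`steps`/`mask`/`lam` parameters, and the function names `gen_adv_examples`, `detect_target_label`, and `unlearning_loss` are all assumptions made for illustration; the thesis’ actual VSAEG, ASTLD, and TLUM formulations are not reproduced here.

```python
import torch
import torch.nn.functional as F

def gen_adv_examples(img_enc, images, eps=8 / 255, steps=10, mask=None):
    """Stage 1 (VSAEG-like): craft bounded perturbations on the small clean set.

    This stand-in objective simply pushes images away from their own clean
    embeddings; eps acts as a visibility budget and an optional spatial mask
    confines where the perturbation may appear."""
    delta = torch.zeros_like(images, requires_grad=True)
    with torch.no_grad():
        clean_feat = F.normalize(img_enc(images), dim=-1)
    for _ in range(steps):
        adv_feat = F.normalize(img_enc(images + delta), dim=-1)
        # Ascending this loss lowers cosine similarity to the clean features.
        loss = -(adv_feat * clean_feat).sum(dim=-1).mean()
        loss.backward()
        with torch.no_grad():
            step = (2 * eps / steps) * delta.grad.sign()
            if mask is not None:          # spatial awareness: restrict the perturbed region
                step = step * mask
            delta += step
            delta.clamp_(-eps, eps)       # visibility constraint
        delta.grad.zero_()
    return (images + delta).detach()

def detect_target_label(img_enc, txt_enc, clean_imgs, adv_imgs, candidate_texts):
    """Stage 2 (ASTLD-like): flag the candidate caption/word whose image-text
    similarity increases the most once the perturbation is added."""
    with torch.no_grad():
        t = F.normalize(txt_enc(candidate_texts), dim=-1)    # (num_candidates, d)
        c = F.normalize(img_enc(clean_imgs), dim=-1)         # (batch, d)
        a = F.normalize(img_enc(adv_imgs), dim=-1)
        shift = (a @ t.T - c @ t.T).mean(dim=0)              # adversarial-minus-clean similarity
    return int(shift.argmax())                               # index of the suspected target label

def unlearning_loss(img_enc, txt_enc, adv_imgs, target_text, lam=1.0):
    """Stage 3 (TLUM-like): penalise the projection of adversarial image
    features onto the target label's text direction; this term is added to
    the ordinary fine-tuning loss on the clean data."""
    with torch.no_grad():
        t_dir = F.normalize(txt_enc(target_text), dim=-1)    # one-element batch -> (1, d)
    a = F.normalize(img_enc(adv_imgs), dim=-1)               # (batch, d)
    return lam * (a @ t_dir.T).pow(2).mean()
```

In practice, a defender following this pipeline would run `gen_adv_examples` on its small clean set, score `detect_target_label` over the caption vocabulary of that set (the thesis infers the target label from similarity differences across modalities), and then fine-tune the poisoned image encoder with `unlearning_loss` added to the standard clean-data objective, which is how clean accuracy can be preserved while the backdoor effect is removed.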
Experimental results demonstrate that this defense method can restore poisoned models to benign ones under various backdoor attack scenarios, reducing the average attack success rate from 99.64% to 0.02%—an improvement of up to 31.25% compared with existing methods—while maintaining clean accuracy. Moreover, even under different attack settings and defense conditions, particularly when the defender has only a small amount of clean data, this defense method consistently delivers robust defense performance, highlighting its high data efficiency. Thus, this research offers a practical and effective backdoor defense solution for multimodal contrastive learning.
[1] P. V. Anusha, C. Anuradha, P. C. Murty, and C. S. Kiran, "Detecting outliers in high dimensional data sets using Z-score methodology," International Journal of Innovative Technology and Exploring Engineering, vol. 9, no. 1, pp. 48-53, 2019.
[2] E. Bagdasaryan, A. Veit, Y. Hua, D. Estrin, and V. Shmatikov, "How to backdoor federated learning," in International conference on artificial intelligence and statistics, 2020: PMLR, pp. 2938-2948.
[3] H. Bansal, N. Singhi, Y. Yang, F. Yin, A. Grover, and K.-W. Chang, "Cleanclip: Mitigating data poisoning attacks in multimodal contrastive learning," in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 112-123.
[4] K. Bayoudh, "A survey of multimodal hybrid deep learning for computer vision: Architectures, applications, trends, and challenges," Information Fusion, vol. 105, p. 102217, 2024.
[5] L. Bourtoule et al., "Machine unlearning," in 2021 IEEE symposium on security and privacy (SP), 2021: IEEE, pp. 141-159.
[6] N. Carlini and A. Terzis, "Poisoning and backdooring contrastive learning," arXiv preprint arXiv:2106.09667, 2021.
[7] T. Chen, S. Kornblith, M. Norouzi, and G. Hinton, "A simple framework for contrastive learning of visual representations," in International conference on machine learning, 2020: PMLR, pp. 1597-1607.
[8] X. Chen, C. Liu, B. Li, K. Lu, and D. Song, "Targeted backdoor attacks on deep learning systems using data poisoning," arXiv preprint arXiv:1712.05526, 2017.
[9] X. Chen et al., "Badnl: Backdoor attacks against nlp models with semantic-preserving improvements," in Proceedings of the 37th Annual Computer Security Applications Conference, 2021, pp. 554-569.
[10] D. Cousineau and S. Chartier, "Outliers detection and treatment: a review," International journal of psychological research, vol. 3, no. 1, pp. 58-67, 2010.
[11] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, "Imagenet: A large-scale hierarchical image database," in 2009 IEEE conference on computer vision and pattern recognition, 2009: IEEE, pp. 248-255.
[12] I. J. Goodfellow, J. Shlens, and C. Szegedy, "Explaining and harnessing adversarial examples," arXiv preprint arXiv:1412.6572, 2014.
[13] T. Gu, K. Liu, B. Dolan-Gavitt, and S. Garg, "Badnets: Evaluating backdooring attacks on deep neural networks," IEEE Access, vol. 7, pp. 47230-47244, 2019.
[14] K. Han, A. Xiao, E. Wu, J. Guo, C. Xu, and Y. Wang, "Transformer in transformer," Advances in neural information processing systems, vol. 34, pp. 15908-15919, 2021.
[15] B. He, X. Jia, S. Liang, T. Lou, Y. Liu, and X. Cao, "Sa-attack: Improving adversarial transferability of vision-language pre-training models via self-augmentation," arXiv preprint arXiv:2312.04913, 2023.
[16] K. He, H. Fan, Y. Wu, S. Xie, and R. Girshick, "Momentum contrast for unsupervised visual representation learning," in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2020, pp. 9729-9738.
[17] C. Jia et al., "Scaling up visual and vision-language representation learning with noisy text supervision," in International conference on machine learning, 2021: PMLR, pp. 4904-4916.
[18] J. Jia, Y. Liu, and N. Z. Gong, "Badencoder: Backdoor attacks to pre-trained encoders in self-supervised learning," in 2022 IEEE Symposium on Security and Privacy (SP), 2022: IEEE, pp. 2043-2059.
[19] P. Kiourti, K. Wardega, S. Jha, and W. Li, "Trojdrl: evaluation of backdoor attacks on deep reinforcement learning," in 2020 57th ACM/IEEE Design Automation Conference (DAC), 2020: IEEE, pp. 1-6.
[20] N. Klingler, "CLIP: Contrastive Language-Image Pre-Training." [Online]. Available: https://viso.ai/deep-learning/clip-machine-learning/
[21] B. Koonce, "ResNet 50," in Convolutional neural networks with swift for tensorflow: image recognition and dataset categorization: Springer, 2021, pp. 63-72.
[22] J. Kuang, S. Liang, J. Liang, K. Liu, and X. Cao, "Adversarial backdoor defense in clip," arXiv preprint arXiv:2409.15968, 2024.
[23] J. Li, D. Li, C. Xiong, and S. Hoi, "Blip: Bootstrapping language-image pre-training for unified vision-language understanding and generation," in International conference on machine learning, 2022: PMLR, pp. 12888-12900.
[24] Y. Li, Y. Jiang, Z. Li, and S.-T. Xia, "Backdoor learning: A survey," IEEE transactions on neural networks and learning systems, vol. 35, no. 1, pp. 5-22, 2022.
[25] Y. Li, S. Zhang, W. Wang, and H. Song, "Backdoor attacks to deep learning models and countermeasures: A survey," IEEE Open Journal of the Computer Society, vol. 4, pp. 134-146, 2023.
[26] S. Liang, M. Zhu, A. Liu, B. Wu, X. Cao, and E.-C. Chang, "Badclip: Dual-embedding guided backdoor attack on multimodal contrastive learning," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 24645-24654.
[27] J. Lin, H. Yin, W. Ping, P. Molchanov, M. Shoeybi, and S. Han, "Vila: On pre-training for visual language models," in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2024, pp. 26689-26699.
[28] H. Liu, C. Li, Y. Li, and Y. J. Lee, "Improved baselines with visual instruction tuning," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 26296-26306.
[29] I. Loshchilov and F. Hutter, "Decoupled weight decay regularization," arXiv preprint arXiv:1711.05101, 2017.
[30] D. Lu, Z. Wang, T. Wang, W. Guan, H. Gao, and F. Zheng, "Set-level guidance attack: Boosting adversarial transferability of vision-language pre-training models," in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 102-111.
[31] A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu, "Towards deep learning models resistant to adversarial attacks," arXiv preprint arXiv:1706.06083, 2017.
[32] L. McInnes, J. Healy, and S. Astels, "hdbscan: Hierarchical density based clustering," J. Open Source Softw., vol. 2, no. 11, p. 205, 2017.
[33] S.-M. Moosavi-Dezfooli, A. Fawzi, O. Fawzi, and P. Frossard, "Universal adversarial perturbations," in Proceedings of the IEEE conference on computer vision and pattern recognition, 2017, pp. 1765-1773.
[34] B. Mu et al., "Progressive backdoor erasing via connecting backdoor and adversarial attacks," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 20495-20503.
[35] A. Nguyen and A. Tran, "Wanet: Imperceptible warping-based backdoor attack," arXiv preprint arXiv:2102.10369, 2021.
[36] Z. Niu, Y. Sun, Q. Miao, R. Jin, and G. Hua, "Towards unified robustness against both backdoor and adversarial attacks," IEEE transactions on pattern analysis and machine intelligence, 2024.
[37] V. Ordonez, G. Kulkarni, and T. Berg, "Im2text: Describing images using 1 million captioned photographs," Advances in neural information processing systems, vol. 24, 2011.
[38] X. Pan, M. Zhang, B. Sheng, J. Zhu, and M. Yang, "Hidden trigger backdoor attack on NLP models via linguistic style manipulation," in 31st USENIX Security Symposium (USENIX Security 22), 2022, pp. 3611-3628.
[39] A. Radford et al., "Learning transferable visual models from natural language supervision," in International conference on machine learning, 2021: PMLR, pp. 8748-8763.
[40] D. Ramachandram and G. W. Taylor, "Deep multimodal learning: A survey on recent advances and trends," IEEE signal processing magazine, vol. 34, no. 6, pp. 96-108, 2017.
[41] R. R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, and D. Batra, "Grad-cam: Visual explanations from deep networks via gradient-based localization," in Proceedings of the IEEE international conference on computer vision, 2017, pp. 618-626.
[42] P. Sharma, N. Ding, S. Goodman, and R. Soricut, "Conceptual captions: A cleaned, hypernymed, image alt-text dataset for automatic image captioning," in Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2018, pp. 2556-2565.
[43] N. D. Singh, F. Croce, and M. Hein, "Perturb and Recover: Fine-tuning for Effective Backdoor Removal from CLIP," arXiv preprint arXiv:2412.00727, 2024.
[44] Z. Sun, P. Kairouz, A. T. Suresh, and H. B. McMahan, "Can you really backdoor federated learning?," arXiv preprint arXiv:1911.07963, 2019.
[45] G. Tao, Z. Wang, S. Feng, G. Shen, S. Ma, and X. Zhang, "Distribution preserving backdoor attack in self-supervised learning," in 2024 IEEE Symposium on Security and Privacy (SP), 2024: IEEE, pp. 2029-2047.
[46] M. Walmer, K. Sikka, I. Sur, A. Shrivastava, and S. Jha, "Dual-key multimodal backdoors for visual question answering," in Proceedings of the IEEE/CVF Conference on computer vision and pattern recognition, 2022, pp. 15375-15385.
[47] L. Wang, Z. Javed, X. Wu, W. Guo, X. Xing, and D. Song, "Backdoorl: Backdoor attack against competitive reinforcement learning," arXiv preprint arXiv:2105.00579, 2021.
[48] W. Yang, J. Gao, and B. Mirzasoleiman, "Better safe than sorry: Pre-training clip against targeted data poisoning and backdoor attacks," arXiv preprint arXiv:2310.05862, 2023.
[49] W. Yang, J. Gao, and B. Mirzasoleiman, "Robust contrastive language-image pretraining against data poisoning and backdoor attacks," Advances in Neural Information Processing Systems, vol. 36, pp. 10678-10691, 2023.
[50] W. Yang, Y. Lin, P. Li, J. Zhou, and X. Sun, "Rethinking stealthiness of backdoor attack against nlp models," in Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), 2021, pp. 5543-5557.
[51] J. Zhang, Q. Yi, and J. Sang, "Towards adversarial attack on vision-language pre-training models," in Proceedings of the 30th ACM International Conference on Multimedia, 2022, pp. 5005-5013.
[52] P.-F. Zhang, Z. Huang, and G. Bai, "Universal adversarial perturbations for vision-language pre-trained models," in Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2024, pp. 862-871.
[53] Z. Zhou, S. Hu, M. Li, H. Zhang, Y. Zhang, and H. Jin, "Advclip: Downstream-agnostic adversarial examples in multimodal contrastive learning," in Proceedings of the 31st ACM International Conference on Multimedia, 2023, pp. 6311-6320.
Full text available on campus from 2026-08-11.