
Graduate Student: Yen, Ting-An (顏廷安)
Thesis Title: CNN Training Based on Feature Clustering for Robustness against Adversarial Attacks (基於特徵聚集以提升抵抗對抗性攻擊之強健性之卷積神經網路訓練法)
Advisor: Chung, Pau-Choo (詹寶珠)
Degree: Master
Department: Institute of Computer & Communication Engineering, College of Electrical Engineering and Computer Science
Publication Year: 2020
Graduation Academic Year: 109
Language: English
Number of Pages: 69
Keywords: Adversarial Attacks, Adversarial Defense, Robustness, Convolutional Neural Network, Feature Compactness
In recent years, convolutional neural networks (CNNs) have been applied in a wide variety of fields and application domains. However, numerous studies have shown that CNNs are vulnerable to adversarial attacks, which cause their performance to drop sharply. In response, many methods have been proposed to defend against such attacks. Although each of these methods was shown experimentally in its own paper to provide sufficient robustness, most were later found by subsequent studies to have weaknesses. For example, detection-and-rejection defenses typically define criteria for distinguishing adversarial examples, based on properties observed experimentally to differ between clean images and adversarial examples; such defenses are easily broken by adaptive attacks, in which the attack objective function is modified specifically to account for the defense criteria. Another example is recovery-based defenses, which use an additional generative CNN to restore adversarial examples to normal images. The effectiveness of these defenses is bounded by the performance of the generative CNN employed and is therefore lower than that of most other categories of defense. This thesis therefore starts by analyzing the distribution of the features learned by CNNs. In the experiments, a peculiar property was observed: two images whose features are very close to each other may be classified into different classes, while images whose features are far apart may be classified into the same class. Several studies have noticed similar properties and proposed methods that pull same-class features closer together, or that replace the training loss with one that takes the distances between features into account, instead of the traditional softmax with cross-entropy loss. This thesis designs a novel CNN training method that operates directly on the feature space, ensuring that the network learns highly clustered features and thereby improving its robustness. The thesis compares the proposed method against other methods and reports its performance under various parameter settings. The experimental results show that CNNs trained with the proposed method achieve higher robustness than existing state-of-the-art methods.
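To make the feature-space observation above concrete, the following is a minimal sketch, not taken from the thesis, of how one might measure whether a test sample's nearest neighbors in feature space share its predicted class. It assumes a trained PyTorch model split into a hypothetical `backbone` (feature extractor) and `classifier` (linear head), plus ordinary data loaders; all names are illustrative.

```python
import torch

@torch.no_grad()
def knn_label_agreement(backbone, classifier, train_loader, test_loader, k=5, device="cuda"):
    """Fraction of k nearest training features that share each test sample's predicted class.

    `backbone` and `classifier` are hypothetical modules: features = backbone(x),
    logits = classifier(features). A low agreement rate reflects the odd property
    described in the abstract (nearby features, different predicted classes).
    """
    backbone.eval(); classifier.eval()

    # Collect training features and their predicted labels.
    train_feats, train_preds = [], []
    for x, _ in train_loader:
        f = backbone(x.to(device)).flatten(1)
        train_feats.append(f)
        train_preds.append(classifier(f).argmax(dim=1))
    train_feats = torch.cat(train_feats)            # (N, d)
    train_preds = torch.cat(train_preds)            # (N,)

    agree, total = 0.0, 0
    for x, _ in test_loader:
        f = backbone(x.to(device)).flatten(1)       # (B, d)
        pred = classifier(f).argmax(dim=1)          # (B,)
        d = torch.cdist(f, train_feats)             # Euclidean distances, (B, N)
        nn_idx = d.topk(k, largest=False).indices   # indices of k nearest training features
        nn_labels = train_preds[nn_idx]             # (B, k)
        # A neighbor "agrees" if it carries the same predicted class as the test sample.
        agree += (nn_labels == pred.unsqueeze(1)).float().mean(dim=1).sum().item()
        total += f.size(0)
    return agree / total
```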

Convolutional neural networks (CNNs) are now used in a wide variety of fields and applications. However, many studies have shown that CNNs are vulnerable to adversarial attacks, and various methods for defending against such attacks have been proposed. While these methods have been shown empirically to improve the robustness of the network, most of them have fundamental weaknesses. For example, rejection-based methods usually define metrics for detecting adversarial examples based on statistically observed properties that distinguish them from clean samples in the feature space; such methods are easily bypassed by simply adapting the objective function of the attack to these particular properties. Recovery-based methods, as another example, usually offer relatively low robustness because they are limited by the performance of the generative models they rely on. Accordingly, this study begins by analyzing the distribution of the features that CNNs extract from typical datasets. It is found that the feature distribution of traditionally trained CNNs has an odd property: samples whose features are close to each other may be classified differently, while far-apart samples may fall into the same class. Some works have developed methods that either increase same-class feature compactness or directly use a distance-based loss and prediction criterion instead of the traditional softmax and cross-entropy loss. This study then proposes a novel training process, referred to as Manifold-Aware Training (MAT), which forces CNNs to learn compact features and hence improves their robustness against adversarial attacks. The effectiveness of the proposed method is evaluated through a comparison with existing defense mechanisms, and detailed performance evaluations are reported under various settings. The results show that MAT-trained CNNs (following the MAT-specific transformation) are significantly more robust than models trained with existing state-of-the-art methods. Moreover, under adaptive attacks that bypass the transformation, the models still achieve state-of-the-art performance.
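As a rough illustration of the distance-based alternative to softmax cross-entropy mentioned above, the sketch below adds a simple pull-to-center term that encourages same-class features to cluster tightly and predicts by nearest class center. This is a generic center-loss-style sketch under assumed names (`CompactFeatureLoss`, `lam`), not the MAT loss proposed in the thesis.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CompactFeatureLoss(nn.Module):
    """Cross-entropy plus a feature-compactness term (a generic sketch, not the thesis's MAT loss)."""

    def __init__(self, num_classes, feat_dim, lam=0.1):
        super().__init__()
        # Learnable per-class centers in feature space.
        self.centers = nn.Parameter(torch.randn(num_classes, feat_dim))
        self.lam = lam

    def forward(self, features, logits, labels):
        ce = F.cross_entropy(logits, labels)
        # Squared distance of each feature to its own class center, pulling same-class features together.
        pull = (features - self.centers[labels]).pow(2).sum(dim=1).mean()
        return ce + self.lam * pull

    @torch.no_grad()
    def predict(self, features):
        # Distance-based prediction: assign each sample to the nearest class center.
        d = torch.cdist(features, self.centers)    # (B, num_classes)
        return d.argmin(dim=1)
```

In a hypothetical training loop one would compute `features = backbone(x)` and `logits = head(features)`, back-propagate the returned loss (which also updates the learnable centers), and at test time replace the softmax argmax with `predict(features)`.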

Abstract (Chinese)
Abstract (English)
Acknowledgements
Table of Contents
List of Tables
List of Figures
Nomenclature
Chapter 1. Introduction
    1.1 Background
    1.2 Existing Defenses
    1.3 Motivation
    1.4 Contributions
Chapter 2. Related Work
    2.1 Attack
        2.1.1 Gradient-based Attack
        2.1.2 Score-based Attack
        2.1.3 Decision-based Attack
    2.2 Defense
        2.2.1 Adversarial Training
        2.2.2 Detection and Rejection
        2.2.3 Recovery
        2.2.4 Feature Compactness
    2.3 Properties of Adversarial Examples
    2.4 Conclusions
Chapter 3. Materials and Methods
    3.1 Preliminaries
        3.1.1 t-SNE
        3.1.2 Ward's Hierarchical Clustering
    3.2 Observations
        3.2.1 t-SNE Visualizations
        3.2.2 Ward's Hierarchical Clustering Analysis
        3.2.3 Numerical Observation Using k-Nearest Neighbors (kNN)
    3.3 Manifold-Aware Prediction (MAP)
        3.3.1 Distance Functions
    3.4 Manifold-Aware Training (MAT)
    3.5 Auxiliary Techniques of MAT
        3.5.1 Max-Mahalanobis Distribution
        3.5.2 Second-order Loss
        3.5.3 Bounded-Input-Bounded-Output (BIBO) Loss
        3.5.4 Total Loss of MAT
    3.6 MAT Transformation
    3.7 Properties of MAT
Chapter 4. Experimental Results
    4.1 Robust Accuracy Gain by MAP
    4.2 Threat Model
    4.3 Robustness of MAT-trained CNNs
    4.4 Comparison with MMC
    4.5 Sanity Test
    4.6 Adaptive Attacks
    4.7 Feature Clustering Performance by MAT
    4.8 Implementation Details
        4.8.1 Adaptation of MAT to Common CNN Architecture
        4.8.2 Training Hyperparameters
        4.8.3 Execution Time
Chapter 5. Conclusion
References


On campus: open access from 2025-10-12
Off campus: not open to the public
The electronic thesis has not yet been authorized for public release; please consult the library catalog for the print copy.