| Graduate Student: | 吳明軒 Wu, Ming-Hsuan | 
|---|---|
| Thesis Title: | BFDM:用於自駕車系統對抗式防禦的鳥瞰圖特徵去噪模組 BFDM: A BEV Feature Denoising Module for Adversarial Defense in Autonomous Driving Systems | 
| Advisors: | 許志仲 Hsu, Chih-Chung; 鄭順林 Jeng, Shuen-Lin | 
| Degree: | Master | 
| Department: | College of Management - Institute of Data Science | 
| Year of Publication: | 2025 | 
| Graduation Academic Year: | 113 | 
| Language: | Chinese | 
| Pages: | 79 | 
| Keywords (Chinese): | 3D物件辨識、自駕車系統、對抗式防禦、特徵去雜訊、非局部平均法 | 
| Keywords (English): | 3D Object Detection, Autonomous Driving System, Adversarial Defense, Feature Denoising, Non-Local Mean | 
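Although bird's-eye-view (BEV) multi-sensor fusion has become the mainstream approach to 3D object detection in autonomous vehicles, the high vulnerability of deep models to subtle adversarial perturbations poses a hidden threat to driving safety. To address this problem, this study proposes a defense framework that is deployed at the BEV feature layer, requires no retraining of the backbone network, and balances computational cost with robustness.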
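In terms of method, Patch-wise Non-Local Mean (PNLM) estimates similarity within local windows to suppress high-frequency spike noise, while Augmented Multi-Head Self-Attention (AMHSA) incorporates multi-scale depthwise separable convolutions and rotation/flip symmetry augmentation to reconstruct long-range semantic relationships globally and eliminate residual perturbations. Both modules are modular in design and can be plugged into any BEV-based architecture without modifying the main network.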
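The window-restricted similarity weighting described for PNLM can be illustrated with a minimal NumPy sketch. It is not the thesis implementation: the Gaussian kernel on squared feature distance follows the classical non-local means idea, and the function name `patchwise_nonlocal_mean` and the parameters `window` and `bandwidth` are assumptions introduced here for illustration only.

```python
import numpy as np


def patchwise_nonlocal_mean(feat, window=4, bandwidth=1.0):
    """Denoise a BEV feature map of shape (C, H, W) window by window.

    Hypothetical sketch: `window` and `bandwidth` are illustrative
    parameters, not values taken from the thesis.
    """
    c, h, w = feat.shape
    if h % window or w % window:
        raise ValueError("H and W must be multiples of the window size")
    nh, nw, p = h // window, w // window, window * window

    # Split into non-overlapping windows: (nh, nw, p, C), one row per cell.
    x = feat.reshape(c, nh, window, nw, window)
    x = x.transpose(1, 3, 2, 4, 0).reshape(nh, nw, p, c)

    # Pairwise squared distances between the cells of each window.
    sq = (x ** 2).sum(axis=-1)
    d2 = sq[..., :, None] + sq[..., None, :] - 2.0 * (x @ x.transpose(0, 1, 3, 2))
    d2 = np.maximum(d2, 0.0)  # guard against tiny negative rounding errors

    # Gaussian similarity weights, normalized so each row sums to 1.
    weights = np.exp(-d2 / (c * bandwidth ** 2))
    weights /= weights.sum(axis=-1, keepdims=True)

    # Each cell becomes a weighted mean of the cells in its own window.
    out = weights @ x
    out = out.reshape(nh, nw, window, window, c)
    return out.transpose(4, 0, 2, 1, 3).reshape(c, h, w)


# Usage: a constant map corrupted by small Gaussian noise.
rng = np.random.default_rng(0)
clean = np.ones((8, 8, 8))
noisy = clean + 0.1 * rng.normal(size=clean.shape)
denoised = patchwise_nonlocal_mean(noisy, window=4)
```

Because similarities are computed only among the `window * window` cells of each window, the cost grows linearly with the map size rather than quadratically as in a global non-local mean, which is presumably the motivation for the patch-wise restriction.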
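Experiments are based on the nuScenes dataset and BEVFusion, with evaluation under white-box and gray-box attack scenarios using FGSM, PGD, and AutoPGD. Under the most demanding AutoPGD-20 white-box condition, NDS and mAP improve from 0.3217/0.1899 to 0.5116/0.4493, while on clean data they also rise to 0.5701/0.5620. Ablation analysis confirms that multi-scale convolution and symmetry augmentation are both key contributors to the robustness gain. Overall, the results show that the proposed feature-level defense can effectively withstand malicious perturbations and improve the base model's accuracy with very low parameter overhead.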
Bird's-Eye-View (BEV) multi-sensor fusion has become the de facto standard for 3D object detection in autonomous vehicles, yet recent studies reveal that state-of-the-art models such as BEVFusion are acutely susceptible to imperceptible adversarial perturbations that can trigger severe detection errors and compromise road safety. This work introduces a real-time, feature-level defense framework that hardens BEV-based perception without retraining the backbone network.
Built upon the nuScenes benchmark and a vanilla BEVFusion backbone, the proposed system inserts two lightweight, plug-and-play denoising modules at the post-fusion BEV feature layer. The first module, Patch-wise Non-Local Mean (PNLM), suppresses high-frequency spike noise by computing attention weights within non-overlapping sliding windows. The second, Augmented Multi-Head Self-Attention (AMHSA), incorporates multi-scale depth-wise separable convolutions together with rotation and flip symmetry augmentation to restore long-range semantic consistency across the BEV map and eradicate residual perturbations. Both modules are parameter-efficient and require neither structural modifications to the backbone nor additional large-scale training.
Under FGSM, PGD, and AutoPGD attacks in both white-box and gray-box scenarios, the defended model raises performance against the strongest AutoPGD-20 setting from 0.3217 / 0.1899 to 0.5116 / 0.4493 in NDS / mAP. On clean samples the scores increase from 0.5280 / 0.4867 to 0.5701 / 0.5620, indicating that the framework not only recovers performance under attack but also provides regularization benefits in benign conditions. Ablation studies further confirm that each augmentation component within AMHSA independently enhances robustness.
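The abstract does not say how rotation and flip symmetry augmentation is wired into AMHSA. One common way to realize the idea, shown here purely as an assumption, is to run a feature operator on all eight rotations and flips of a square BEV map, undo each transform, and average the results. The helper name `d4_symmetrize` and the toy operator `shift_op` are hypothetical.

```python
import numpy as np


def d4_symmetrize(op, feat):
    """Average `op` over the 8 rotations/flips of a square (C, H, H) map.

    Hypothetical helper: `op` is any shape-preserving function on
    (C, H, H) arrays. The averaged operator commutes with 90-degree
    rotations and flips of the BEV plane.
    """
    outputs = []
    for flip in (False, True):
        for k in range(4):
            # Forward transform: optional flip, then k quarter turns.
            x = feat[:, :, ::-1] if flip else feat
            x = np.rot90(x, k, axes=(1, 2))
            y = op(x)
            # Inverse transform: undo the rotation, then undo the flip.
            y = np.rot90(y, -k, axes=(1, 2))
            y = y[:, :, ::-1] if flip else y
            outputs.append(y)
    return np.mean(outputs, axis=0)


def shift_op(x):
    """Toy direction-dependent operator standing in for a learned layer."""
    return 2.0 * np.roll(x, 1, axis=2)


# Usage: symmetrize the toy operator on a random square feature map.
rng = np.random.default_rng(0)
bev = rng.normal(size=(2, 6, 6))
symmetric_out = d4_symmetrize(shift_op, bev)
```

Averaging over the full set of rotations and flips trades eight forward passes for an output that no longer depends on the orientation of the input map, which is one plausible reason such augmentation would help against direction-specific perturbations.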