
Author: Liao, Tzu-Chi (廖子齊)
Title: A Comparative Study of Importance Metrics and Layer-wise Sensitivity for DeiT-tiny Pruning
Advisor: Chen, Chung-Ho (陳中和)
Degree: Master
Department: Department of Electrical Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2025
Graduation Academic Year: 113 (2024-2025)
Language: Chinese
Pages: 136
Keywords: DeiT-tiny, Structured Pruning, Importance Metric, Layer-wise Sensitivity Analysis
Abstract:
This research focuses on structured pruning and layer-wise sensitivity analysis for the lightweight Vision Transformer DeiT-tiny. The objective is to investigate the model's performance under various pruning conditions and to evaluate how much individual layers contribute to overall accuracy.
The experiments adopt three commonly used importance metrics: L2 Norm, First-order Taylor Expansion, and Hessian Derivative. A Dependency Graph is employed to manage inter-module dependencies, enabling the design and comparison of two pruning strategies: Global Pruning and Local Pruning. Each configuration undergoes full fine-tuning to evaluate how much performance can be recovered after pruning. Furthermore, to analyze the model's sensitivity across layers, this study designs both single-layer and multi-layer pruning experiments; layer importance is assessed by the Top-1 Accuracy Drop after pruning, offering insight into how individual layers contribute to model performance.
This research provides practical guidance on pruning strategies and layer selection for deploying ViT models on resource-constrained devices.
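
To make the setup concrete, the following is a minimal sketch, not the thesis code, of how the three importance metrics and the Global/Local Pruning strategies can be configured with the DepGraph-based torch-pruning library [6]. Class and argument names (MagnitudeImportance, TaylorImportance, HessianImportance, MetaPruner, pruning_ratio) follow torch-pruning v1.x and may differ in other versions; loading DeiT-tiny through timm is an assumption of this sketch, not something stated in the record.

    import torch
    import timm
    import torch_pruning as tp

    # Pretrained DeiT-tiny; each pruning configuration starts from a fresh copy.
    model = timm.create_model("deit_tiny_patch16_224", pretrained=True)
    example_inputs = torch.randn(1, 3, 224, 224)

    # The three importance metrics compared in the thesis, as exposed by
    # torch-pruning (v1.x names; gradient-based metrics additionally need a
    # forward/backward pass on a calibration batch before pruning):
    #   tp.importance.MagnitudeImportance(p=2)  -> L2 Norm
    #   tp.importance.TaylorImportance()        -> First-order Taylor Expansion
    #   tp.importance.HessianImportance()       -> Hessian-based score
    importance = tp.importance.MagnitudeImportance(p=2)

    # MetaPruner builds the dependency graph internally, so channels coupled
    # across the qkv / projection / MLP layers are removed as one group.
    pruner = tp.pruner.MetaPruner(
        model,
        example_inputs,
        importance=importance,
        global_pruning=True,          # True: Global Pruning; False: Local Pruning
        pruning_ratio=0.5,            # illustrative target channel pruning ratio
        ignored_layers=[model.head],  # keep the classification head intact
    )
    pruner.step()                     # remove the least-important channel groups
    # Full fine-tuning and evaluation of the pruned model would follow here.

Under Global Pruning, channel groups from all layers compete in a single importance ranking, so per-layer pruning rates can differ; under Local Pruning, the same ratio is removed within each layer. This is the comparison reported in Chapter 4.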

Table of Contents:
Abstract
Acknowledgements
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
  1.1 Motivation
  1.2 Contributions
  1.3 Thesis Organization
Chapter 2 Background
  2.1 Vision Transformer [1] and DeiT-tiny [2] Architectures
    2.1.1 Overview of the ViT Pipeline
    2.1.2 Detailed Architecture and Data Flow of Each ViT Layer
    2.1.3 DeiT-tiny
  2.2 ViT Pruning Techniques and Importance Metrics
    2.2.1 Comparison of ViT and CNN Pruning Techniques
    2.2.2 Structured and Unstructured Pruning
    2.2.3 Importance Metrics
      2.2.3.1 Common Importance Metrics
      2.2.3.2 Generality of Importance Metrics for Transformer Architectures
  2.3 Dependency Graph (DepGraph) Pruning Method [6]
    2.3.1 Dependency Problems in Model Pruning
    2.3.2 Graph-Based Modeling of Pruning Dependencies
    2.3.3 Network Decomposition and Dependency Modeling
    2.3.4 Example of a Pruning Dependency Graph and Its Propagation Flow
Chapter 3 Methodology
  3.1 Terminology
  3.2 Overview of the Experimental Workflow
  3.3 Full Fine-tuning Experiments
  3.4 Layer-wise Sensitivity Analysis Experiments
    3.4.1 Pruning Targets and Selection Rationale
    3.4.2 Procedure and Settings
Chapter 4 Analysis of Full Fine-tuning Results
  4.1 Experimental Settings
    4.1.1 Model and Dataset
    4.1.2 Full Fine-tuning Hyperparameters
  4.2 Full Fine-tuning Results
  4.3 Comparison of Importance Metrics
    4.3.1 Computing Importance Scores for Channel Groups
    4.3.2 Accuracy Analysis
    4.3.3 Pruning Time Analysis
    4.3.4 Weight Pruning Rate and MAC Reduction Analysis
    4.3.5 Summary of the Importance Metric Comparison
  4.4 Global Pruning versus Local Pruning
    4.4.1 Accuracy and Training Loss Curves (Hessian Derivative as Example)
    4.4.2 Structural Differences and Compression Effects After Pruning
      4.4.2.1 Pruning Dependency Graph Analysis
      4.4.2.2 Structural Differences of the Pruned Models
      4.4.2.3 Relationship Between Channel Pruning Rate and Weight Pruning Rate
      4.4.2.4 Comparison at Higher Compression Rates
    4.4.3 Summary of Global versus Local Pruning
  4.5 Pruning Tendencies of Each Importance Metric Under Global Pruning
Chapter 5 Analysis of Layer-wise Sensitivity Results
  5.1 Experimental Settings
  5.2 Single-layer Pruning Sensitivity Analysis
    5.2.1 fc1_O as the Pruning Unit
    5.2.2 qkv_O as the Pruning Unit
  5.3 Multi-layer Pruning Sensitivity Analysis
    5.3.1 fc1_O as the Pruning Unit
    5.3.2 qkv_O as the Pruning Unit
  5.4 Summary of the Layer-wise Sensitivity Analysis
Chapter 6 Conclusion and Future Work
  6.1 Conclusion
  6.2 Future Work
References
Appendix
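
Chapter 5's single-layer experiments measure, for each transformer block, the Top-1 Accuracy Drop after pruning one pruning unit (fc1_O: the output channels of the block's first MLP layer; qkv_O: the output channels of its qkv projection). The sketch below is a hypothetical reconstruction of that protocol rather than the thesis code, shown for the fc1_O case with an L2-norm ranking; evaluate() stands in for an unspecified Top-1 evaluator, and the module path model.blocks[i].mlp.fc1 follows timm's DeiT implementation.

    import torch
    import timm
    import torch_pruning as tp

    def top1_drop_for_block(block_idx, ratio, val_loader):
        """Prune the fc1 output channels of one block; return the Top-1 drop."""
        model = timm.create_model("deit_tiny_patch16_224", pretrained=True).eval()
        baseline = evaluate(model, val_loader)  # evaluate() is assumed, not shown

        example_inputs = torch.randn(1, 3, 224, 224)
        DG = tp.DependencyGraph().build_dependency(model, example_inputs)

        fc1 = model.blocks[block_idx].mlp.fc1   # the fc1_O pruning unit
        n_prune = int(fc1.out_features * ratio)
        # Rank output channels by L2 norm and mark the weakest for removal; the
        # dependency graph propagates the removal to fc2's input channels.
        idxs = fc1.weight.norm(p=2, dim=1).argsort()[:n_prune].tolist()
        group = DG.get_pruning_group(fc1, tp.prune_linear_out_channels, idxs=idxs)
        group.prune()

        return baseline - evaluate(model, val_loader)  # Top-1 Accuracy Drop

Repeating this per block yields the per-layer sensitivity curve; the multi-layer experiments extend the same idea by pruning several blocks' units together before measuring the drop.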

References:
[1] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., & Gelly, S. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929. 2020.
[2] Touvron, H., Cord, M., Douze, M., Massa, F., Sablayrolles, A., & Jégou, H. Training data-efficient image transformers & distillation through attention. International Conference on Machine Learning. 10347-10357. 2021.
[3] Li, H., Kadav, A., Durdanovic, I., Samet, H., & Graf, H. P. Pruning filters for efficient ConvNets. arXiv preprint arXiv:1608.08710. 2016.
[4] Molchanov, P., Tyree, S., Karras, T., Aila, T., & Kautz, J. Pruning convolutional neural networks for resource efficient inference. arXiv preprint arXiv:1611.06440. 2016.
[5] LeCun, Y., Denker, J., & Solla, S. Optimal brain damage. Advances in Neural Information Processing Systems. 2. 1989.
[6] Fang, G., Ma, X., Song, M., Mi, M. B., & Wang, X. DepGraph: Towards any structural pruning. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 16091-16101. 2023.
[7] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., & Polosukhin, I. Attention is all you need. Advances in Neural Information Processing Systems. 30. 2017.
[8] Yu, L., & Xiang, W. X-Pruner: Explainable pruning for vision transformers. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 24355-24363. 2023.
[9] Yu, F., Huang, K., Wang, M., Cheng, Y., Chu, W., & Cui, L. Width & depth pruning for vision transformers. Proceedings of the AAAI Conference on Artificial Intelligence. 36. 3. 3143-3151. 2022.
[10] He, Y., Zhang, X., & Sun, J. Channel pruning for accelerating very deep neural networks. Proceedings of the IEEE International Conference on Computer Vision. 1389-1397. 2017.
[11] Elkerdawy, S., Elhoushi, M., Singh, A., Zhang, H., & Ray, N. To filter prune, or to layer prune, that is the question. Proceedings of the Asian Conference on Computer Vision. 2020.
[12] Voita, E., Talbot, D., Moiseev, F., Sennrich, R., & Titov, I. Analyzing multi-head self-attention: Specialized heads do the heavy lifting, the rest can be pruned. arXiv preprint arXiv:1905.09418. 2019.
[13] Zhang, P., Tian, C., Zhao, L., & Duan, Z. Intra-head pruning for vision transformers via inter-layer dimension relationship modeling. Available at SSRN 5049611.
[14] Li, Q., Zhang, B., & Chu, X. EAPruning: Evolutionary pruning for vision transformers and CNNs. arXiv preprint arXiv:2210.00181. 2022.
[15] Dong, X., Chen, S., & Pan, S. Learning to prune deep neural networks via layer-wise optimal brain surgeon. Advances in Neural Information Processing Systems. 30. 2017.
[16] Chen, S., & Zhao, Q. Shallowing deep networks: Layer-wise pruning based on feature representations. IEEE Transactions on Pattern Analysis and Machine Intelligence. 41. 12. 3048-3056. 2018.
[17] Zhu, M., Tang, Y., & Han, K. Vision transformer pruning. arXiv preprint arXiv:2104.08500. 2021.
[18] Yang, H., Yin, H., Molchanov, P., Li, H., & Kautz, J. NViT: Vision transformer compression and parameter redistribution. 2021.
[19] Yu, H., & Wu, J. A unified pruning framework for vision transformers. Science China Information Sciences. 66. 7. 179101. 2023.
[20] Rao, Y., Zhao, W., Liu, B., Lu, J., Zhou, J., & Hsieh, C.-J. DynamicViT: Efficient vision transformers with dynamic token sparsification. Advances in Neural Information Processing Systems. 34. 13937-13949. 2021.
[21] Han, S., Pool, J., Tran, J., & Dally, W. Learning both weights and connections for efficient neural network. Advances in Neural Information Processing Systems. 28. 2015.
[22] Kong, Z., Dong, P., Ma, X., Meng, X., Niu, W., Sun, M., Shen, X., Yuan, G., Ren, B., & Tang, H. SPViT: Enabling faster vision transformers via latency-aware soft token pruning. European Conference on Computer Vision. 620-640. 2022.
[23] Ly, A., Marsman, M., Verhagen, J., Grasman, R. P., & Wagenmakers, E.-J. A tutorial on Fisher information. Journal of Mathematical Psychology. 80. 40-55. 2017.
[24] Goyal, P., Dollár, P., Girshick, R., Noordhuis, P., Wesolowski, L., Kyrola, A., Tulloch, A., Jia, Y., & He, K. Accurate, large minibatch SGD: Training ImageNet in 1 hour. arXiv preprint arXiv:1706.02677. 2017.

Full-text availability: on campus from 2028-06-11; off campus from 2028-06-11. The electronic thesis has not yet been authorized for public release; please consult the library catalog for the print copy.