| Author: | 廖子齊 Liao, Tzu-Chi |
|---|---|
| Thesis Title: | 針對DeiT-tiny剪枝的重要性指標與層級敏感度之比較研究 (A Comparative Study of Importance Metrics and Layer-wise Sensitivity for DeiT-tiny Pruning) |
| Advisor: | 陳中和 Chen, Chung-Ho |
| Degree: | Master |
| Department: | Department of Electrical Engineering, College of Electrical Engineering and Computer Science |
| Year of Publication: | 2025 |
| Academic Year of Graduation: | 113 (ROC calendar) |
| Language: | Chinese |
| Number of Pages: | 136 |
| Keywords (Chinese): | DeiT-tiny、結構化剪枝、重要性指標、層級敏感度分析 |
| Keywords (English): | DeiT-tiny, Structured Pruning, Importance Metric, Layer-wise Sensitivity Analysis |
This research focuses on structured pruning and layer-wise sensitivity analysis for DeiT-tiny, a lightweight Vision Transformer model. The objective is to investigate the model's performance under various pruning conditions and to evaluate how much each layer contributes to overall model accuracy.
The experiments adopt three commonly used importance metrics: L2 Norm, First-order Taylor Expansion, and Hessian Derivative. A Dependency Graph is employed to manage inter-module dependencies, enabling the design and comparison of two pruning strategies: Global Pruning and Local Pruning. Each configuration undergoes full fine-tuning to evaluate the extent to which performance can be recovered after pruning. Furthermore, to analyze the model’s sensitivity across different layers, this study designs both single-layer and multi-layer pruning experiments. Layer importance is assessed based on the Top-1 Accuracy Drop after pruning, offering insights into how various layers contribute to model performance.
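To make the metric comparison concrete, the sketch below scores the output channels of one DeiT-tiny MLP layer with the two first-order metrics: the L2 norm of each channel's weights (a data-free magnitude score) and a first-order Taylor score |w·g| that estimates the loss change if the channel is removed. The Hessian-based metric would additionally use second-order curvature (e.g., a diagonal or Fisher approximation) and is omitted here. Under global pruning, scores from all layers are ranked in a single pool; under local pruning, each layer is ranked separately against a fixed per-layer ratio. This is only a minimal illustration under assumed settings (timm's `deit_tiny_patch16_224` without pretrained weights, a single random calibration batch), not the thesis's Dependency Graph-based implementation.

```python
# Minimal sketch (not the thesis implementation): score output channels of one
# MLP layer in DeiT-tiny with an L2-norm metric and a first-order Taylor metric.
# Assumptions: timm's "deit_tiny_patch16_224" and a random batch stand in for the
# real model checkpoint and calibration data.
import torch
import timm

model = timm.create_model("deit_tiny_patch16_224", pretrained=False)
model.train()

# One calibration batch (random here; ImageNet images in practice).
x = torch.randn(8, 3, 224, 224)
labels = torch.randint(0, 1000, (8,))
loss = torch.nn.functional.cross_entropy(model(x), labels)
loss.backward()

fc1 = model.blocks[0].mlp.fc1          # first MLP projection of block 0
W, G = fc1.weight, fc1.weight.grad     # shape: [hidden_dim, embed_dim]

# L2-norm importance: magnitude of each output channel's weight vector.
l2_score = W.norm(p=2, dim=1)

# First-order Taylor importance: |sum_j w_ij * g_ij| per output channel,
# i.e., the estimated loss change if that channel's weights are zeroed.
taylor_score = (W * G).sum(dim=1).abs()

# Channels with the lowest scores are the pruning candidates for this layer.
print(l2_score.topk(5, largest=False).indices)
print(taylor_score.topk(5, largest=False).indices)
```

In practice the coupled channels across the Q/K/V projections, the attention output projection, and the MLP layers must be scored and removed together; tracking these couplings is exactly what the Dependency Graph is used for.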
This research provides practical guidance on pruning strategies and layer selection for deploying ViT models on resource-constrained devices.
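The layer-wise sensitivity protocol described above can be approximated as follows: one transformer block at a time is pruned at a fixed ratio, the pruned model is evaluated, and the Top-1 accuracy drop relative to the unpruned baseline is recorded as that block's importance. The sketch below is a minimal stand-in under assumed settings: zeroing (masking) the lowest-L2 MLP channels replaces true structural removal, no fine-tuning is performed, and the random "validation" tensors only keep the snippet self-contained; none of the names or numbers reflect the thesis's reported results.

```python
# Minimal sketch of a layer-wise sensitivity sweep: zero out the lowest-L2 MLP
# channels of one DeiT-tiny block at a time and record the Top-1 accuracy drop.
# Masking stands in for real structured removal; the random dataset is a
# placeholder for the ImageNet-1k validation split.
import copy
import torch
import timm
from torch.utils.data import DataLoader, TensorDataset

# Stand-in validation set: random tensors; replace with real validation data.
loader = DataLoader(TensorDataset(torch.randn(16, 3, 224, 224),
                                  torch.randint(0, 1000, (16,))), batch_size=8)

@torch.no_grad()
def eval_top1(model, loader, device="cpu"):
    model.eval().to(device)
    correct = total = 0
    for images, labels in loader:
        preds = model(images.to(device)).argmax(dim=1)
        correct += (preds == labels.to(device)).sum().item()
        total += labels.numel()
    return 100.0 * correct / total

@torch.no_grad()
def mask_mlp_channels(block, ratio=0.5):
    """Zero the lowest-L2 hidden channels of a block's MLP (fc1 rows + fc2 cols)."""
    fc1, fc2 = block.mlp.fc1, block.mlp.fc2
    scores = fc1.weight.norm(p=2, dim=1)
    k = int(scores.numel() * ratio)
    idx = scores.topk(k, largest=False).indices
    fc1.weight[idx] = 0
    fc1.bias[idx] = 0
    fc2.weight[:, idx] = 0

base = timm.create_model("deit_tiny_patch16_224", pretrained=False)
baseline = eval_top1(base, loader)

sensitivity = {}
for i in range(len(base.blocks)):                  # DeiT-tiny has 12 blocks
    pruned = copy.deepcopy(base)
    mask_mlp_channels(pruned.blocks[i], ratio=0.5)
    sensitivity[i] = baseline - eval_top1(pruned, loader)   # Top-1 accuracy drop

# Larger drops indicate blocks that are more sensitive (more important) to pruning.
print(sorted(sensitivity.items(), key=lambda kv: kv[1], reverse=True))
```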
Full text available on campus from 2028-06-11.