| Graduate Student: | 方莉珺 Fang, Li-Chun |
|---|---|
| Thesis Title: | 應用變分遮罩剪枝與量化感知訓練之神經網路壓縮策略 (Convolutional Neural Network Compression Strategies with Variational Mask Pruning and Quantization-Aware Training) |
| Advisor: | 郭致宏 Kuo, Chih-Hung |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science, Department of Electrical Engineering |
| Year of Publication: | 2022 |
| Academic Year: | 110 |
| Language: | Chinese |
| Number of Pages: | 67 |
| Keywords (Chinese): | 卷積神經網路, 模型量化, 參數剪枝, 貝葉斯深度學習 |
| Keywords (English): | convolutional neural network, model quantization, parameter pruning, Bayesian deep learning |
In recent years, convolutional neural networks (CNNs) have achieved unprecedented results in many fields, such as image classification and object detection. These accuracy gains, however, come with heavy storage and computation costs, which make CNNs difficult to deploy for real-time inference on resource-limited wearable and mobile devices. To address this problem, model compression is widely applied to CNNs to reduce their computational cost while preserving performance.
This thesis proposes pruning and quantization methods that reduce the complexity and energy consumption of CNNs. For pruning, we propose a channel pruning method based on variational masks. A variational mask jointly considers the contributions of the scaling and shifting parameters of the batch-normalization layer, so the importance of each channel can be determined more accurately. The distributions of the variational masks are trained with Bayesian deep learning, which models the uncertainty of the training process and helps improve the robustness of the network. For quantization, quantization-aware training quantizes both activations and weights to 8-bit integers; the multiply-accumulate behavior of an analog computing-in-memory (CIM) accelerator, including the non-ideal effects of the analog circuit, is simulated during training to obtain an integer model better adapted to the accelerator. In experiments on the CIFAR-10 dataset, pruning the VGG-16 classification model removes 95.39% of the parameters and 70.43% of the operations without accuracy loss. Quantizing the pruned model then further reduces the number of bit operations by 16x with only a 0.2% drop in accuracy.
In recent years, Convolutional Neural Networks (CNNs) have achieved state-of-the-art performance in many fields. However, the significant memory access and computational resources that CNNs require constrain their deployment on resource-constrained edge devices for real-time inference. To alleviate this limitation, model compression strategies have been proposed to accelerate the inference of CNNs.
This thesis proposes pruning and quantization methods to reduce model memory storage and computational complexity. We propose a structured pruning method based on variational masks that simultaneously consider the contributions of the scaling and shifting parameters in batch-normalization layers to determine the importance of each channel. Bayesian deep learning is used to train the distributions of the variational masks, modeling the uncertainty during training and making the model more robust. For network quantization, we apply quantization-aware training to quantize activations and weights to 8-bit integers, simulating the computation behavior of the analog CIM accelerator while accounting for the non-ideal effects of the analog circuit.
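The variational-mask pruning idea described above can be sketched in plain Python. This is only an illustrative outline, not the thesis's exact formulation: the names `channel_importance`, `sample_mask`, and `keep_channels` are hypothetical, the |γ| + |β| combination of the batch-norm parameters is an assumed stand-in for the thesis's importance measure, and the 0.05 pruning threshold is arbitrary.

```python
import math
import random

def channel_importance(gamma, beta):
    # Combine the batch-norm scaling (gamma) and shifting (beta) parameters
    # into one per-channel importance score. |gamma| + |beta| is an
    # illustrative choice, not necessarily the thesis's exact formula.
    return [abs(g) + abs(b) for g, b in zip(gamma, beta)]

def sample_mask(mu, log_sigma2, rng=None):
    # Reparameterization trick: m = mu + sigma * eps with eps ~ N(0, 1),
    # so the sampled mask stays differentiable w.r.t. mu and sigma and the
    # mask *distribution* can be trained, modeling training uncertainty.
    rng = rng or random.Random(0)
    return [m + math.exp(0.5 * s) * rng.gauss(0.0, 1.0)
            for m, s in zip(mu, log_sigma2)]

def keep_channels(mask_means, threshold=0.05):
    # After training, channels whose mask mean has collapsed toward zero
    # are pruned; the remaining channels are kept.
    return [abs(m) >= threshold for m in mask_means]

# Example: the second channel's mask mean has shrunk to ~0, so it is pruned.
print(keep_channels([0.9, 0.01, 0.5]))  # [True, False, True]
```

During training, one mask sample per channel would multiply that channel's batch-norm output; at pruning time only the learned means are consulted.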
The pruning experiment conducted on the CIFAR-10 dataset with VGG-16 achieves 95.39% parameter saving and 70.43% computation reduction without losing accuracy. When the pruned model is quantized with the proposed approach, the number of bit operations is further reduced by 16x with only a 0.2% drop in accuracy.
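The 8-bit quantization step can be illustrated with a minimal fake-quantization sketch using standard affine int8 quantization. This is a simplification of what the thesis describes: the scale and zero point are assumed given (rather than learned or calibrated), the straight-through gradient estimator is only mentioned in a comment, and the CIM accelerator's analog non-ideal effects are not modeled here.

```python
def quantize(x, scale, zero_point=0, qmin=-128, qmax=127):
    # Map a real value to a signed 8-bit integer, clamping to the int8 range.
    q = round(x / scale) + zero_point
    return max(qmin, min(qmax, q))

def dequantize(q, scale, zero_point=0):
    # Map the integer back to the real-valued domain.
    return (q - zero_point) * scale

def fake_quantize(x, scale, zero_point=0):
    # Quantization-aware training inserts this quantize-dequantize pair in
    # the forward pass so the network experiences 8-bit rounding and
    # clamping error during training; gradients bypass the rounding via the
    # straight-through estimator (not shown here).
    return dequantize(quantize(x, scale, zero_point), scale, zero_point)

print(fake_quantize(0.123, scale=0.01))  # snapped to the nearest 0.01 step
print(quantize(10.0, scale=0.01))        # saturates at the int8 maximum, 127
```

The reported 16x reduction in bit operations is consistent with bit-operation counts scaling with the product of operand bit-widths: replacing 32-bit multiplies with 8-bit ones gives (32 x 32) / (8 x 8) = 16.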