
Graduate Student: Fang, Li-Chun (方莉珺)
Thesis Title: Convolutional Neural Network Compression Strategies with Variational Mask Pruning and Quantization-Aware Training
Advisor: Kuo, Chih-Hung (郭致宏)
Degree: Master
Department: Department of Electrical Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2022
Academic Year of Graduation: 110
Language: Chinese
Number of Pages: 67
Keywords: convolutional neural network, model quantization, parameter pruning, Bayesian deep learning
  • In recent years, convolutional neural networks (CNNs) have achieved unprecedented results in many fields, such as image classification and object detection. Alongside their superior accuracy, however, come heavy storage and computation demands, which make CNNs difficult to deploy for real-time inference on resource-limited wearable and mobile devices. To address this problem, model compression is widely applied to CNNs, reducing their computational cost while preserving performance.
    This thesis proposes pruning and quantization methods for CNNs to reduce network complexity and energy consumption. For pruning, a channel pruning method based on variational masks is proposed: the variational mask considers the contributions of both the scaling and shifting parameters of the batch normalization layer, determining channel importance more precisely. The distributions of the variational masks are trained with Bayesian deep learning, which models the uncertainty of the training process and helps improve network robustness. For quantization, quantization-aware training quantizes both activations and weights to 8-bit integers; the multiply-accumulate operations of a CIM (Computing in Memory) analog accelerator are simulated during training, and the non-ideal effects of the analog circuit are taken into account, yielding an integer model better suited to the accelerator. Experimental results show that pruning the VGG-16 classification model on the CIFAR-10 dataset reduces parameters by 95.39% and computation by 70.43% without any loss of accuracy. Quantizing the pruned model further reduces bit operations by 16x with only a 0.2% drop in accuracy.
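    The abstract describes channel importance as a function of both the batch normalization scaling and shifting parameters. The thesis learns a variational mask over the channels; its exact form is not given in this record, so the sketch below uses a simple |γ| + |β| score as an illustrative stand-in for BN-based channel importance (all names and the keep-ratio threshold are hypothetical):

    ```python
    import numpy as np

    def channel_importance(gamma, beta):
        """Illustrative importance score combining the BN scaling (gamma)
        and shifting (beta) parameters of each channel. The thesis trains
        a variational mask instead; |gamma| + |beta| is a stand-in."""
        return np.abs(gamma) + np.abs(beta)

    def prune_mask(gamma, beta, keep_ratio=0.3):
        """Keep the top `keep_ratio` fraction of channels by importance."""
        score = channel_importance(gamma, beta)
        k = max(1, int(round(keep_ratio * score.size)))
        threshold = np.sort(score)[::-1][k - 1]
        return score >= threshold

    # Five channels: those with small gamma AND small beta are pruned.
    gamma = np.array([0.9, 0.01, 0.5, 0.02, 0.7])
    beta  = np.array([0.1, 0.00, 0.2, 0.30, 0.0])
    print(prune_mask(gamma, beta, keep_ratio=0.6))
    ```

    Note that a score based on γ alone (as in Network Slimming) would treat a channel with tiny γ but large β as unimportant, even though its shift still contributes to the output; including β avoids that failure mode.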

    In recent years, Convolutional Neural Networks (CNNs) have accomplished state-of-the-art performance in many fields. However, the significant memory access and computational resources that come with CNNs constrain their deployment on resource-constrained edge devices for real-time inference. To alleviate this limitation, model compression strategies have been proposed to accelerate the inference of CNNs.
    This thesis proposes pruning and quantization methods to reduce model memory storage and computational complexity. We propose a structured pruning method based on variational masks that simultaneously consider the contributions of the scaling and shifting parameters in batch normalization layers to determine the importance of each channel. Bayesian deep learning is used to train the distributions of the variational masks, simulating the uncertainty during training and making the model more robust. For network quantization, we apply quantization-aware training to quantize the activations and weights into 8-bit integers, simulating the computation behavior of the analog CIM accelerator and accounting for the non-ideal effects of the analog circuit.
    The pruning experiment conducted on the CIFAR-10 dataset with VGG-16 achieves 95.39% parameter saving and 70.43% computation reduction without losing accuracy. When the pruned model is quantized with the proposed approach, the number of bit operations is further reduced by 16x with only a 0.2% drop in accuracy.
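    The quantization-aware training described above keeps the network in floating point but injects 8-bit rounding error into the forward pass. A minimal sketch of this generic "fake quantization" step is shown below; the thesis additionally simulates CIM multiply-accumulate behavior and analog non-idealities, which are omitted here, and the symmetric scaling scheme is an assumption:

    ```python
    import numpy as np

    def fake_quantize(x, num_bits=8):
        """Symmetric uniform fake quantization: map x to signed num_bits
        integers, then dequantize, so training sees the rounding error
        while all arithmetic stays in floating point."""
        qmax = 2 ** (num_bits - 1) - 1          # 127 for 8 bits
        max_abs = np.max(np.abs(x))
        scale = max_abs / qmax if max_abs > 0 else 1.0
        q = np.clip(np.round(x / scale), -qmax - 1, qmax)
        return q * scale

    # In backpropagation the round() is typically treated as identity
    # (straight-through estimator), so gradients pass through unchanged.
    w = np.array([0.52, -1.27, 0.003, 0.9])
    print(fake_quantize(w))
    ```

    Because the quantize-dequantize pair is applied during training, the weights adapt to the 8-bit grid before the final integer model is exported.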

    Table of Contents
    Abstract (Chinese)
    Table of Contents
    List of Figures
    List of Tables
    Chapter 1 Introduction
      1-1 Preface
      1-2 Research Motivation
      1-3 Research Contributions
      1-4 Thesis Organization
    Chapter 2 Background
      2-1 Architecture of Convolutional Neural Networks
      2-2 Classic Convolutional Neural Networks
        2-2-1 Visual Geometry Group Network (VGGNet)
        2-2-2 Residual Networks (ResNet)
        2-2-3 YOLO Object Detection Networks
      2-3 Neural Network Compression
        2-3-1 Low-rank Factorization
        2-3-2 Knowledge Distillation
        2-3-3 Compact Network Design
        2-3-4 Neural Architecture Search
        2-3-5 Parameter Quantization
        2-3-6 Parameter Pruning
    Chapter 3 Literature Review
      3-1 Network Pruning Techniques
        3-1-1 Unstructured Pruning
        3-1-2 Structured Pruning
        3-1-3 One-shot Pruning
        3-1-4 Iterative Pruning
        3-1-5 Soft Pruning
        3-1-6 Comparison of Pruning Methods
      3-2 Network Quantization Techniques
        3-2-1 Post-Training Quantization
        3-2-2 Quantization-Aware Training
        3-2-3 Symmetric Quantization
        3-2-4 Asymmetric Quantization
        3-2-5 Related Quantization Methods
        3-2-6 Comparison of Quantization Methods
    Chapter 4 Channel Pruning with Variational Masks and Quantization-Aware Training
      4-1 Channel Pruning Method
        4-1-1 Channel Importance Evaluation
        4-1-2 Bayesian Deep Learning and Variational Inference
        4-1-3 Channel Pruning Training Based on Variational Dropout
        4-1-4 Channel Pruning by the Dropout Rate of the Variational Mask
        4-1-5 Fine-Tuning After Channel Pruning
      4-2 Quantization-Aware Training for CIM (Computing in Memory) Operations
        4-2-1 Forward Propagation in Quantization-Aware Training
        4-2-2 Simulating the Forward Computation of the CIM Accelerator
        4-2-3 Backward Propagation in Quantization-Aware Training
        4-2-4 Integer Inference Model from Quantization-Aware Training
    Chapter 5 Experimental Environment and Data Analysis
      5-1 Dataset
      5-2 Model Implementation Details
      5-3 Pruning Experiment Results
      5-4 Comparison of Pruning Methods
      5-5 Analysis of the Effectiveness of Variational Masks
      5-6 Comparison of Network Pruning Methods
      5-7 Performance of the Quantized Model
    Chapter 6 Conclusion and Future Work
      6-1 Conclusion
      6-2 Future Work
    References


    Full text available on campus: 2024-09-20
    Full text available off campus: 2024-09-20