
Graduate Student: 張琬婷 (Chang, Wan-Ting)
Thesis Title: Variational Channel Distribution Pruning and Mixed-Precision Quantization for Neural Network Model Compression
Advisor: 郭致宏 (Kuo, Chih-Hung)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Electrical Engineering
Year of Publication: 2021
Graduation Academic Year: 110
Language: Chinese
Number of Pages: 74
Keywords: convolutional neural network, Bayesian deep learning, model compression
Usage: Views: 158, Downloads: 7
  • Deep neural networks currently achieve excellent results in many fields. However, in pursuit of higher accuracy, models have grown ever larger and deeper, bringing more parameters and longer computation time; on mobile devices, applications are constrained by memory and energy consumption, which prevents machine learning from being widely deployed in daily life. To address this problem, we develop model-compression algorithms to optimize the network, proposing a network pruning method based on the variational channel distribution together with mixed-precision model quantization, which reduces the number of network parameters and accelerates computation. We train the network with Bayesian deep learning, using the parameters of probability distributions to model the uncertainty in learning; this makes the network more robust and reduces the impact of compression on accuracy. From the probability distributions of the parameters we identify the redundant channels in the network, and the same property is used to decide the bit-width of each layer when quantizing the model, finally yielding a model whose inference requires only integer and shift operations. With the pruning method alone, VGGNet-16 on the CIFAR-10 classification task achieves a 5.68x parameter compression rate and a 1.57x reduction in multiply-accumulate operations without any loss of accuracy; applying mixed-precision quantization on top, a 58.91x parameter compression rate is reached with only a 0.18% drop in accuracy.
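The channel-selection idea above, identifying redundant channels from the probability distributions of the parameters, can be sketched as follows. This is a minimal NumPy illustration, not the thesis's exact formulation: the per-channel posterior means `mu`, standard deviations `sigma`, the signal-to-noise criterion, and the `threshold` value are all assumptions here, with the SNR test being one common choice in variational pruning.

```python
import numpy as np

def prune_channels(mu, sigma, threshold=1.0):
    """Keep channels whose posterior signal-to-noise ratio |mu|/sigma
    exceeds the threshold; low-SNR channels (mean close to zero relative
    to its uncertainty) are treated as redundant and pruned."""
    snr = np.abs(mu) / sigma
    return snr > threshold

# Toy example: 6 channels with learned posterior means and std-devs.
mu = np.array([0.9, 0.02, -1.3, 0.05, 0.7, -0.01])
sigma = np.array([0.1, 0.5, 0.2, 0.4, 0.3, 0.6])
keep = prune_channels(mu, sigma)
print(keep)  # channels 0, 2, 4 survive; the rest are pruned
```

A pruned layer would then be rebuilt with only the surviving channels, shrinking both the parameter count and the multiply-accumulate cost of the following layer.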

    Deep neural networks have achieved excellent results in many research fields. However, in pursuit of higher accuracy, models have become larger and deeper, bringing more parameters and longer computation time. Since mobile devices are limited by memory and energy consumption, machine learning applications cannot yet be widely deployed in daily life. To solve this problem, we present a model compression framework of pruning and mixed-precision quantization based on channel distribution information. By optimizing the neural network, the number of parameters is reduced and computation is accelerated. We use the variational inference technique to optimize Bayesian deep neural networks, in which the probability distributions of the parameters model the uncertainty during learning. This makes the neural network more robust, thereby reducing the impact of compression on accuracy. In addition, based on these probability distributions, we identify the redundant channels in the network and determine the bit-width of each layer. Finally, the mixed-precision model is retrained through quantization-aware training and then converted to a model in which only integer and shift operations are performed. Experiments conducted on the CIFAR-10 dataset with the VGG-16 network show that the proposed pruning scheme achieves a 5.68x parameter saving and a 1.57x MAC reduction without loss in accuracy. The proposed mixed-precision quantization scheme further achieves a 58.91x parameter saving at the expense of a 0.18% drop in accuracy.
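The integer-and-shift inference model mentioned above can be illustrated with symmetric fixed-point quantization using a power-of-two scale, so that dequantization reduces to a bit shift. This is a sketch under assumed details (per-tensor scaling, a signed `bits`-bit range, nonzero weights), not the thesis's exact quantization scheme:

```python
import numpy as np

def quantize_to_int(w, bits):
    """Uniform symmetric quantization with a power-of-two scale:
    each real value is approximated as q * 2**(-shift), so converting
    back to real scale is an integer right shift."""
    max_abs = np.max(np.abs(w))
    # Largest shift such that max_abs still fits in a signed `bits`-bit integer.
    shift = int(np.floor(np.log2((2**(bits - 1) - 1) / max_abs)))
    q = np.round(w * 2**shift)
    q = np.clip(q, -(2**(bits - 1)), 2**(bits - 1) - 1).astype(np.int32)
    return q, shift

w = np.array([0.31, -0.72, 0.05, 0.5])
q, shift = quantize_to_int(w, bits=8)
print(q, shift)       # integer weights and the shift amount
print(q / 2**shift)   # dequantized approximation of w
```

In a mixed-precision setting, `bits` would vary per layer (chosen here from the channel distribution statistics), and accumulations stay in integer arithmetic until a final shift rescales the result.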

    Table of Contents
    Abstract (Chinese)
    Acknowledgements
    Contents
    List of Tables
    List of Figures
    Chapter 1  Introduction
      1-1 Preface
      1-2 Deep Learning
        1-2-1 Deep Neural Networks
        1-2-2 Backpropagation
        1-2-3 Convolutional Neural Networks
      1-3 Research Motivation
      1-4 Research Contributions
      1-5 Thesis Organization
    Chapter 2  Background
      2-1 Convolutional Neural Network Architectures
        2-1-1 Batch Normalization Layer
        2-1-2 Visual Geometry Group Network (VGGNet)
        2-1-3 Residual Network (ResNet)
        2-1-4 YOLO Object Detection Network
      2-2 Introduction to Neural Network Compression
        2-2-1 Compression Targets
        2-2-2 Compression Methods
        2-2-3 Deep Compression
    Chapter 3  Literature Review on Pruning and Quantization
      3-1 Network Pruning Techniques
        3-1-1 Fine-Grained Pruning
        3-1-2 Coarse-Grained Pruning
        3-1-3 Pruning Pipeline Strategies
        3-1-4 Comparison of Pruning Methods
      3-2 Network Quantization Techniques
        3-2-1 Quantization-Aware Training
        3-2-2 Vector Quantization
        3-2-3 Binarized Networks
        3-2-4 Fixed-Point Quantization
        3-2-5 Comparison of Quantization Methods
    Chapter 4  Pruning and Mixed-Precision Quantization Based on Variational Channel Distribution
      4-1 Bayesian Deep Learning
      4-2 Variational Channel Distribution Pruning
      4-3 Quantized Network Architecture with Quantization-Aware Training
        4-3-1 Simulating Quantization Error in Forward Propagation
        4-3-2 Approximating Quantization Gradients in Backpropagation
        4-3-3 Folding Batch Normalization Layers
        4-3-4 Integer-Arithmetic Inference Network
      4-4 Mixed-Precision Model Based on Variational Channel Distribution
      4-5 Overall Network Compression Pipeline
    Chapter 5  Experimental Setup and Analysis
      5-1 Implementation Details
      5-2 Pruning Results
        5-2-1 Pruning Results and Comparisons
        5-2-2 Pruned Network Architectures
        5-2-3 Comparison of Pruning Criteria
      5-3 Network Compression Results
    Chapter 6  Conclusion and Future Work
      6-1 Conclusion
      6-2 Future Work
    References


    Full text available on campus: 2023-11-30
    Full text available off campus: 2023-11-30