| Graduate Student: | 鄭子皇 Cheng, Zih-Huang |
|---|---|
| Thesis Title: | 應用逐位元量化感知訓練稀疏化神經網路 (Bit-Wise Quantization-Aware Training for Sparsifying Neural Networks) |
| Advisor: | 郭致宏 Kuo, Chih-Hung |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Department of Electrical Engineering |
| Year of Publication: | 2023 |
| Academic Year of Graduation: | 111 (ROC calendar) |
| Language: | Chinese |
| Pages: | 60 |
| Keywords: | Convolutional Neural Network, Quantization-Aware Training, Computing-In-Memory |
Deep neural networks trained on large amounts of data have achieved results beyond expectation in fields such as image classification, object recognition, semantic segmentation, and natural language processing. In the pursuit of accuracy, however, model complexity keeps rising, adding a burden in both parameter count and computation. This aggravates the Von Neumann bottleneck, and many studies have therefore proposed in-situ computation inside memory units, known as Computing-In-Memory (CIM), to reduce frequent data movement between memory and processor. However, the large number of multiply-accumulate operations required by deep neural networks forces analog CIM designs to use high-resolution analog-to-digital converters (ADCs) when converting results back to the digital domain, which in turn incurs high area cost and energy consumption. This thesis proposes a quantization-aware training method that targets individual weight bits and enhances bit-level sparsity to reduce the values accumulated inside the CIM macro, so that digitization needs to resolve fewer bits and ADC power consumption drops accordingly. Because the targeted CIM macro computes on two's complement weights, the weights are trained in the corresponding two's complement form to raise their sparsity. Experimental results show that an 8-bit integer VGG-16 trained on CIFAR-10 reaches 98.28% bit-level sparsity while maintaining 93.78% accuracy. Exploiting this sparsity to lower ADC power effectively relieves the power-consumption bottleneck of Computing-In-Memory.
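To make the reported bit-level sparsity metric concrete, the following is a minimal sketch, not the thesis's implementation: it quantizes a weight tensor to 8-bit two's complement integers, decomposes it into bit planes, and measures the fraction of zero bits. The function names, the symmetric per-tensor quantization scheme, and the random example weights are assumptions for illustration only.

```python
# Minimal sketch (assumptions noted above): quantize weights to 8-bit two's
# complement, split them into bit planes, and report the fraction of zero bits,
# i.e. the bits a CIM bit line would not need to accumulate.
import torch

def quantize_int8(w: torch.Tensor) -> torch.Tensor:
    """Hypothetical symmetric per-tensor quantization to signed 8-bit integers."""
    scale = w.abs().max().clamp(min=1e-8) / 127.0
    return torch.clamp(torch.round(w / scale), -128, 127).to(torch.int16)

def bit_level_sparsity(w_int8: torch.Tensor, n_bits: int = 8) -> float:
    """Fraction of zero bits in the two's complement representation."""
    # Map negative values into the unsigned 8-bit range (two's complement view).
    u = w_int8.to(torch.int64) & 0xFF
    zero_bits = 0
    for b in range(n_bits):
        bit_plane = (u >> b) & 1          # extract bit b of every weight
        zero_bits += (bit_plane == 0).sum().item()
    return zero_bits / (w_int8.numel() * n_bits)

if __name__ == "__main__":
    torch.manual_seed(0)
    w = torch.randn(64, 64) * 0.05        # stand-in for one layer's weights
    print(f"bit-level sparsity: {bit_level_sparsity(quantize_int8(w)):.4f}")
```

When most bits in a bit plane are zero, the partial sums accumulated on a CIM bit line stay small, which is why fewer ADC output bits need to be resolved, as argued in the abstract.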