| Graduate Student: | 陳彥勳 Chen, Yen-Xun |
|---|---|
| Thesis Title: | 卷積神經網路之零值省略加速器以及圖塊切割策略 (Zero-Skipping Accelerator and Tiling Strategy for Convolutional Neural Networks) |
| Advisor: | 郭致宏 Kuo, Chih-Hung |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Department of Electrical Engineering |
| Year of Publication: | 2022 |
| Graduating Academic Year: | 110 |
| Language: | Chinese |
| Number of Pages: | 65 |
| Keywords (Chinese): | 深度學習, 卷積神經網路, 高效節能加速器 |
| Keywords (English): | deep learning, convolutional neural networks, energy-efficient accelerators |
| Access Count: | Views: 43, Downloads: 3 |
In this work, we propose a zero-skipping accelerator (ZSA) to speed up the processing of various CNN models. To reuse data effectively, we adopt the row-stationary dataflow, which reduces the energy spent on data movement while enabling highly parallel computation. A pre-processing module handles sparse data efficiently: non-zero features are buffered in a small register file and broadcast to all processing elements (PEs), so no PE needs to store features individually, which saves energy. Based on this hardware design, we analyze different tiling strategies to find a suitable hardware configuration. Because the ReLU function increases data sparsity, the proposed accelerator achieves even better performance when processing network models that use ReLU. Experimental results show that the proposed design is approximately 2.0 to 3.1 times more energy-efficient than other state-of-the-art CNN accelerators.
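To make the zero-skipping idea concrete, the following is a minimal behavioral sketch in Python, not the thesis's actual RTL design or dataflow: the function name `zero_skip_conv2d`, the single-channel convolution, and the toy sizes are illustrative assumptions. It issues multiply-accumulate (MAC) operations only for non-zero input features, which is why sparser post-ReLU feature maps lead to more skipped MACs and, in hardware, fewer switched multipliers.

```python
import numpy as np

def zero_skip_conv2d(feature, kernel):
    """Behavioral sketch of zero-skipping convolution (stride 1, valid padding).

    MACs are issued only for non-zero input features, mimicking how a
    zero-skipping accelerator gates useless computations. Returns the
    output map and the number of MACs actually performed.
    """
    H, W = feature.shape
    K, _ = kernel.shape
    out = np.zeros((H - K + 1, W - K + 1), dtype=feature.dtype)
    macs = 0
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            acc = 0.0
            for ky in range(K):
                for kx in range(K):
                    a = feature[y + ky, x + kx]
                    if a == 0:          # zero-skipping: no MAC issued
                        continue
                    acc += a * kernel[ky, kx]
                    macs += 1
            out[y, x] = acc
    return out, macs

# ReLU increases sparsity, so more MACs are skipped on post-ReLU features.
fmap = np.maximum(np.random.randn(8, 8), 0)   # toy post-ReLU feature map
kern = np.random.randn(3, 3)
result, macs_done = zero_skip_conv2d(fmap, kern)
dense_macs = result.size * kern.size
print(f"MACs performed: {macs_done} / {dense_macs} "
      f"(skipped {dense_macs - macs_done})")
```

In this sketch the ratio of performed to dense MACs is a rough proxy for the dynamic-energy reduction a zero-skipping datapath can obtain; the actual savings in the proposed ZSA also depend on the row-stationary dataflow, the register-file broadcast, and the chosen tiling.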