| Graduate Student: | 陳彥勳 Chen, Yen-Xun |
|---|---|
| Thesis Title: | 卷積神經網路之零值省略加速器以及圖塊切割策略 (Zero-Skipping Accelerator and Tiling Strategy for Convolutional Neural Networks) |
| Advisor: | 郭致宏 Kuo, Chih-Hung |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Department of Electrical Engineering |
| Year of Publication: | 2022 |
| Graduating Academic Year: | 110 |
| Language: | Chinese |
| Number of Pages: | 65 |
| Keywords (Chinese): | 深度學習, 卷積神經網路, 高效節能加速器 |
| Keywords (English): | deep learning, convolutional neural networks, energy-efficient accelerators |
| Access Count: | Views: 43, Downloads: 3 |
In this work, we propose a zero-skipping accelerator (ZSA) to speed up the processing of various CNN models. To reuse data effectively, we adopt the row-stationary dataflow, which reduces the energy spent on data movement while enabling highly parallel computation. A pre-processing module handles sparse data efficiently: non-zero features are buffered in a small register file and broadcast to all processing elements (PEs), so no PE needs to store features individually, which saves energy. Based on this hardware design, we analyze different tiling strategies to find a suitable hardware configuration. Because the ReLU function increases data sparsity, the proposed accelerator achieves even better performance when processing network models that use ReLU. Experimental results show that the proposed design is approximately 2.0 to 3.1 times more energy-efficient than other state-of-the-art CNN accelerators.
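To make the zero-skipping idea concrete, the following is a minimal behavioral sketch in Python, not the thesis's actual RTL design or dataflow: the function name `zero_skip_conv2d`, the single-channel convolution, and the toy sizes are illustrative assumptions. It issues multiply-accumulate (MAC) operations only for non-zero input features, which is why sparser post-ReLU feature maps lead to more skipped MACs and, in hardware, fewer switched multipliers.

```python
import numpy as np

def zero_skip_conv2d(feature, kernel):
    """Behavioral sketch of zero-skipping convolution (stride 1, valid padding).

    MACs are issued only for non-zero input features, mimicking how a
    zero-skipping accelerator gates useless computations. Returns the
    output map and the number of MACs actually performed.
    """
    H, W = feature.shape
    K, _ = kernel.shape
    out = np.zeros((H - K + 1, W - K + 1), dtype=feature.dtype)
    macs = 0
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            acc = 0.0
            for ky in range(K):
                for kx in range(K):
                    a = feature[y + ky, x + kx]
                    if a == 0:          # zero-skipping: no MAC issued
                        continue
                    acc += a * kernel[ky, kx]
                    macs += 1
            out[y, x] = acc
    return out, macs

# ReLU increases sparsity, so more MACs are skipped on post-ReLU features.
fmap = np.maximum(np.random.randn(8, 8), 0)   # toy post-ReLU feature map
kern = np.random.randn(3, 3)
result, macs_done = zero_skip_conv2d(fmap, kern)
dense_macs = result.size * kern.size
print(f"MACs performed: {macs_done} / {dense_macs} "
      f"(skipped {dense_macs - macs_done})")
```

In this sketch the ratio of performed to dense MACs is a rough proxy for the dynamic-energy reduction a zero-skipping datapath can obtain; the actual savings in the proposed ZSA also depend on the row-stationary dataflow, the register-file broadcast, and the chosen tiling.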