| Author: | 呂輝翔 Dee, Hui-Xiang |
|---|---|
| Thesis Title: | 可重構之矩陣乘法管線化加速器的設計與實現 Design and Implementation of a Reconfigurable Matrix Multiplication Pipelined Accelerator |
| Advisor: | 周哲民 Jou, Jer-Min |
| Co-Advisor: | 郭致宏 Kuo, Chih-Hung |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Department of Electrical Engineering |
| Year of Publication: | 2023 |
| Academic Year: | 111 |
| Language: | Chinese |
| Pages: | 44 |
| Keywords (Chinese): | 深度神經網路、矩陣乘法、加速器、NN指令集架構、管線化、分塊執行 |
| Keywords (English): | Deep neural networks (DNNs), Matrix Multiplication, Accelerator, NN Instruction Set Architecture, Pipelined Execution, Tiling Execution |
In the development of today's deep neural networks (DNNs), matrix multiplication is the dominant operation, so studying and designing efficient matrix multiplication is essential to improving DNN execution performance. To this end, this thesis proposes a reconfigurable matrix multiplication pipelined accelerator that raises the efficiency of matrix multiplication through matrix tiling and pipelined execution. First, the accelerator design adopts a tiling execution strategy: the input matrices are partitioned into small blocks for computation, exploiting the principle of locality to increase data reuse and relieve memory pressure. Second, the design employs pipelining: by dividing the operation flow into multiple stages that run concurrently, efficient parallel processing is achieved and overall computational efficiency is improved; pipelining also reduces the waiting time between instructions, allowing the accelerator to use its resources more effectively. Finally, an NN instruction set architecture optimized for neural network operations is introduced. This instruction set provides efficient matrix multiplication instructions, enabling the accelerator to perform the multiply-accumulate operations in DNNs effectively.
In the current development of deep neural networks (DNNs), matrix multiplication plays a crucial role as a fundamental operation. Therefore, research and design of efficient matrix multiplication techniques are essential for enhancing the execution performance of DNNs.
To this end, this thesis presents a Reconfigurable Matrix Multiplication Pipelined Accelerator to enhance the execution efficiency of Deep Neural Networks (DNNs). The design incorporates matrix tiling, exploiting data locality for improved data reuse and reduced memory pressure. Additionally, pipelining is employed, dividing the operation flow into multiple stages for efficient parallel processing and reduced waiting time between instructions.
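As a point of reference, the blocked-loop structure behind the tiling strategy can be sketched in software. The C snippet below is illustrative only: the tile size `TILE`, the square row-major matrices, and the loop order are assumptions made for the example, not the accelerator's actual block dimensions or dataflow.

```c
#include <stddef.h>

/* Illustrative sketch of tiled (blocked) matrix multiplication.
 * TILE is a hypothetical tile size; n is assumed to be a multiple
 * of TILE for simplicity. */
#define TILE 16

/* C = A * B, where A, B, C are n x n row-major matrices. */
void matmul_tiled(size_t n, const float *A, const float *B, float *C)
{
    for (size_t i = 0; i < n * n; ++i)
        C[i] = 0.0f;

    /* Each TILE x TILE block of A and B is reused across the inner
     * loops; this locality/data-reuse effect is what the tiling
     * strategy exploits to relieve memory pressure. */
    for (size_t ii = 0; ii < n; ii += TILE)
        for (size_t kk = 0; kk < n; kk += TILE)
            for (size_t jj = 0; jj < n; jj += TILE)
                for (size_t i = ii; i < ii + TILE; ++i)
                    for (size_t k = kk; k < kk + TILE; ++k) {
                        float a = A[i * n + k];
                        for (size_t j = jj; j < jj + TILE; ++j)
                            C[i * n + j] += a * B[k * n + j];
                    }
}
```

In hardware, each tile-level multiply-accumulate becomes one unit of work that the pipeline stages can overlap, which is where the reduced waiting time between operations comes from.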
Furthermore, a dedicated NN instruction set architecture is introduced, optimized for neural network operations, enabling the accelerator to efficiently perform matrix multiplication tasks in DNNs. In summary, the proposed accelerator significantly improves the computational efficiency of matrix multiplication operations in DNNs, contributing to the advancement of IC design for deep learning applications.
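To make the role of the NN instruction set concrete, the sketch below shows one way a host might describe a tiled matrix-multiply operation to such an accelerator. The struct fields, opcode value, and helper function are hypothetical illustrations under assumed names; they do not reproduce the instruction encoding defined in the thesis.

```c
#include <stdint.h>

/* Hypothetical descriptor for a MATMUL instruction in an NN-oriented
 * instruction set; field names and layout are assumptions for
 * illustration only. */
typedef struct {
    uint8_t  opcode;                  /* placeholder MATMUL opcode          */
    uint32_t src_a_addr;              /* base address of the A tile in SRAM */
    uint32_t src_b_addr;              /* base address of the B tile in SRAM */
    uint32_t dst_addr;                /* base address of the output tile    */
    uint16_t tile_m, tile_n, tile_k;  /* tile dimensions for this call      */
    uint8_t  accumulate;              /* 1: add into dst (multiply-accumulate) */
} nn_matmul_insn_t;

/* Hypothetical host-side helper: builds one descriptor per output tile
 * so instructions can be issued back-to-back, keeping the pipeline
 * stages (fetch, compute, write-back) busy. */
static nn_matmul_insn_t make_matmul_insn(uint32_t a, uint32_t b, uint32_t c,
                                         uint16_t m, uint16_t n, uint16_t k,
                                         uint8_t acc)
{
    nn_matmul_insn_t insn = {
        .opcode = 0x10,               /* placeholder opcode value */
        .src_a_addr = a, .src_b_addr = b, .dst_addr = c,
        .tile_m = m, .tile_n = n, .tile_k = k,
        .accumulate = acc
    };
    return insn;
}
```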