
Author: Dee, Hui-Xiang (呂輝翔)
Thesis Title: Design and Implementation of a Reconfigurable Matrix Multiplication Pipelined Accelerator (可重構之矩陣乘法管線化加速器的設計與實現)
Advisor: Jou, Jer-Min (周哲民)
Co-advisor: Kuo, Chih-Hung (郭致宏)
Degree: Master
Department: Department of Electrical Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2023
Graduation Academic Year: 111 (ROC calendar)
Language: Chinese
Number of Pages: 44
Keywords: Deep Neural Networks (DNNs), Matrix Multiplication, Accelerator, NN Instruction Set Architecture, Pipelined Execution, Tiling Execution
Abstract: In today's deep neural networks (DNNs), matrix multiplication is the dominant operation, so studying and designing matrix multiplication hardware is essential for improving DNN execution performance. To this end, this thesis proposes a reconfigurable matrix multiplication pipelined accelerator that raises the efficiency of matrix multiplication through techniques such as matrix tiling and pipelined execution. First, the accelerator design adopts a tiling execution strategy: the input matrices are partitioned into small blocks for computation, exploiting the principle of locality to increase data reuse and relieve memory pressure. Second, the design employs pipelining: the operation flow is divided into multiple stages that proceed concurrently, enabling efficient parallel processing and improving overall computational efficiency; pipelining also reduces the waiting time between instructions, allowing the accelerator to use its resources more effectively. Finally, an NN instruction set architecture optimized specifically for neural network operations is introduced; it provides efficient matrix multiplication instructions that allow the accelerator to execute the multiply-accumulate operations in DNNs effectively.
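As a concrete illustration of the tiling strategy, the following is a minimal sketch of blocked matrix multiplication in C. It only demonstrates the locality idea the accelerator exploits (each tile is reused many times while it stays in fast storage); the tile size T, the function name matmul_tiled, and the assumption that the matrix dimension is a multiple of T are illustrative choices, not the accelerator's actual parameters.

    /* Minimal sketch of tiled (blocked) matrix multiplication.
     * Each T x T tile of A, B, and C is reused many times while it
     * resides in fast storage, which is the locality the accelerator
     * exploits. T is an illustrative value, not the on-chip block size. */
    #include <stddef.h>

    #define T 16  /* hypothetical tile size */

    /* C (n x n) += A (n x n) * B (n x n); n is assumed to be a multiple of T. */
    void matmul_tiled(size_t n, const float *A, const float *B, float *C)
    {
        for (size_t ii = 0; ii < n; ii += T)
            for (size_t kk = 0; kk < n; kk += T)
                for (size_t jj = 0; jj < n; jj += T)
                    /* multiply one pair of T x T tiles and accumulate into C */
                    for (size_t i = ii; i < ii + T; ++i)
                        for (size_t k = kk; k < kk + T; ++k) {
                            float a = A[i * n + k];   /* reused across the whole j loop */
                            for (size_t j = jj; j < jj + T; ++j)
                                C[i * n + j] += a * B[k * n + j];
                        }
    }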

In the current development of deep neural networks (DNNs), matrix multiplication plays a crucial role as a fundamental operation. Research on and design of efficient matrix multiplication techniques are therefore essential for enhancing the execution performance of DNNs.
To this end, this paper presents a reconfigurable matrix multiplication pipelined accelerator that improves the execution efficiency of DNNs. The design incorporates matrix tiling, exploiting data locality for improved data reuse and reduced memory pressure. Pipelining is also employed: the operation flow is divided into multiple stages that proceed concurrently, enabling efficient parallel processing and reducing the waiting time between instructions.
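The pipelined schedule can be pictured as a double-buffered (ping-pong) loop, in line with the double buffer design listed in the table of contents (Section 3.6). The sketch below is a sequential software approximation only: in the accelerator the load stage and the compute stage run in parallel, and the names run_tiles, load_tile, compute_tile, and TILE_ELEMS are hypothetical placeholders rather than the thesis's actual interfaces.

    /* Sequential sketch of a double-buffered (ping-pong) tile schedule.
     * While one buffer is being computed on, the next tile is loaded into
     * the other buffer; in hardware these two stages overlap in time. */
    #include <stddef.h>
    #include <stdio.h>

    #define TILE_ELEMS (16 * 16)            /* hypothetical tile footprint */

    /* Stand-ins for the accelerator's load and compute stages. */
    static void load_tile(float *dst, size_t t)          { (void)dst; printf("load tile %zu\n", t); }
    static void compute_tile(const float *src, size_t t) { (void)src; printf("compute tile %zu\n", t); }

    void run_tiles(size_t num_tiles)
    {
        static float buf[2][TILE_ELEMS];    /* two on-chip buffers alternating roles */
        int cur = 0;

        if (num_tiles == 0)
            return;
        load_tile(buf[cur], 0);             /* prologue: fill the first buffer */
        for (size_t t = 0; t < num_tiles; ++t) {
            int nxt = cur ^ 1;
            if (t + 1 < num_tiles)
                load_tile(buf[nxt], t + 1); /* stage 1: prefetch the next tile */
            compute_tile(buf[cur], t);      /* stage 2: compute on the current tile */
            cur = nxt;                      /* swap the ping-pong roles */
        }
    }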
Furthermore, a dedicated NN instruction set architecture, optimized for neural network operations, is introduced; it enables the accelerator to perform the matrix multiplication tasks in DNNs efficiently. In summary, the proposed accelerator significantly improves the computational efficiency of matrix multiplication in DNNs and contributes to the advancement of IC design for deep learning applications.
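To make the two-class instruction split concrete, the sketch below declares a hypothetical configuration/operation instruction record in C, mirroring the configuration instruction and operation instruction sections listed in the table of contents (4.4.2 and 4.4.3). Every field name and width here is invented for illustration; the thesis defines its own encoding.

    /* Hypothetical two-class NN instruction layout, for illustration only. */
    #include <stdint.h>

    typedef enum {
        INSTR_CONFIG  = 0,   /* configure tile shape, dataflow mapping, etc. */
        INSTR_OPERATE = 1    /* launch a matrix-multiply or convolution pass */
    } instr_class_t;

    typedef struct {
        uint8_t  cls;        /* INSTR_CONFIG or INSTR_OPERATE                */
        uint8_t  opcode;     /* e.g. matrix multiplication vs. convolution   */
        uint16_t tile_rows;  /* config: tile shape mapped onto the PE array  */
        uint16_t tile_cols;
        uint32_t src_addr;   /* operate: input buffer base address           */
        uint32_t dst_addr;   /* operate: output buffer base address          */
    } nn_instr_t;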

Table of Contents:
Abstract (Chinese) I
Summary II
Proposed Design II
Experiments VI
Conclusion VII
Acknowledgements VIII
Table of Contents IX
List of Tables X
List of Figures XI
Chapter 1 Introduction 1
1.1 Research Background 1
1.2 Research Motivation and Objectives 2
1.3 Thesis Organization 3
Chapter 2 Background and Related Work 4
2.1 Definition and Computation of Matrix Multiplication 4
2.2 Application Domains of Matrix Multiplication 5
2.2.1 Matrix Multiplication in Neural Networks 6
2.2.1.1 Multilayer Perceptron (MLP) 7
2.2.1.2 Convolutional Neural Networks (CNNs) 9
2.3 Requirements and Challenges of Matrix Multiplication Acceleration 11
2.4 Current Research and Development of Matrix Multiplication Accelerators 12
Chapter 3 Design Considerations for the Matrix Multiplication Accelerator 13
3.1 Computation Optimization for the Matrix Multiplication Accelerator 13
3.2 Dataflow Characteristics of Matrix Multiplication 15
3.3 Loop Transformations for Matrix Multiplication 16
3.4 Dataflow Scheduling Design 17
3.5 Pipelined Dataflow Scheduling Design 18
3.6 Double Buffer Design 19
3.7 Design Optimization of the Pipelined Dataflow 20
Chapter 4 Architecture of the Reconfigurable Matrix Multiplication Pipelined Accelerator 21
4.1 Overview of the Reconfigurable Matrix Multiplication Pipelined Accelerator 21
4.2 On-Chip Buffer Design 22
4.3 Processing Element Array Design 23
4.3.1 Dataflow Mapping onto the Processing Element Array 25
4.4 Instruction Set Architecture (ISA) 26
4.4.1 An ISA for Both Matrix Multiplication and Convolution 26
4.4.2 Configuration Instruction Format 28
4.4.3 Operation Instruction Format 28
4.5 Intra-pipeline Controller Design 30
4.5.1 Distributed Controller Design 31
4.5.2 Reconfigurable Control Unit Design 32
4.6 Timing Analysis of the Pipelined Schedule 34
Chapter 5 Experimental Environment and Data Analysis 36
5.1 Experimental Results 37
5.2 Analysis of Experimental Results 41
Chapter 6 Conclusion and Future Work 42
References 43


Full text publicly available on campus from 2024-08-01; off campus from 2024-08-01.