Graduate student: Hsu, Ting-Wei (許廷瑋)
Thesis title: A New High Efficiency Sparse Matrix Multiplication Accelerator (新式高效稀疏矩陣乘法加速器)
Advisor: Jou, Jer-Min (周哲民)
Co-advisor: Kuo, Chih-Hung (郭致宏)
Degree: Master
College and department: College of Electrical Engineering and Computer Science - Department of Electrical Engineering
Year of publication: 2023
Academic year of graduation: 111 (ROC calendar)
Language: Chinese
Number of pages: 50
Keywords: Sparse Matrix Multiplication, Gustavson's Row Times Row Data Flow, Data Compression, Accelerator
    Convolutional neural networks (CNNs) have achieved breakthroughs in image recognition, improving the accuracy of applications such as face recognition and object detection. Recurrent neural networks (RNNs) are widely used in natural language processing tasks such as language generation, machine translation, and sentiment analysis. Transformer models have achieved significant results in machine translation, language generation, and text classification. However, as Transformer models grow wider and deeper, their computational and storage costs increase rapidly. The main cause is the self-attention mechanism, which requires a large number of matrix multiplications that are both time-consuming and memory-intensive.

    Sparse matrix multiplication (SPMM) computes and stores only non-zero elements, which reduces unnecessary computation and storage and improves computational efficiency. In Transformer models, the parameter matrices of the self-attention mechanism are often large and sparse, so exploiting this sparsity effectively can improve computational efficiency and save storage space. Gustavson's row-times-row method is a row-stationary dataflow. Compared with the inner-product and outer-product dataflows, it only needs to multiply non-zero values by row vectors, and different rows can be computed in parallel. In addition, it produces a partial output matrix the size of a single row, so its storage cost is low.

    This thesis therefore proposes a new high-efficiency sparse matrix multiplication accelerator that exploits the characteristics of sparse matrix multiplication and the row-times-row dataflow to improve computational efficiency while reducing storage costs.
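    The row-times-row dataflow described in the abstract can be illustrated in software. The following is a minimal Python sketch, not the thesis's hardware design: it assumes both inputs are stored in the common CSR (compressed sparse row) format, and the names `dense_to_csr` and `gustavson_spgemm` are hypothetical helpers introduced only for this example. For each non-zero A[i][k], row k of B is scaled and accumulated into a buffer that holds only one output row at a time.

```python
from collections import defaultdict


def dense_to_csr(matrix):
    """Compress a dense list-of-lists matrix into CSR arrays.

    Returns (values, col_idx, row_ptr); row i occupies the slice
    row_ptr[i]:row_ptr[i + 1] of values and col_idx.
    """
    values, col_idx, row_ptr = [], [], [0]
    for row in matrix:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))
    return values, col_idx, row_ptr


def gustavson_spgemm(a_csr, b_csr):
    """Row-times-row product C = A @ B on CSR inputs.

    Returns one {column: value} dict per row of C.
    """
    a_val, a_col, a_ptr = a_csr
    b_val, b_col, b_ptr = b_csr
    c_rows = []
    for i in range(len(a_ptr) - 1):      # output rows are independent of each other
        acc = defaultdict(int)           # partial sums for a single output row
        for p in range(a_ptr[i], a_ptr[i + 1]):
            k, a_ik = a_col[p], a_val[p]             # non-zero A[i][k]
            for q in range(b_ptr[k], b_ptr[k + 1]):  # non-zeros of row k of B
                acc[b_col[q]] += a_ik * b_val[q]
        c_rows.append(dict(acc))
    return c_rows


A = [[1, 0, 2],
     [0, 0, 3],
     [4, 0, 0]]
B = [[0, 5, 0],
     [6, 0, 0],
     [0, 0, 7]]
C = gustavson_spgemm(dense_to_csr(A), dense_to_csr(B))
print(C)  # [{1: 5, 2: 14}, {2: 21}, {1: 20}]
```

    Because each iteration of the outer loop touches only its own accumulator, the rows could be distributed across parallel workers, and the intermediate storage is bounded by one output row. These are the two properties the abstract attributes to this dataflow.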

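    Abstract (Chinese)
    Summary
    Proposed Design
    Experiments
    Conclusion
    Acknowledgements
    Table of Contents
    List of Figures
    Chapter 1 Introduction
        1.1 Research Background
        1.2 Research Motivation and Objectives
        1.3 Thesis Organization
    Chapter 2 Background and Related Work
        2.1 Review of Machine Learning and Neural Network Architectures
        2.2 Recurrent Neural Networks
        2.3 Long Short-Term Memory Networks
        2.4 Convolutional Neural Networks
        2.5 Transformer
        2.6 Transformer Models
    Chapter 3 Matrix Multiplication Dataflows and Sparse Matrix Compression Formats
        3.1 Matrix Multiplication Dataflows
        3.2 Sparse Matrix Compression Formats
    Chapter 4 Architecture Design of the New High Efficiency Sparse Matrix Multiplication Accelerator
        4.1 Input and Output Matrix Tiling
        4.2 Overview of the Accelerator Architecture
        4.3 Memory Controller Design
        4.4 Data Scheduling Unit Design
        4.5 Shared Buffer Design for Matrix B
        4.6 Processing Element Group Design
        4.7 Parallel Merge Unit Design
    Chapter 5 Experimental Results and Discussion
        5.1 Experimental Environment and Method
        5.2 Experimental Results
    Chapter 6 Conclusion and Future Work
    References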


    Full-text access (on campus): available immediately
    Full-text access (off campus): available immediately