| Graduate Student: | 許廷瑋 Hsu, Ting-Wei |
|---|---|
| Thesis Title: | 新式高效稀疏矩陣乘法加速器 A New High Efficiency Sparse Matrix Multiplication Accelerator |
| Advisor: | 周哲民 Jou, Jer-Min |
| Co-Advisor: | 郭致宏 Kuo, Chih-Hung |
| Degree: | 碩士 Master |
| Department: | College of Electrical Engineering and Computer Science - Department of Electrical Engineering |
| Publication Year: | 2023 |
| Graduation Academic Year: | 111 |
| Language: | Chinese |
| Number of Pages: | 50 |
| Keywords (Chinese): | 稀疏矩陣乘法、Gustavson列乘列資料流、資料壓縮、加速器 |
| Keywords (English): | Sparse Matrix Multiplication, Gustavson's Row Times Row Data Flow, Data Compression, Accelerator |
Convolutional neural networks (CNNs) have achieved breakthrough results in image recognition, making applications such as facial recognition and object detection considerably more accurate. Recurrent neural networks (RNNs) are widely used in natural language processing, for example in language generation, machine translation, and sentiment analysis. Transformer models have likewise achieved remarkable results in machine translation, language generation, and text classification. However, as these models continue to grow larger and deeper, their computational and storage costs rise rapidly, mainly because the self-attention mechanism in the Transformer involves a large amount of matrix multiplication, which is both time-consuming and memory-intensive.
Sparse matrix multiplication (SpMM) computes and stores only the non-zero elements, which removes unnecessary computation, reduces storage requirements, and improves computational efficiency. In Transformer models, the parameter matrices of the self-attention mechanism are often large and sparse, so exploiting this sparsity effectively can improve computational efficiency and save storage space. Gustavson's row-times-row scheme is a row-stationary dataflow. Compared with the inner-product and outer-product approaches, it only needs to multiply non-zero values by row vectors, and different rows can be computed in parallel; in addition, each step produces only a row-sized partial output matrix, so its storage cost is lower.
This thesis therefore proposes a new high-efficiency sparse matrix multiplication accelerator that exploits the characteristics of sparse matrix multiplication and the row-times-row dataflow to improve computational efficiency while reducing storage cost.
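To make the dataflow described above concrete, the following is a minimal software sketch of Gustavson's row-times-row sparse matrix multiplication (C = A × B) on CSR-compressed operands. The CSR tuple layout, the `dense_to_csr` helper, and the dictionary-based sparse accumulator are illustrative assumptions for this sketch only; they are not the accelerator's actual on-chip storage format or merge logic.

```python
# A minimal sketch of Gustavson's row-times-row SpGEMM (C = A @ B) on
# CSR-compressed matrices. The (values, col_idx, row_ptr) layout and the
# dict-based sparse accumulator are illustrative, not the accelerator's
# actual on-chip format.

def dense_to_csr(matrix):
    """Compress a dense row-major matrix into (values, col_idx, row_ptr)."""
    values, col_idx, row_ptr = [], [], [0]
    for row in matrix:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))
    return values, col_idx, row_ptr

def gustavson_spgemm(a_csr, b_csr, n_rows):
    """For each nonzero A[i, k], scale row k of B and merge the scaled rows
    into a row-sized partial output; each row of C is built independently."""
    a_val, a_col, a_ptr = a_csr
    b_val, b_col, b_ptr = b_csr
    c_rows = []
    for i in range(n_rows):
        acc = {}                              # sparse accumulator for row i of C
        for p in range(a_ptr[i], a_ptr[i + 1]):
            k, a_ik = a_col[p], a_val[p]      # nonzero A[i, k]
            for q in range(b_ptr[k], b_ptr[k + 1]):
                j = b_col[q]                  # nonzero B[k, j]
                acc[j] = acc.get(j, 0) + a_ik * b_val[q]
        c_rows.append(sorted(acc.items()))    # row i of C as (column, value) pairs
    return c_rows

if __name__ == "__main__":
    A = [[1, 0, 2], [0, 0, 3]]
    B = [[0, 4, 0], [5, 0, 0], [0, 0, 6]]
    C = gustavson_spgemm(dense_to_csr(A), dense_to_csr(B), n_rows=len(A))
    print(C)  # [[(1, 4), (2, 12)], [(2, 18)]]  ->  C = [[0, 4, 12], [0, 0, 18]]
```

Because each output row depends only on one row of A and the rows of B that its non-zeros select, different rows of C can be assigned to different processing elements, which is the row-level parallelism and row-sized partial output storage referred to in the abstract.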