
Author: Ou, Jun-Yang (歐潤陽)
Title: New Sparse Matrix Multiplication Architecture Design (新式稀疏矩陣相乘架構設計)
Advisor: Jou, Jer-Min (周哲民)
Degree: Master
Department: Department of Electrical Engineering, College of Electrical Engineering and Computer Science
Publication Year: 2022
Graduation Academic Year: 110
Language: Chinese
Pages: 57
Chinese Keywords: machine learning, compression, sparse matrix multiplication
English Keywords: Neural Networks, Matrix Multiplication, Sparse Matrix, Dynamic Scheduling, Recurrent Neural Network
    Neural networks are widely used in many fields. Conventional training requires large labeled datasets, and labeling data by hand is extremely costly, so self-learning networks that require no human intervention have become a trend for future development. On the computation side, matrix multiplication is a pattern that covers most neural network operations and is widely used in data analysis, graphics processing, scientific computing, and many other domains: the most common way to implement a convolutional layer is to unfold the image into matrix form, and the recurrent neural network computations used for language translation can likewise be cast as matrix operations. To handle these compute- and memory-intensive applications efficiently, the sparsity of the tensors can be exploited for compression, which yields significant speedups and reductions in energy consumption.
    Since machine learning applications increasingly operate on sparse data, in which most values are zero, this work studies algorithms for sparse matrix multiplication. Unlike the inner-product and outer-product dataflows of conventional matrix multiplication, we multiply and accumulate the matrices row by row; rather than favoring output reuse as the inner product does or input reuse as the outer product does, this dataflow strikes a balance between the two. We further apply dynamic scheduling to the order in which data are processed: instead of executing in coordinate order as in common algorithms, the proposed dynamic data scheduler orders the operations according to the overall data-access pattern so that the number of data accesses is minimized. Finally, the processing-element array adopts two levels of parallelism to speed up computation.
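    The row-by-row dataflow described above is the computation pattern of Gustavson's algorithm: each output row C[i,:] is formed by scaling and accumulating the rows of B selected by the nonzeros of row i of A. The following Python sketch only illustrates that dataflow on CSR-encoded operands; the function name and the dictionary accumulator are choices made here for clarity, not part of the thesis's RRspMM hardware design.

    def gustavson_spgemm(a_indptr, a_indices, a_data,
                         b_indptr, b_indices, b_data):
        """Compute C = A * B row by row; each output row is completed before the next starts."""
        c_rows = []
        for i in range(len(a_indptr) - 1):            # one output row per row of A
            acc = {}                                   # sparse accumulator for C[i, :]
            for p in range(a_indptr[i], a_indptr[i + 1]):
                k, a_ik = a_indices[p], a_data[p]      # nonzero A[i, k]
                # scale row k of B and merge it into the accumulator for C[i, :]
                for q in range(b_indptr[k], b_indptr[k + 1]):
                    j, b_kj = b_indices[q], b_data[q]
                    acc[j] = acc.get(j, 0.0) + a_ik * b_kj
            c_rows.append(acc)                         # C[i, :] never needs to be revisited
        return c_rows                                  # one {column: value} dict per output row

    # Example: A = [[1, 0], [0, 2]], B = [[0, 3], [4, 0]]
    # gustavson_spgemm([0, 1, 2], [0, 1], [1.0, 2.0],
    #                  [0, 1, 2], [1, 0], [3.0, 4.0])  ->  [{1: 3.0}, {0: 8.0}]

    Because each nonzero of A is read once and each output row is finished before the next begins, this dataflow sits between the inner-product style, which rereads inputs, and the outer-product style, which rereads partial outputs, which is the balance of input and output reuse referred to above.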

    Neural networks have been widely used in various fields, such as convolutional neural networks for image recognition and recurrent neural networks for language translation. Because most neural network operations are built from multiplications and additions, it is common to convert them into matrix multiplication. Matrix multiplication is also widely used in data analysis, graphics processing, and scientific computing. However, the input data often contain many zero values. To handle these compute- and memory-intensive applications efficiently, compressing input data that contain a large number of zeros so as to reduce the amount of computation is an important issue.
    In this thesis, we study and optimize sparse matrix multiplication. We adopt Gustavson's row-by-row matrix multiplication dataflow and propose a new dynamic data scheduling scheme that reorders the operations to maximize data reuse and greatly reduce the number of data accesses. In addition, the multiply-and-accumulate processing-element array uses two levels of parallelism: the array is organized into rows of processing elements that operate independently and in parallel, and each row contains several multipliers that operate in parallel to further improve performance.
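    To make the scheduling idea above concrete, the toy heuristic below reorders the rows of A so that consecutively processed rows share as many B-row fetches as possible, letting rows of B already held in an on-chip buffer be reused. This is a software sketch under assumptions made here: the greedy overlap metric and the name greedy_row_order are illustrative only, and the thesis's dynamic data scheduler is a hardware unit whose exact policy is not reproduced.

    def greedy_row_order(row_cols):
        """row_cols[i]: set of column indices k with A[i, k] != 0.
        Returns a processing order in which consecutive rows of A tend to
        reference the same rows of B, so buffered B rows can be reused."""
        if not row_cols:
            return []
        remaining = set(range(len(row_cols)))
        order = [min(remaining)]                  # start from the first row
        remaining.remove(order[0])
        while remaining:
            last = row_cols[order[-1]]
            # greedily pick the row overlapping most with the one just scheduled
            nxt = max(remaining, key=lambda r: len(row_cols[r] & last))
            remaining.remove(nxt)
            order.append(nxt)
        return order

    # Example: rows 0 and 2 touch the same B rows, so they end up adjacent:
    # greedy_row_order([{0, 5}, {7}, {0, 5, 9}])  ->  [0, 2, 1]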

    Abstract (Chinese) I
    Summary II
    Our Proposed Design II
    Experiments IV
    Conclusion V
    Acknowledgements VI
    Table of Contents VII
    List of Tables VIII
    List of Figures VIII
    Chapter 1 Introduction 1
      1.1 Research Background 1
      1.2 Motivation and Objectives 1
      1.3 Thesis Organization 2
    Chapter 2 Background and Related Work 3
      2.1 Multilayer Perceptron 3
      2.2 Recurrent Neural Networks 4
      2.3 Convolutional Neural Networks 9
      2.4 Transformer 13
      2.5 Current Developments of the Transformer 22
      2.6 Transformer Optimization 23
    Chapter 3 Matrix Multiplication Dataflows and Sparse Matrix Compression Formats 25
      3.1 Matrix Multiplication Dataflows 25
      3.2 Sparse Data Compression Formats 30
    Chapter 4 New Row-by-Row Sparse Matrix Multiplication Architecture Design 32
      4.1 Dataflow 32
      4.2 Overview of the RRspMM Architecture 34
      4.3 Data Scheduler 36
      4.4 Processing Element Array Design 40
      4.5 Data Tiling 45
      4.6 Hierarchical Memory 46
      4.7 Accumulator Complexity Comparison 47
    Chapter 5 Experimental Results and Discussion 48
      5.1 Experimental Method and I/O Configuration 48
      5.2 Experimental Results 49
    Chapter 6 Conclusion and Future Work 54
    References 55


    On-campus access: open from 2027-08-09
    Off-campus access: open from 2027-08-09
    The electronic thesis has not yet been authorized for public release; for the print copy, please consult the library catalog.