
Author: Chen, Sheng-Yen (陳聖諺)
Title: Semi-Automatic Design and Research of CNN Accelerators (半自動捲積神經網路加速器之設計與研究)
Advisor: Jou, Jer-Min (周哲民)
Degree: Master
Department: Department of Electrical Engineering, College of Electrical Engineering and Computer Science
Year of publication: 2020
Academic year of graduation: 108 (ROC calendar, 2019-2020)
Language: Chinese
Number of pages: 52
Keywords: CNN, LegUp, LLVM, hardware accelerator, machine learning
  • This thesis uses LegUp to implement a convolutional neural network (CNN) hardware accelerator design. As technology has advanced, artificial intelligence has become one of today's most popular research fields, and the CNN is currently the main driving force in the deep neural network (DNN) area.
    A CNN is a machine learning model built on top of the traditional artificial neural network. CNNs excel at image processing tasks such as image classification, image search, face recognition, and the object localization used in autonomous driving. A CNN's convolutional layers, however, involve a huge number of multiply-add operations, and many different data-reuse schemes exist to speed up this computation; each reuse scheme requires its own tedious analysis and hardware redesign. By using LegUp to translate C code into a hardware description language quickly, we can rapidly generate CNN hardware accelerators. This thesis also examines how LegUp lowers a high-level language to a hardware description language: LLVM first compiles the C source into the LLVM intermediate representation (LLVM IR), and LegUp, implemented as an LLVM backend, then translates the LLVM IR into a hardware description language, as illustrated by the sketch below.
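    To make the flow concrete, here is a minimal sketch of the kind of C convolution kernel such a flow consumes; the names and sizes (conv2d, IN, K, OUT) are illustrative assumptions, not the thesis's actual code. Source like this is compiled by clang into LLVM IR, from which LegUp's backend emits synthesizable Verilog.

        #define IN  9             /* input feature-map height/width (illustrative) */
        #define K   3             /* kernel height/width (illustrative) */
        #define OUT (IN - K + 1)  /* output size without padding */

        /* Direct 2-D convolution: each output accumulates K*K multiply-adds.
         * This accumulator form is also the shape of output reuse: one partial
         * sum stays in a PE until its output value is complete. */
        void conv2d(int x[IN][IN], int w[K][K], int y[OUT][OUT])
        {
            for (int r = 0; r < OUT; r++)
                for (int c = 0; c < OUT; c++) {
                    int acc = 0;
                    for (int i = 0; i < K; i++)
                        for (int j = 0; j < K; j++)
                            acc += x[r + i][c + j] * w[i][j];
                    y[r][c] = acc;
                }
        }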
    We use LegUp to compare how different CNN design schemes affect performance. We express the output-reuse, input-reuse, and weight-reuse CNN schemes as generic forms in C, design a 7*7 array of processing elements (PEs), and compare the performance and logic-element usage across three input-data sizes and two weight (kernel) sizes.
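    As a hedged illustration of how the three schemes differ, the sketches below reorder the same multiply-adds so that either a weight or an input pixel stays resident in a PE (output reuse corresponds to the accumulator form in the previous sketch). These serial loop nests are illustrative assumptions; the thesis's generic C forms and their mapping onto the 7*7 PE array differ in detail.

        /* IN, K, OUT as defined in the previous sketch. */

        /* Weight reuse: each weight w[i][j] is fetched once and applied to
         * every output position it contributes to. */
        void conv2d_weight_reuse(int x[IN][IN], int w[K][K], int y[OUT][OUT])
        {
            for (int r = 0; r < OUT; r++)
                for (int c = 0; c < OUT; c++)
                    y[r][c] = 0;                  /* clear partial sums */
            for (int i = 0; i < K; i++)
                for (int j = 0; j < K; j++) {
                    int wij = w[i][j];            /* weight stays resident */
                    for (int r = 0; r < OUT; r++)
                        for (int c = 0; c < OUT; c++)
                            y[r][c] += x[r + i][c + j] * wij;
                }
        }

        /* Input reuse: each input pixel x[p][q] is fetched once and its
         * product terms are scattered to every output whose K*K window
         * covers it. */
        void conv2d_input_reuse(int x[IN][IN], int w[K][K], int y[OUT][OUT])
        {
            for (int r = 0; r < OUT; r++)
                for (int c = 0; c < OUT; c++)
                    y[r][c] = 0;
            for (int p = 0; p < IN; p++)
                for (int q = 0; q < IN; q++) {
                    int xpq = x[p][q];            /* input pixel stays resident */
                    for (int i = 0; i < K; i++)
                        for (int j = 0; j < K; j++) {
                            int r = p - i, c = q - j;
                            if (r >= 0 && r < OUT && c >= 0 && c < OUT)
                                y[r][c] += xpq * w[i][j];
                        }
                }
        }

    All three orderings perform the same OUT*OUT*K*K multiply-adds; what changes is which operand is held in place, and therefore how much on-chip buffering and data movement each PE requires.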

    Table of contents:
    Abstract
    Semi-Automatic Design and Research of CNN Accelerators (extended English abstract)
      Abstract
      Introduction
      Our Designs
      Experiments
      Conclusion
    Acknowledgements
    List of Figures
    List of Tables
    Chapter 1  Introduction
      1.1 Research Background
      1.2 Research Motivation and Objectives
      1.3 Thesis Organization
    Chapter 2  Background and Related Work
      2.1 Overview of LLVM
        2.1.1 LLVM Compiler Architecture
        2.1.2 LLVM IR
        2.1.3 LLVM Front End
        2.1.4 LLVM Middle End
        2.1.5 LLVM Back End
      2.2 Overview of LegUp
        2.2.1 LegUp Flow
      2.3 Convolutional Neural Network Architecture
        2.3.1 Convolutional Layer (CONV Layer)
        2.3.2 Pooling Layer (PL)
        2.3.3 Fully Connected Layer (FCN)
    Chapter 3  Exploring Reuse in the CNN Convolutional Layer
      3.1 Padding Unit
      3.2 Data-Sharing Characteristics of the CNN Convolutional Layer
      3.3 CONV Output Reuse
        3.3.1 CNN Output-Reuse Diagram
        3.3.2 CNN Output-Reuse Hardware Timing Table
        3.3.3 CNN Output-Reuse PE Architecture
      3.4 CONV Input Reuse
        3.4.1 CNN Input-Reuse Diagram
        3.4.2 CNN Input-Reuse Dataflow
        3.4.3 CNN Input-Reuse Hardware Timing Table
        3.4.4 CNN Input-Reuse PE Architecture
      3.5 CONV Weight Reuse
        3.5.1 CNN Weight-Reuse Diagram
        3.5.2 CNN Weight-Reuse Hardware Timing Table
        3.5.3 CNN Weight-Reuse PE Architecture
    Chapter 4  Hardware Design
      4.1 Rewriting the C-Code For Loops
      4.2 Output-Reuse Design of the Convolutional Layer with 7*7 PEs
        4.2.1 Data-Usage Analysis
        4.2.2 Output-Reuse Pseudocode
        4.2.3 Output-Reuse Control-Flow State Diagram
        4.2.4 Output-Reuse Hardware Architecture
      4.3 Input-Reuse Design of the Convolutional Layer with 7*7 PEs
        4.3.1 Data-Usage Analysis
        4.3.2 Input-Reuse Pseudocode
        4.3.4 Input-Reuse Control-Flow State Diagram
        4.3.5 Input-Reuse Hardware Architecture
      4.4 Weight-Reuse Design of the Convolutional Layer with 7*7 PEs
        4.4.1 Data-Usage Analysis
        4.4.2 Weight-Reuse Pseudocode
        4.4.3 CNN Weight-Reuse Control-Flow State Diagram
        4.4.4 Weight-Reuse Hardware Architecture
    Chapter 5  Experimental Environment and Data Analysis
      5.1 Development Platform
      5.2 Experimental Method and Input/Output Configuration
      5.3 Modifying the LegUp-Generated Verilog Code
      5.4 Experimental Results
        5.4.1 CNN Output-Reuse Synthesis Results
        5.4.2 CNN Input-Reuse Synthesis Results
        5.4.3 CNN Weight-Reuse Synthesis Results
      5.5 Theoretical Throughput Calculation
      5.6 Output-Throughput Comparison
        5.6.1 Theoretical vs. Pre-Synthesis Output Throughput
        5.6.2 Theoretical vs. Post-Synthesis Output Throughput
        5.6.3 Theoretical Output Throughput vs. Actual Behavior
        5.6.4 Comparison of the Three Reuse Architectures and Their Output Throughput
        5.6.5 Output-Throughput Analysis of the Three Reuse Schemes vs. Other Studies
    Chapter 6  Conclusion and Future Outlook
    References

    Full-text availability: on campus 2021-08-01; off campus 2021-08-01