| Author: | 陳聖諺 Chen, Sheng-Yen |
|---|---|
| Thesis Title: | 半自動捲積神經網路加速器之設計與研究 Semi-Automatic Design and Research of CNN Accelerators |
| Advisor: | 周哲民 Jou, Jer-Min |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Department of Electrical Engineering |
| Year of Publication: | 2020 |
| Graduation Academic Year: | 108 |
| Language: | Chinese |
| Number of Pages: | 52 |
| Keywords (Chinese): | CNN、LegUp、LLVM、硬體加速器、機器學習 |
| Keywords (English): | CNN, LegUp, LLVM, hardware accelerator, machine learning |
This thesis uses LegUp to implement a Convolutional Neural Network (CNN) hardware accelerator design. Artificial intelligence has become one of the most popular research fields of our time, and the convolutional neural network is currently the main driving force in the development of Deep Neural Networks (DNNs).
A convolutional neural network is a machine learning model developed from the traditional artificial neural network. CNNs excel at image tasks such as image classification, image search, face recognition, and the object localization used in autonomous driving. However, a CNN involves a very large number of multiply-accumulate operations, and many different data-reuse schemes exist to accelerate them; the hardware design for each reuse scheme traditionally requires tedious analysis and a full hardware redesign. By using LegUp to quickly convert C code into a hardware description language (HDL), we can rapidly generate CNN hardware accelerators. This thesis also examines how LegUp translates a high-level language into an HDL: LLVM first compiles the C code into the LLVM Intermediate Representation (LLVM IR), and LegUp, implemented as an LLVM backend, then converts the LLVM IR into an HDL.
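To make the C-to-HDL flow concrete, below is a minimal hypothetical sketch of the kind of plain C kernel such a flow consumes; the function name, sizes, and test data are illustrative assumptions rather than the thesis's actual code. LegUp can synthesize statically sized loop nests like this directly into Verilog, mapping the multiply-add in the inner loop onto hardware MAC resources.

```c
/* Hypothetical sketch of an HLS-ready C kernel: a single-channel 2D
 * convolution written as plain, statically sized loops. Names and
 * sizes are illustrative assumptions, not the thesis's actual code. */
#include <stdio.h>

#define IN_H  9               /* input feature-map height (assumed) */
#define IN_W  9               /* input feature-map width  (assumed) */
#define K     3               /* kernel height/width      (assumed) */
#define OUT_H (IN_H - K + 1)
#define OUT_W (IN_W - K + 1)

/* A LegUp-style HLS flow turns this loop nest into an FSM plus a
 * datapath; the inner multiply-add becomes a hardware MAC unit. */
void conv2d(int in[IN_H][IN_W], int w[K][K], int out[OUT_H][OUT_W])
{
    for (int r = 0; r < OUT_H; r++) {
        for (int c = 0; c < OUT_W; c++) {
            int acc = 0;
            for (int i = 0; i < K; i++)
                for (int j = 0; j < K; j++)
                    acc += in[r + i][c + j] * w[i][j];
            out[r][c] = acc;
        }
    }
}

int main(void)
{
    int in[IN_H][IN_W], out[OUT_H][OUT_W];
    int w[K][K] = { {1, 0, -1}, {1, 0, -1}, {1, 0, -1} };

    for (int r = 0; r < IN_H; r++)
        for (int c = 0; c < IN_W; c++)
            in[r][c] = r + c;          /* dummy input data */

    conv2d(in, w, out);
    printf("out[0][0] = %d\n", out[0][0]);
    return 0;
}
```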
We use LegUp to compare the performance impact of different CNN reuse schemes. We express the output-reuse, input-reuse, and weight-reuse schemes as generic C-language templates, design a 7*7 processing-element (PE) array, and compare the performance and logic-element usage across three input-data sizes and two weight-data sizes.
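As an illustration of how a reuse scheme can be expressed purely by loop ordering in C, here is a minimal hypothetical sketch; the function names and sizes are assumptions, and the 1-D dot product is a simplification of the thesis's 2-D convolution templates.

```c
/* Hypothetical sketch: the same computation written with two loop
 * orderings, each keeping a different operand resident in a register.
 * Sizes and names are illustrative, not the thesis's templates. */
#include <stdio.h>

#define OUT_N 4   /* number of outputs (assumed) */
#define W_N   4   /* number of weights (assumed) */

/* Output reuse: the partial sum for one output stays in a register
 * while all weights and inputs stream past it. */
void output_reuse(int in[OUT_N][W_N], int w[W_N], int out[OUT_N])
{
    for (int o = 0; o < OUT_N; o++) {
        int acc = 0;                 /* reused operand: the partial sum */
        for (int k = 0; k < W_N; k++)
            acc += in[o][k] * w[k];
        out[o] = acc;
    }
}

/* Weight reuse: one weight stays in a register while it is applied to
 * every output's partial sum. (Input reuse is the analogous ordering
 * that holds an input value fixed instead.) */
void weight_reuse(int in[OUT_N][W_N], int w[W_N], int out[OUT_N])
{
    for (int o = 0; o < OUT_N; o++)
        out[o] = 0;
    for (int k = 0; k < W_N; k++) {
        int wk = w[k];               /* reused operand: the weight */
        for (int o = 0; o < OUT_N; o++)
            out[o] += in[o][k] * wk;
    }
}

int main(void)
{
    int in[OUT_N][W_N], w[W_N] = {1, 2, 3, 4}, a[OUT_N], b[OUT_N];

    for (int o = 0; o < OUT_N; o++)
        for (int k = 0; k < W_N; k++)
            in[o][k] = o - k;        /* dummy input data */

    output_reuse(in, w, a);
    weight_reuse(in, w, b);
    for (int o = 0; o < OUT_N; o++)  /* both orderings agree */
        printf("%d %d\n", a[o], b[o]);
    return 0;
}
```

Because the two nests perform identical arithmetic but keep a different operand resident, an HLS tool can generate both hardware variants from the same C source, which is what makes the side-by-side comparison described in the abstract practical.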