
Author: Yu, Chih-Wei
Title: The Design and Implementation of a Heterogeneous Multi-core Embedded Architecture for Deep Neural Network Inference
Advisor: Hou, Ting-Wei
Degree: Master
Department: Department of Engineering Science, College of Engineering
Year of Publication: 2019
Graduation Academic Year: 108
Language: English
Pages: 66
Keywords: neural network acceleration, heterogeneous multi-core, embedded system

     The main target of the proposed design is to increase the performance of DNN inference on embedded devices by adding a soft co-processor, the Neural Computing Unit (NCU), to assist a general-purpose embedded processor. The resulting heterogeneous multi-core platform has been implemented on a development kit.

     The design in this article differs from other efforts in its instruction-based co-processor. The NCU directly executes the issued instructions, which makes inference on models of different structures more flexible. Furthermore, the design allows the inference procedure to be performed mainly by the NCU without control from the Hard Processor System (HPS). The instructions can be generated by the NCU runtime library, and the NCU model converter translates Keras pre-trained model files into NCU model files. Embedded Linux (Angstrom) runs on the HPS, and the NCU driver has been implemented to handle all hardware-dependent operations.
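The conversion step described above, from a trained Keras model to an NCU model file, can be sketched at a high level as follows. This is a minimal illustration only: the layer descriptors stand in for a parsed Keras model, and the opcode names and instruction format are hypothetical, not the thesis's actual NCU instruction set.

```python
# Hypothetical opcode table; the real NCU ISA is defined in the thesis.
OPCODES = {"dense": 0x1, "relu": 0x2, "softmax": 0x3}

def convert(layers):
    """Translate an ordered list of layer descriptors into a flat
    program of (opcode, input_size, output_size) instruction tuples,
    analogous to emitting an NCU model file."""
    program = []
    for layer in layers:
        op = OPCODES[layer["type"]]
        program.append((op, layer["in"], layer["out"]))
    return program

# Stand-in for a small pre-trained classifier's layer sequence.
model = [
    {"type": "dense",   "in": 784, "out": 128},
    {"type": "relu",    "in": 128, "out": 128},
    {"type": "dense",   "in": 128, "out": 10},
    {"type": "softmax", "in": 10,  "out": 10},
]
program = convert(model)
```

In this sketch, each layer maps to one instruction; the actual converter would also serialize weights and layer parameters into the model file consumed by the NCU runtime.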

     To evaluate the performance of the proposed platform, twelve DNN models were pre-trained with Keras on a server and deployed onto the selected embedded platforms to perform inference. The benchmark metric is the execution time of the inference. For comparison, a Raspberry Pi 3 Model B+ and an NVIDIA Jetson TX2 Developer Kit were used in the evaluation. The implemented hardware performs DNN model inference efficiently, with a speedup of 1.5 to 8.7 times compared with the TX2.
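The evaluation method above, timing repeated inference runs and reporting a speedup ratio against a baseline platform, can be sketched as follows. The function names are illustrative and not taken from the thesis; any timing harness of this shape would produce the kind of speedup figures reported.

```python
import time

def benchmark(infer, runs=10):
    """Average wall-clock time of one call to infer() over several runs.
    infer is any zero-argument callable that performs one model inference."""
    start = time.perf_counter()
    for _ in range(runs):
        infer()
    return (time.perf_counter() - start) / runs

def speedup(baseline_seconds, accelerated_seconds):
    """Speedup of the accelerated platform relative to the baseline."""
    return baseline_seconds / accelerated_seconds
```

For example, if a model takes 3.0 s per inference on the baseline and 1.5 s on the accelerated platform, `speedup(3.0, 1.5)` reports a 2x improvement.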

    Chapter 1. Introduction 1
    Chapter 2. Related Work 3
      2.1 Deep Neural Networks 3
      2.2 DNN Inference on Embedded Devices 5
      2.3 DNN Inference on FPGAs 5
    Chapter 3. System Architecture 7
      3.1 SoC Block Diagram 7
      3.2 Software Block Diagram 8
    Chapter 4. Instruction Set Architecture Design 10
      4.1 Accelerate DNN Inference 10
      4.2 Shared Memory 11
      4.3 Design of Data Banks 12
      4.4 Interaction between the HPS and the NCU 14
      4.5 NCU Instruction Set 16
      4.6 Memory-mapped Resources Overview 24
      4.7 Data Regs 24
      4.8 Config Regs 25
      4.9 IRAM 27
    Chapter 5. Hardware Design and Implementation 28
      5.1 Design of the NCU 28
      5.2 Design of the NC Cell 29
      5.3 Implementation-specific Description 33
      5.4 DNN Inference Examples on the Platform 37
    Chapter 6. Software Design and Implementation 48
      6.1 The NCU Model File and the NCU Model Converter 48
      6.2 The NCU Driver 50
      6.3 The NCU Runtime 51
    Chapter 7. Evaluation 56
      7.1 Method 56
      7.2 Comparison to Existing Embedded Platforms 56
      7.3 Experimental Results 58
    Chapter 8. Conclusion 62
    References 64

    On-campus access: available from 2024-01-01
    Off-campus access: not available
    The electronic thesis has not yet been authorized for public release; for the print copy, please consult the library catalog.