
Author: Lin, Bo-Rong (林柏榕)
Thesis title: An ESL (electronic system level) virtual platform for convolution accelerator design and verification (電子系統層級虛擬平台之卷積加速器設計與驗證)
Advisor: Chen, Chung-Ho (陳中和)
Degree: Master
Department: College of Electrical Engineering & Computer Science - Institute of Computer & Communication Engineering
Year of publication: 2019
Academic year of graduation: 107 (2018-2019)
Language: Chinese
Pages: 53
Chinese keywords: 電子系統層級設計, 類神經網路加速器, 資料切割, 資料重用
Keywords: Electronic System Level design, Neural network accelerator, Data partition, Data reuse
In recent years, the rapid development of AI (Artificial Intelligence) has brought deep neural networks (DNNs) into wide use across many fields, and the combination of AI and the IoT (Internet of Things) is pushing AI applications toward embedded systems. However, DNN inference involves massive data movement and high computational complexity, which poses a serious challenge to the power consumption and performance of embedded systems. To address this problem, current research focuses mainly on FPGA and ASIC hardware accelerators.
This thesis develops an ESL virtual platform based on the Micro Darknet for Inference (MDFI) framework. The platform targets embedded systems and is used to develop the microarchitecture of an accelerator and verify its correctness; the ESL approach lets developers run hardware/software co-simulation in the early stages of development, enabling rapid design and verification. We used MDFI to profile the per-layer execution time of the YOLOv3-tiny model and observed that convolution accounts for 93% of the total execution time, so we designed a convolution accelerator to speed up the convolution operations. The accelerator also exposes several reconfigurable parameters: the number of PEs, the size of the accelerator's memory, and the data-reuse method used on the accelerator, so that developers can quickly build, verify, and evaluate accelerator designs.
Finally, we ran the YOLOv3-tiny model on both the ESL virtual platform and a Raspberry Pi 3 to verify the functional correctness of the convolution accelerator, and evaluated performance under the assumption that each layer's data can be loaded into the accelerator's on-chip memory in a single transfer. The results show that the execution time on the ESL virtual platform is about 2.3x faster than on the Raspberry Pi 3.

In recent years, Deep Neural Networks (DNNs) have been successfully applied to many computer vision tasks. However, DNN inference involves heavy data movement and high computational complexity, which makes power consumption and performance a major challenge.
In this paper, we propose an ESL virtual platform based on MDFI (Micro Darknet for Inference) for convolution accelerator design and verification, so that accelerators can be developed and verified quickly in the early stages of design. We assume that the data for each layer of the model can be loaded into the accelerator's on-chip memory in a single transfer, and we compare the platform against a Raspberry Pi 3. The results show that the execution time on the ESL virtual platform is about 2.3x faster than on the Raspberry Pi 3.

Abstract
Acknowledgements
Table of Contents
List of Tables
List of Figures
Chapter 1 Introduction
  1.1 Motivation
  1.2 Contributions
  1.3 Thesis Organization
Chapter 2 Background and Related Work
  2.1 Electronic System Level design
    2.1.1 Simulation Accuracy
    2.1.2 SystemC
  2.2 QEMU
    2.2.1 Portable Dynamic Translation
    2.2.2 Translation Block
    2.2.3 Peripheral Model
    2.2.4 Interrupt Handler
  2.3 Linux Device Driver
    2.3.1 Classes of devices and modules
    2.3.2 I/O ports and I/O memory
  2.4 MDFI
  2.5 Computer Vision
    2.5.1 Image Classification
    2.5.2 Object Detection
  2.6 Accelerator Architecture and DNN Hardware Accelerators
    2.6.1 Accelerator Architecture
    2.6.2 DNN Hardware Accelerators
  2.7 Data reuse method
    2.7.1 No local reuse
    2.7.2 Input reuse
    2.7.3 Filter reuse
    2.7.4 Output reuse
Chapter 3 Design and Implementation of the Virtual Platform and Convolution Accelerator
  3.1 Virtual Platform Overview
  3.2 Profiling the Model's Time Bottleneck
  3.3 Adapting MDFI to the Platform
  3.4 Convolution Accelerator Driver Design
    3.4.1 User space driver
    3.4.2 Kernel space driver
  3.5 QEMU-SystemC Communication Interface Design
  3.6 Convolution Accelerator Design
    3.6.1 Configuration register
    3.6.2 On-chip memory
    3.6.3 Controller
    3.6.4 PE Architecture
  3.7 Control flow
    3.7.1 Filter size 1x1
    3.7.2 Filter size 3x3
    3.7.3 Other filter sizes
  3.8 Execution flow
    3.8.1 MDFI
    3.8.2 CAS_Driver
    3.8.3 Accelerator
Chapter 4 Experimental Results and Performance Evaluation
  4.1 Verification results for different NN models
    4.1.1 Object detection: YOLOv3-tiny
    4.1.2 Image classification: AlexNet and ResNet18
  4.2 Verifying different data-reuse schemes of NN models on the VP
    4.2.1 Input reuse
    4.2.2 Filter reuse
    4.2.3 Output reuse
  4.3 Performance evaluation
    4.3.1 Time to execute the other layers on QEMU
    4.3.2 Bus data-transfer time
    4.3.3 Accelerator operation time
    4.3.4 Performance comparison and analysis
Chapter 5 Conclusion and Future Work
  5.1 Conclusion
  5.2 Future Work
References


Full-text availability: on campus from 2024-01-01; off campus from 2024-01-01