
Graduate Student: 胡雨霖 (Hu, Yu-Lin)
Thesis Title: 對於捲積神經網路通用型加速器研究與設計
General Accelerator Study and Design for Convolutional Neural Network
Advisor: 周哲民 (Jou, Jer-Min)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Electrical Engineering
Year of Publication: 2019
Graduation Academic Year: 107
Language: Chinese
Number of Pages: 44
Chinese Keywords: 捲積神經網路 (Convolutional Neural Network), 系統設計 (System Design), FPGA, ASIC
English Keywords: Convolutional Neural Networks (CNN), System Design, FPGA, ASIC
Access Counts: Views: 99, Downloads: 8
How to make effective use of the big data produced by technological progress has become one of the important problems in current research. Convolutional neural networks, with their architectural scalability, structural flexibility, and very low error rates, have become one of the hot topics in this field. However, a single forward pass of a state-of-the-art convolutional neural network often requires billions of operations, so even the latest high-speed general-purpose processors cannot avoid very long computation latency. Although GPU-accelerated solutions are available on the desktop, the growth of embedded mobile devices makes accelerators that can rapidly process convolutional neural networks on wearable devices increasingly necessary.

We therefore analyze the architectures of current convolutional neural networks and their accelerator implementations on embedded systems, and take the parameter reusability and loop interchangeability of convolutional neural networks as the basis for parallel hardware design. We then propose a general accelerator that can handle a variety of convolutional neural network architectures and verify it experimentally as a system-on-chip. Compared with other embedded designs, our design achieves higher hardware utilization with a smaller area.
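
To make the loop-interchangeability argument concrete, the following minimal C sketch (illustrative only; the layer sizes and names are our assumptions, not the thesis's actual configuration) shows the six-deep loop nest of one convolutional layer. Because every output element only accumulates independent products, the six loops can be reordered freely, and each ordering favors reusing a different operand, weights or input pixels, in on-chip buffers.

/* Minimal sketch of a convolutional layer's loop nest (not the thesis RTL).
 * Any permutation of the m/n/r/c/i/j loops computes the same result
 * (up to floating-point rounding order); a hardware design picks the
 * order, and the loops to unroll, that maximizes on-chip data reuse. */
#include <stdio.h>

#define M 4   /* output feature maps (illustrative size) */
#define N 3   /* input feature maps                      */
#define R 6   /* output rows                             */
#define C 6   /* output columns                          */
#define K 3   /* kernel height/width                     */

static float in[N][R + K - 1][C + K - 1];  /* input feature maps  */
static float w[M][N][K][K];                /* convolution weights */
static float out[M][R][C];                 /* output feature maps */

int main(void) {
    for (int m = 0; m < M; m++)                 /* each output map  */
        for (int n = 0; n < N; n++)             /* each input map   */
            for (int r = 0; r < R; r++)         /* each output row  */
                for (int c = 0; c < C; c++)     /* each output col  */
                    for (int i = 0; i < K; i++) /* kernel window    */
                        for (int j = 0; j < K; j++)
                            out[m][r][c] += w[m][n][i][j] * in[n][r + i][c + j];

    printf("out[0][0][0] = %f\n", out[0][0][0]);
    return 0;
}

In hardware, fixing one ordering and unrolling some of these loops yields a parallel multiplier array; which loops to unroll is exactly the design decision a general accelerator must leave open.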

The hardware design of Convolutional Neural Networks (CNNs) faces the following problems: high computational complexity, a large amount of data movement, and structural divergence among different neural networks. Previous work has dealt well with the first two problems but has failed to consider the third broadly. After analyzing state-of-the-art CNN accelerators and the design space they exploit, we develop a format that can describe the full design space. Based on our design space exploration and hardware evaluation, we propose a novel general CNN hardware accelerator that contains hierarchical memory storage, a two-dimensional set of hardware processing units with variable length and width, and an elastic data distributor. Our FPGA results show higher multiplier utilization than previous FPGA designs; in ASIC synthesis estimates, our work is as efficient as two other recent works.
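
As a rough illustration of what a design-space format can capture, the sketch below is a hypothetical model, not the thesis's actual notation: the tile sizes Tm/Tn/Tr/Tc and the inter-tile loop order are the parameters we assume. Each design point fixes a tiling of the convolution loop nest, from which multiplier count and on-chip buffer demand can be estimated before committing to RTL.

/* Hedged sketch of a design-space point for a tiled CNN accelerator.
 * Field names (Tm, Tn, Tr, Tc, loop_order) are illustrative assumptions. */
#include <stdio.h>

typedef struct {
    int Tm, Tn, Tr, Tc;      /* tile sizes: output maps, input maps, rows, cols */
    const char *loop_order;  /* inter-tile loop order, e.g. "m,n,r,c"           */
} DesignPoint;

/* On-chip words needed to hold one tile of inputs, weights, and outputs. */
static long buffer_words(DesignPoint p, int K) {
    long in_words  = (long)p.Tn * (p.Tr + K - 1) * (p.Tc + K - 1);
    long w_words   = (long)p.Tm * p.Tn * K * K;
    long out_words = (long)p.Tm * p.Tr * p.Tc;
    return in_words + w_words + out_words;
}

int main(void) {
    DesignPoint p = { .Tm = 16, .Tn = 4, .Tr = 8, .Tc = 8, .loop_order = "m,n,r,c" };
    /* In this model, Tm*Tn multipliers work in parallel within a tile. */
    printf("multipliers: %d, buffer words: %ld\n",
           p.Tm * p.Tn, buffer_words(p, 3));
    return 0;
}

Enumerating such points under multiplier and buffer constraints is one common way to compare candidate accelerator configurations across different network shapes.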

Table of Contents:
  Abstract
  General Accelerator Study and Design for Convolutional Neural Network (English Summary)
    Summary
    Introduction
    General CNN Accelerator Overview
    Conclusion
  Acknowledgements
  Table of Contents
  List of Tables
  List of Figures
  Chapter 1: Introduction
    1.1 Research Background
    1.2 Research Motivation and Objectives
    1.3 Thesis Organization
  Chapter 2: Background and Related Work
    2.1 History of Convolutional Neural Networks
    2.2 Architecture and Characteristics of Convolutional Neural Networks
    2.3 Related Work
  Chapter 3: Design Space Exploration for the General Convolution Accelerator
    3.1 Parallelism Analysis of Convolutional Layers
    3.2 Hardware Design Goals
    3.3 Basic Computation Architecture and Datapath of the General Convolution Accelerator
  Chapter 4: General Convolution Accelerator Architecture
    4.1 Overview of the General Convolution Accelerator
    4.2 Multiplier Unit Array
    4.3 Adder Unit Array
    4.4 Control Unit Architecture
    4.5 Memory Unit Architecture
  Chapter 5: Experimental Environment and Data Analysis
    5.1 Development Platform
    5.2 Experimental Method and Input/Output Configuration
    5.3 Hardware Synthesis Results
    5.4 Experimental Data and Result Analysis
  Chapter 6: Conclusion and Future Work
  References


Full text available on campus: 2022-07-30
Full text available off campus: 2022-07-30