Author: 胡雨霖 (Hu, Yu-Lin)
Title: 對於捲積神經網路通用型加速器研究與設計 (General Accelerator Study and Design for Convolutional Neural Network)
Advisor: 周哲民 (Jou, Jer-Min)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Electrical Engineering
Year of Publication: 2019
Academic Year: 107 (2018-2019)
Language: Chinese
Pages: 44
Keywords (Chinese): 捲積神經網路、系統設計、FPGA、ASIC
Keywords (English): Convolutional Neural Networks (CNN), System Design, FPGA, ASIC
Abstract:
How to make effective use of the big data brought by technological development has become one of the important problems in current research. Convolutional neural networks, with their architectural scalability, structural flexibility, and extremely low error rates, have become one of the hot topics in this field. However, a single forward pass of a state-of-the-art convolutional neural network often requires several billion operations in total, so even the latest high-speed general-purpose processors cannot avoid very long computation latency. Although GPU-accelerated solutions already exist on the desktop, the rise of embedded mobile devices makes accelerators that can process convolutional neural networks quickly on wearable devices increasingly necessary.
We therefore analyze current convolutional neural network architectures and their accelerator implementations on embedded systems, and take the parameter reusability and loop interchangeability of convolutional neural networks as the basis for our parallel hardware design. We then propose a general accelerator capable of handling a variety of convolutional neural network architectures, and verify it experimentally as a system-on-chip. Compared with other embedded devices, our design achieves higher hardware utilization with a smaller area.
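The loop-interchangeability and parameter-reuse argument can be made concrete with the standard convolution loop nest. The following C sketch is illustrative only (the function name, signature, and dimension names are assumptions, not taken from the thesis): each of the six loops can be reordered or unrolled, and each ordering fixes which operand (weights, inputs, or partial sums) stays resident and is reused by the parallel hardware.

```c
/* Naive convolution loop nest for one CNN layer (C99 variable-length
 * arrays). M: output channels, C: input channels, H/W: output
 * height/width, K: kernel height and width. Every iteration only
 * accumulates into out[m][h][w], so the six loops can be interchanged
 * freely; the chosen order decides which operand (weights, inputs,
 * or partial sums) is reused by the innermost, hardware-parallel
 * loops. Names and layout are illustrative, not from the thesis. */
void conv_layer(int M, int C, int H, int W, int K,
                float out[M][H][W],
                const float in[C][H + K - 1][W + K - 1],
                const float wgt[M][C][K][K])
{
    for (int m = 0; m < M; m++)                      /* output channel */
        for (int c = 0; c < C; c++)                  /* input channel  */
            for (int h = 0; h < H; h++)              /* output row     */
                for (int w = 0; w < W; w++)          /* output column  */
                    for (int i = 0; i < K; i++)      /* kernel row     */
                        for (int j = 0; j < K; j++)  /* kernel column  */
                            out[m][h][w] +=
                                wgt[m][c][i][j] * in[c][h + i][w + j];
}
```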
English Abstract:
Hardware design for Convolutional Neural Networks (CNN) faces three problems: high computational complexity, a large amount of data movement, and structural divergence across different neural networks. Previous work has dealt well with the first two problems but has not considered the third broadly. After analyzing state-of-the-art CNN accelerators and the design space they exploit, we develop a format that can describe the full design space. Based on our design-space exploration and hardware evaluation, we propose a novel general CNN hardware accelerator consisting of hierarchical memory storage, a two-dimensional processing-unit array with variable length and width, and an elastic data distributor. Our FPGA results show higher multiplier utilization than previous FPGA designs; in ASIC synthesis estimates, our design is as efficient as two other recent works.
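One way to picture the design space the abstract describes: tiling the same loop nest splits each dimension into an off-chip level and an on-chip level, and the choice of tile sizes and loop order is what distinguishes one accelerator dataflow from another. A minimal sketch under the same assumptions as the previous block; the tile parameters Tm, Th, and Tw are hypothetical knobs, not the thesis's actual configuration.

```c
/* Tiled variant of the naive loop nest above (C99 variable-length
 * arrays). The outer loops walk output tiles held off chip; the
 * inner loops stream one tile through on-chip buffers and the
 * processing-unit array. Tile sizes Tm/Th/Tw are hypothetical;
 * choosing them (and the loop order) per layer is one form of the
 * design-space exploration mentioned in the abstract. */
void conv_layer_tiled(int M, int C, int H, int W, int K,
                      int Tm, int Th, int Tw,
                      float out[M][H][W],
                      const float in[C][H + K - 1][W + K - 1],
                      const float wgt[M][C][K][K])
{
    for (int m0 = 0; m0 < M; m0 += Tm)
        for (int h0 = 0; h0 < H; h0 += Th)
            for (int w0 = 0; w0 < W; w0 += Tw)
                /* one Tm x Th x Tw output tile stays on chip */
                for (int c = 0; c < C; c++)
                    for (int m = m0; m < m0 + Tm && m < M; m++)
                        for (int h = h0; h < h0 + Th && h < H; h++)
                            for (int w = w0; w < w0 + Tw && w < W; w++)
                                for (int i = 0; i < K; i++)
                                    for (int j = 0; j < K; j++)
                                        out[m][h][w] +=
                                            wgt[m][c][i][j]
                                            * in[c][h + i][w + j];
}
```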