Graduate Student: 鄭傑升 (Cheng, Chieh-Sheng)
Thesis Title: 卷積神經網路之硬體加速器設計 (Hardware Accelerator Design for Convolutional Neural Networks)
Advisor: 卿文龍 (Chin, Wen-Long)
Degree: Master
Department: College of Engineering, Department of Engineering Science
Year of Publication: 2019
Academic Year: 107 (ROC calendar)
Language: Chinese
Pages: 56
Keywords (Chinese): deep learning, convolutional neural networks, hardware acceleration
Keywords (English): convolutional neural networks, hardware acceleration, ASIC design
Since AlexNet won the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) in 2012, convolutional neural networks (CNNs) have developed rapidly and achieved strong results not only in image recognition but also in other computer-vision applications. However, their enormous computation and data-access requirements make processing very time-consuming, which has led many researchers to work on hardware accelerator designs for CNNs.
This thesis implements a hardware accelerator for a complete convolutional neural network, covering both the convolutional layers and the fully connected layers. The main techniques include pipelined computation of the convolutional and fully connected layers, splitting two-dimensional max pooling into two stages of one-dimensional max pooling, and a convolution scheme that allows the multiplier-accumulators to handle kernels of various sizes. Taking AlexNet as an example, the design synthesized with TSMC 90 nm process technology occupies 2.3 M logic gates, including 105 multiplier-accumulators and 21.82 KB of internal memory. At a 250 MHz clock frequency, it achieves a throughput of 35.24 frames per second (FPS) on AlexNet.
We present an ASIC design to accelerate state-of-the-art convolutional neural networks (CNNs), covering the convolutional layers, pooling layers, fully connected layers, and activation functions. The main ideas are pipelining the computation of the convolutional and fully connected layers, dividing two-dimensional max pooling (2D max pooling) into two stages of one-dimensional max pooling, and optimizing multiplier-accumulator (MAC) utilization for various sizes of convolutional kernels. An implementation example in a TSMC 90 nm process for AlexNet consumes a 2.3 M gate count for 105 MACs and a 21.82 KB internal memory, and achieves 35.24 frames per second (FPS) on the whole AlexNet at a 250 MHz clock frequency.
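The two-stage pooling idea can be checked with a short sketch. This is not the thesis's hardware design, only a NumPy model (function names are illustrative) showing that the decomposition is exact: because the max operator is separable, pooling along the rows and then along the columns gives the same result as pooling over the full 2D window.

```python
import numpy as np

def maxpool1d(x, k, s):
    """1-D max pooling along the last axis with window k and stride s."""
    n = (x.shape[-1] - k) // s + 1
    return np.stack([x[..., i * s : i * s + k].max(axis=-1) for i in range(n)], axis=-1)

def maxpool2d_two_stage(x, k, s):
    """2-D max pooling decomposed into two 1-D passes:
    first along the width, then along the height of the intermediate result."""
    rows = maxpool1d(x, k, s)                                       # pool along width
    return maxpool1d(rows.swapaxes(-1, -2), k, s).swapaxes(-1, -2)  # pool along height

def maxpool2d_direct(x, k, s):
    """Reference direct 2-D max pooling over k-by-k windows."""
    h = (x.shape[0] - k) // s + 1
    w = (x.shape[1] - k) // s + 1
    return np.array([[x[i * s : i * s + k, j * s : j * s + k].max() for j in range(w)]
                     for i in range(h)])

# AlexNet's pooling layers use 3x3 windows with stride 2; a 13x13 map pools to 6x6.
x = np.random.randn(13, 13)
assert np.array_equal(maxpool2d_two_stage(x, 3, 2), maxpool2d_direct(x, 3, 2))
```

In hardware, this decomposition replaces a k-by-k comparator window with two 1-D comparator chains, reducing the number of intermediate values that must be buffered per output.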