| Graduate Student: | 黃啟翔 Huang, Chi-Hsiang |
|---|---|
| Thesis Title: | 降低類神經網路卷積層計算複雜度之硬體設計 Hardware Design for Reducing Computational Complexity of Convolution Layer in Artificial Neural Networks |
| Advisor: | 卿文龍 Chin, Wen-Long |
| Degree: | Master |
| Department: | College of Engineering - Department of Engineering Science |
| Year of Publication: | 2020 |
| Graduation Academic Year: | 108 |
| Language: | Chinese |
| Number of Pages: | 55 |
| Chinese Keywords: | 深度學習、卷積神經網路、硬體加速、能量效率 |
| English Keywords: | deep learning, convolution neural network, hardware accelerator, energy efficiency |
After Alex Krizhevsky of the University of Toronto won the 2012 ImageNet Large Scale Visual Recognition Challenge (ILSVRC) with a top-5 accuracy 10.9% higher than the runner-up, convolutional neural networks (CNNs) flourished, achieving strong results not only in image recognition but also in other computer vision applications. However, as network models grow ever deeper, their computational cost keeps accumulating, and with the rise of wearable devices, autonomous robots, Internet of Things (IoT) devices, and many other applications, the demand for deep learning continues to grow. How to deploy CNN models efficiently on such low-power devices has become a focal issue.
This thesis implements a hardware design for the convolution layer of artificial neural networks that reduces computational complexity and thereby saves power. Synthesized with TSMC 90 nm process technology, the design uses 323k logic gates. At a 200 MHz clock frequency, it achieves a throughput of 39.21 frames per second (FPS) on AlexNet.
In the 2012 ImageNet Large Scale Visual Recognition Challenge (ILSVRC), Alex Krizhevsky from the University of Toronto won the championship with a top-5 accuracy 10.9% higher than the runner-up. Since then, convolutional neural networks (CNNs) have flourished, achieving strong results not only in image recognition but also in other computer vision applications. However, as network models become deeper and deeper, the computational cost keeps rising, and with the rise of applications such as wearable devices and Internet of Things (IoT) devices, the demand for deep learning has steadily increased. How to apply CNN models efficiently on such low-power devices has become a focal issue.
This thesis implements a hardware design for the convolution layer in artificial neural networks to reduce computational complexity and save energy. Synthesized with TSMC 90 nm process technology, the design occupies 323k logic gates and contains five layers with 369 multipliers and adders. At a clock frequency of 200 MHz, it achieves a throughput of 39.21 frames per second (FPS) on AlexNet.
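The reported figures can be sanity-checked with a back-of-the-envelope calculation. The sketch below is an illustration, not the thesis's actual cycle model: it assumes the standard AlexNet convolution-layer shapes from Krizhevsky et al. and treats the 369 multiplier-adder pairs at 200 MHz as ideal multiply-accumulate (MAC) units, giving an upper bound on FPS against which the measured 39.21 FPS implies the effective hardware utilization.

```python
# Back-of-the-envelope check of the 39.21 FPS figure quoted above.
# Layer shapes are the standard AlexNet conv layers (an assumption here);
# 369 MAC units and the 200 MHz clock come from the abstract.

# (out_h, out_w, out_ch, k_h, k_w, in_ch_per_group) for the 5 conv layers
ALEXNET_CONV = [
    (55, 55, 96, 11, 11, 3),
    (27, 27, 256, 5, 5, 48),   # grouped convolution: 48 input channels/group
    (13, 13, 384, 3, 3, 256),
    (13, 13, 384, 3, 3, 192),  # grouped
    (13, 13, 256, 3, 3, 192),  # grouped
]

def total_macs(layers):
    """Multiply-accumulate operations for one forward pass of the conv layers."""
    return sum(oh * ow * oc * kh * kw * ic for (oh, ow, oc, kh, kw, ic) in layers)

macs = total_macs(ALEXNET_CONV)      # ~0.67 GMAC per frame
peak_macs_per_s = 369 * 200e6        # 369 MAC units at 200 MHz, one MAC/cycle
ideal_fps = peak_macs_per_s / macs   # upper bound at 100% utilization

print(f"conv MACs per frame    : {macs / 1e9:.3f} G")
print(f"ideal FPS upper bound  : {ideal_fps:.1f}")
print(f"utilization @39.21 FPS : {39.21 / ideal_fps:.0%}")
```

Under these assumptions the five conv layers need about 0.67 GMAC per frame, bounding throughput near 111 FPS, so the measured 39.21 FPS would correspond to roughly 35% utilization of the MAC array, a plausible figure once memory stalls and dataflow overhead are accounted for.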