| Graduate Student: | 黃啟翔 Huang, Chi-Hsiang |
|---|---|
| Thesis Title: | 降低類神經網路卷積層計算複雜度之硬體設計 Hardware Design for Reducing Computational Complexity of Convolution Layer in Artificial Neural Networks |
| Advisor: | 卿文龍 Chin, Wen-Long |
| Degree: | Master |
| Department: | College of Engineering - Department of Engineering Science |
| Year of Publication: | 2020 |
| Graduation Academic Year: | 108 |
| Language: | Chinese |
| Number of Pages: | 55 |
| Chinese Keywords: | 深度學習、卷積神經網路、硬體加速、能量效率 |
| English Keywords: | deep learning, convolution neural network, hardware accelerator, energy efficiency |
After Alex Krizhevsky of the University of Toronto won the 2012 ImageNet Large Scale Visual Recognition Challenge (ILSVRC) with a top-5 accuracy 10.9% higher than the runner-up, convolutional neural networks (CNNs) flourished, achieving strong results not only in image recognition but also in other computer vision applications. However, as network models grow ever deeper, their computational cost keeps accumulating, and with the rise of wearable devices, autonomous robots, Internet of Things (IoT) devices, and many other applications, the demand for deep learning continues to grow. How to deploy CNN models efficiently on such low-power devices has become a focal issue.
This thesis implements a hardware design for the convolution layer of artificial neural networks that reduces computational complexity and thereby saves power. Synthesized with TSMC 90 nm process technology, the design uses 323k logic gates. At a 200 MHz clock frequency, it achieves a throughput of 39.21 frames per second (FPS) on AlexNet.
In the 2012 ImageNet Large Scale Visual Recognition Challenge (ILSVRC), Alex Krizhevsky from the University of Toronto won the championship with a top-5 accuracy 10.9% higher than the runner-up. Since then, convolutional neural networks (CNNs) have flourished, achieving strong results not only in image recognition but also in other computer vision applications. However, as network models become deeper and deeper, the computational cost keeps rising, and with the rise of applications such as wearable devices and Internet of Things (IoT) devices, the demand for deep learning has steadily increased. How to apply CNN models efficiently on such low-power devices has become a focal issue.
This thesis implements a hardware design for the convolution layer in artificial neural networks to reduce computational complexity and save energy. Synthesized with TSMC 90 nm process technology, the design occupies 323k logic gates and contains five layers with 369 multipliers and adders. At a clock frequency of 200 MHz, it achieves a throughput of 39.21 frames per second (FPS) on AlexNet.
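The reported figures can be sanity-checked with a back-of-the-envelope calculation. The sketch below is an illustration, not the thesis's actual cycle model: it assumes the standard AlexNet convolution-layer shapes from Krizhevsky et al. and treats the 369 multiplier-adder pairs at 200 MHz as ideal multiply-accumulate (MAC) units, giving an upper bound on FPS against which the measured 39.21 FPS implies the effective hardware utilization.

```python
# Back-of-the-envelope check of the 39.21 FPS figure quoted above.
# Layer shapes are the standard AlexNet conv layers (an assumption here);
# 369 MAC units and the 200 MHz clock come from the abstract.

# (out_h, out_w, out_ch, k_h, k_w, in_ch_per_group) for the 5 conv layers
ALEXNET_CONV = [
    (55, 55, 96, 11, 11, 3),
    (27, 27, 256, 5, 5, 48),   # grouped convolution: 48 input channels/group
    (13, 13, 384, 3, 3, 256),
    (13, 13, 384, 3, 3, 192),  # grouped
    (13, 13, 256, 3, 3, 192),  # grouped
]

def total_macs(layers):
    """Multiply-accumulate operations for one forward pass of the conv layers."""
    return sum(oh * ow * oc * kh * kw * ic for (oh, ow, oc, kh, kw, ic) in layers)

macs = total_macs(ALEXNET_CONV)      # ~0.67 GMAC per frame
peak_macs_per_s = 369 * 200e6        # 369 MAC units at 200 MHz, one MAC/cycle
ideal_fps = peak_macs_per_s / macs   # upper bound at 100% utilization

print(f"conv MACs per frame    : {macs / 1e9:.3f} G")
print(f"ideal FPS upper bound  : {ideal_fps:.1f}")
print(f"utilization @39.21 FPS : {39.21 / ideal_fps:.0%}")
```

Under these assumptions the five conv layers need about 0.67 GMAC per frame, bounding throughput near 111 FPS, so the measured 39.21 FPS would correspond to roughly 35% utilization of the MAC array, a plausible figure once memory stalls and dataflow overhead are accounted for.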