Graduate Student: 鄭傑升 (Cheng, Chieh-Sheng)
Thesis Title: 卷積神經網路之硬體加速器設計 (Hardware Accelerator Design for Convolutional Neural Networks)
Advisor: 卿文龍 (Chin, Wen-Long)
Degree: Master
Department: College of Engineering, Department of Engineering Science
Year of Publication: 2019
Academic Year: 107 (ROC calendar)
Language: Chinese
Pages: 56
Keywords (Chinese): deep learning, convolutional neural networks, hardware acceleration
Keywords (English): convolutional neural networks, hardware acceleration, ASIC design
Since AlexNet won the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) in 2012, convolutional neural networks (CNNs) have developed rapidly and achieved strong results not only in image recognition but also in other computer-vision applications. However, their enormous computation and data-access requirements make processing very time-consuming, which has led many researchers to work on hardware accelerator designs for CNNs.
This thesis implements a hardware accelerator for a complete convolutional neural network, covering both the convolutional layers and the fully connected layers. The main techniques include pipelined computation of the convolutional and fully connected layers, splitting two-dimensional max pooling into two stages of one-dimensional max pooling, and a convolution scheme that allows the multiplier-accumulators to handle kernels of various sizes. Taking AlexNet as an example, the design synthesized with TSMC 90 nm process technology occupies 2.3 M logic gates, including 105 multiplier-accumulators and 21.82 KB of internal memory. At a 250 MHz clock frequency, it achieves a throughput of 35.24 frames per second (FPS) on AlexNet.
We present an ASIC design to accelerate state-of-the-art convolutional neural networks (CNNs), covering the convolutional layers, pooling layers, fully connected layers, and activation functions. The main ideas are pipelining the computation of the convolutional and fully connected layers, dividing two-dimensional max pooling (2D max pooling) into two stages of one-dimensional max pooling, and optimizing multiplier-accumulator (MAC) utilization for various sizes of convolutional kernels. An implementation example in a TSMC 90 nm process for AlexNet consumes a 2.3 M gate count for 105 MACs and a 21.82 KB internal memory, and achieves 35.24 frames per second (FPS) on the whole AlexNet at a 250 MHz clock frequency.
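The two-stage pooling idea can be checked with a short sketch. This is not the thesis's hardware design, only a NumPy model (function names are illustrative) showing that the decomposition is exact: because the max operator is separable, pooling along the rows and then along the columns gives the same result as pooling over the full 2D window.

```python
import numpy as np

def maxpool1d(x, k, s):
    """1-D max pooling along the last axis with window k and stride s."""
    n = (x.shape[-1] - k) // s + 1
    return np.stack([x[..., i * s : i * s + k].max(axis=-1) for i in range(n)], axis=-1)

def maxpool2d_two_stage(x, k, s):
    """2-D max pooling decomposed into two 1-D passes:
    first along the width, then along the height of the intermediate result."""
    rows = maxpool1d(x, k, s)                                       # pool along width
    return maxpool1d(rows.swapaxes(-1, -2), k, s).swapaxes(-1, -2)  # pool along height

def maxpool2d_direct(x, k, s):
    """Reference direct 2-D max pooling over k-by-k windows."""
    h = (x.shape[0] - k) // s + 1
    w = (x.shape[1] - k) // s + 1
    return np.array([[x[i * s : i * s + k, j * s : j * s + k].max() for j in range(w)]
                     for i in range(h)])

# AlexNet's pooling layers use 3x3 windows with stride 2; a 13x13 map pools to 6x6.
x = np.random.randn(13, 13)
assert np.array_equal(maxpool2d_two_stage(x, 3, 2), maxpool2d_direct(x, 3, 2))
```

In hardware, this decomposition replaces a k-by-k comparator window with two 1-D comparator chains, reducing the number of intermediate values that must be buffered per output.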