National Cheng Kung University Master's and Doctoral Thesis System


Author:	張凱捷 Chang, Kai-Chieh
Thesis title:	A Comparative Study of C, CUDA and FPGA Implementations of CNN Classification on the CIFAR-10 Dataset (以C、CUDA及FPGA分別實現CNN對CIFAR-10資料集分類的比較研究)
Advisor:	陳進興 Chen, Chin-Hsing
Degree:	Master
Department:	College of Electrical Engineering and Computer Science - Institute of Computer & Communication Engineering
Year of publication:	2023
Academic year of graduation:	111
Language:	English
Number of pages:	45
Keywords (Chinese):	field-programmable gate array (FPGA), convolutional neural network (CNN), parallel design, UART, CIFAR-10
Keywords (English):	FPGA, CNN, parallel design, UART, CIFAR-10

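This thesis implements a convolutional neural network system for CIFAR-10 image recognition on an FPGA. FPGAs offer many advantages for accelerating image recognition tasks, such as low latency, low power consumption, and high flexibility. To achieve efficient computation, the design draws on the parallel processing of GPUs and applies parallelization to the computation of the convolutional, pooling, and fully connected layers. This design greatly reduces the overall computation time, allowing the FPGA to perform CIFAR-10 image recognition efficiently.
The implemented system uses a UART interface to send test images from the PC to the FPGA over RS232, and then returns the predicted results to the PC over RS232 for display. With this parallel computation, the FPGA can process multiple pixels within the same clock cycle, which speeds up the computation of feature maps. This optimization lets the FPGA perform well on large-scale image recognition tasks while maintaining accuracy. Parallel feature map computation is therefore of great practical value for accelerating CNN-based image recognition. Experimental results show that the design achieves a speedup of about 2.5 times over the GPU and 27.5 times over the CPU. At this speedup, the recognition accuracy of the FPGA-implemented CNN on the CIFAR-10 dataset is close to that obtained in software.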

This thesis presents an FPGA implementation of a convolutional neural network (CNN) for CIFAR-10 image recognition. FPGAs offer several advantages, such as low latency, low power consumption, and high flexibility, which make them a suitable platform for accelerating image recognition tasks. Inspired by GPUs, the design applies parallelization techniques to the computation of the convolutional, pooling, and fully connected layers. This significantly reduces the overall computation time, allowing the FPGA to execute CIFAR-10 image recognition efficiently.
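To illustrate why these three layer types lend themselves to parallelization, the following is a minimal sequential C sketch, not the thesis's code. The function names (`conv2d`, `maxpool2x2`, `fully_connected`), the integer data type, the absence of padding, and the 2x2 pooling window are all assumptions made for illustration. In each function, the iterations of the outer loops write to distinct outputs and only read from the input, so they can in principle be computed concurrently, for example with one GPU thread or one FPGA processing element per output.

```c
#include <assert.h>
#include <stdint.h>

/* Valid (no padding), stride-1 convolution of an h x w input with a k x k
   kernel. out must hold (h-k+1) * (w-k+1) values. All arrays are row-major.
   Each output value depends only on its own input window. */
void conv2d(const int32_t *in, int h, int w,
            const int32_t *kernel, int k, int32_t *out)
{
    assert(k >= 1 && k <= h && k <= w);
    int oh = h - k + 1, ow = w - k + 1;
    for (int r = 0; r < oh; ++r) {
        for (int c = 0; c < ow; ++c) {
            int32_t acc = 0;
            for (int i = 0; i < k; ++i)
                for (int j = 0; j < k; ++j)
                    acc += in[(r + i) * w + (c + j)] * kernel[i * k + j];
            out[r * ow + c] = acc;
        }
    }
}

/* 2x2 max pooling with stride 2. h and w must be even.
   out must hold (h/2) * (w/2) values. */
void maxpool2x2(const int32_t *in, int h, int w, int32_t *out)
{
    assert(h % 2 == 0 && w % 2 == 0);
    int ow = w / 2;
    for (int r = 0; r < h / 2; ++r) {
        for (int c = 0; c < ow; ++c) {
            int32_t m = in[(2 * r) * w + 2 * c];
            for (int i = 0; i < 2; ++i)
                for (int j = 0; j < 2; ++j) {
                    int32_t v = in[(2 * r + i) * w + (2 * c + j)];
                    if (v > m) m = v;
                }
            out[r * ow + c] = m;
        }
    }
}

/* Fully connected layer:
   out[o] = bias[o] + sum over i of weight[o * n_in + i] * in[i]. */
void fully_connected(const int32_t *in, int n_in,
                     const int32_t *weight, const int32_t *bias,
                     int n_out, int32_t *out)
{
    for (int o = 0; o < n_out; ++o) {
        int32_t acc = bias[o];
        for (int i = 0; i < n_in; ++i)
            acc += weight[o * n_in + i] * in[i];
        out[o] = acc;
    }
}
```

A CPU runs these loops one iteration at a time, which is the sequential baseline the speedups are measured against; a parallel design evaluates many `(r, c)` or `o` positions at once.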
Our system uses a UART interface to transfer test images from the PC to the FPGA via RS232 and then transmits the prediction results back to the PC for display. With the parallel processing approach described above, the FPGA can handle multiple pixels within the same clock cycle, which accelerates the computation of feature maps. This optimization improves the FPGA's performance on large-scale image recognition tasks while preserving accuracy. As a result, concurrent feature map computation has substantial potential for accelerating image recognition tasks based on convolutional neural networks.
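As a rough illustration of the PC-to-FPGA link, the sketch below frames bytes for a UART and estimates how long one image takes to send. It is not taken from the thesis: the 8N1 frame format (one start bit, eight data bits, one stop bit), the function names, and the choice to send all three colour channels of a 32 x 32 CIFAR-10 image as one byte each are assumptions, and the baud rate is left as a parameter because the thesis record does not state it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed 8N1 framing: 1 start bit (0), 8 data bits LSB first, 1 stop bit (1). */
enum { UART_BITS_PER_BYTE = 10 };

/* One CIFAR-10 image: 32 x 32 pixels, 3 colour channels, 1 byte per value. */
enum { CIFAR10_IMAGE_BYTES = 32 * 32 * 3 };

/* Returns the 10-bit frame for one byte; bit 0 is the first bit on the wire. */
uint16_t uart_frame_8n1(uint8_t data)
{
    return (uint16_t)(((uint16_t)data << 1) | (1u << 9));
}

/* Recovers the data byte from a frame, checking the start and stop bits. */
uint8_t uart_unframe_8n1(uint16_t frame)
{
    assert((frame & 1u) == 0);         /* start bit must be 0 */
    assert(((frame >> 9) & 1u) == 1);  /* stop bit must be 1 */
    return (uint8_t)((frame >> 1) & 0xFFu);
}

/* Total number of bits on the wire for a payload of n_bytes. */
size_t uart_wire_bits(size_t n_bytes)
{
    return n_bytes * UART_BITS_PER_BYTE;
}

/* Seconds needed to send n_bytes at the given baud rate (bits per second). */
double uart_transfer_seconds(size_t n_bytes, unsigned baud)
{
    assert(baud > 0);
    return (double)uart_wire_bits(n_bytes) / (double)baud;
}
```

Under these assumptions, `uart_transfer_seconds(CIFAR10_IMAGE_BYTES, 115200)` gives about 0.27 seconds per image, where 115200 is only an example baud rate. If the image were reduced to a single grayscale channel before transmission, the payload would shrink to 1024 bytes.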
Although the FPGA's level of parallelism is not on par with that of GPUs, the results of the study indicate that the proposed design achieves a 2.5 times speedup over the GPU and a 27.5 times speedup over sequential computation on the CPU. Despite this acceleration, the CNN implemented on the FPGA maintains a recognition accuracy on the CIFAR-10 dataset that is comparable to the accuracy obtained from software-based computation.
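The two reported speedups can be related with a short calculation. Assuming both are ratios of execution time measured on the same workload (the symbols $T_{\mathrm{CPU}}$, $T_{\mathrm{GPU}}$, and $T_{\mathrm{FPGA}}$ are notation introduced here, not taken from the thesis), they imply that the GPU implementation is roughly 11 times faster than the sequential CPU implementation:

```latex
\[
S_{\mathrm{CPU}} = \frac{T_{\mathrm{CPU}}}{T_{\mathrm{FPGA}}} \approx 27.5,
\qquad
S_{\mathrm{GPU}} = \frac{T_{\mathrm{GPU}}}{T_{\mathrm{FPGA}}} \approx 2.5
\quad\Longrightarrow\quad
\frac{T_{\mathrm{CPU}}}{T_{\mathrm{GPU}}}
  = \frac{S_{\mathrm{CPU}}}{S_{\mathrm{GPU}}}
  \approx \frac{27.5}{2.5} = 11 .
\]
```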

Abstract (Chinese)	I
Abstract	III
Acknowledgment (Chinese)	V
Acknowledgment	VI
Contents	VII
List of Figures	IX
Chapter 1	Introduction	1
1.1	CNN tasks for CIFAR-10	1
1.2	Parallel processing of GPU	1
1.3	Motivation and Contribution	3
1.4	Thesis Outline	4
Chapter 2	Related Works	5
2.1	Convolution Neural Network	5
2.2	Forward Propagation	6
2.3	Backpropagation	11
2.4	Grayscale and Fixed-point Transformation	13
2.5	FPGA	14
Chapter 3	Implementation of the Proposed System	16
3.1	Architecture of the proposed CNN	16
3.2	C Implementation	17
3.3	CUDA Implementation	17
3.4	FPGA Implementation	20
3.4.1	TOP Module	20
3.4.2	CNN module	21
3.4.2.1	Convolution Module	24
3.4.2.2	Max pooling Module	24
3.4.2.3	Fully Connected Module	25
3.4.2.4	Memory bits Modules	26
3.4.3	UART & control Module	28
Chapter 4	Experiment	30
4.1	CIFAR-10	30
4.2	Experimental result	31
Chapter 5	Conclusion and Future Work	42
5.1	Conclusion	42
5.2	Future Work	42
References	44
                                    


Off-campus access: available immediately
