
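Author: Chang, Kai-Chieh (張凱捷)
Thesis title: A Comparative Study of C, CUDA and FPGA Implementations of CNN Classification on the CIFAR-10 Dataset (以C、CUDA及FPGA分別實現CNN對CIFAR-10資料集分類的比較研究)
Advisor: Chen, Chin-Hsing (陳進興)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Institute of Computer & Communication Engineering
Year of publication: 2023
Academic year of graduation: 111 (ROC academic year)
Language: English
Number of pages: 45
Chinese keywords (translated): field-programmable gate array (FPGA), convolutional neural network (CNN), parallel design, UART, CIFAR-10
English keywords: FPGA, CNN, parallel design, UART, CIFAR-10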
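  • This thesis implements a convolutional neural network (CNN) system for CIFAR-10 image recognition on an FPGA. FPGAs offer many advantages for accelerating image recognition tasks, such as low latency, low power consumption, and high flexibility. To achieve efficient computation, the design is inspired by the parallel processing of GPUs and applies parallelization techniques to the computation of the convolutional, pooling, and fully connected layers. This approach greatly reduces the overall computation time, allowing the FPGA to perform CIFAR-10 image recognition efficiently.
    The implemented system uses a UART interface to transfer test images from the PC to the FPGA over RS232, and then sends the prediction results back to the PC over RS232 for display. With the parallel computation approach, the FPGA can process multiple pixels within the same clock cycle, which speeds up feature-map computation. This optimization lets the FPGA perform well on large-scale image recognition tasks while maintaining accuracy. Parallel feature-map computation is therefore of great value for accelerating CNN-based image recognition. Experimental results show that the design achieves a speedup of about 2.5 times relative to the GPU and 27.5 times relative to the CPU. At this speedup, the recognition accuracy of the FPGA-implemented CNN on the CIFAR-10 dataset is close to the accuracy obtained in software.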

    This thesis presents an FPGA implementation of a convolutional neural network (CNN) for CIFAR-10 image recognition tasks. FPGAs offer several advantages, such as low latency, low power consumption, and high flexibility, making them a suitable platform for accelerating image recognition tasks. Inspired by GPUs, the design incorporates parallelization techniques in the computation of the convolutional, pooling, and fully connected layers. This design significantly reduces the overall computation time, allowing the FPGA to execute CIFAR-10 image recognition tasks efficiently.
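    The layer-level parallelism described here rests on the fact that each output value of a convolution or pooling layer depends only on its own input window. The sketch below illustrates that property in plain Python. It is not the thesis's C, CUDA, or FPGA code; the 5x5 image, the 2x2 kernel, and all function names are assumptions chosen for illustration.

```python
def conv2d_valid(image, kernel):
    """Valid (no padding) 2-D convolution, written as cross-correlation."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1

    def output_pixel(r, c):
        # Reads only the input window at (r, c); no other output is needed.
        return sum(image[r + i][c + j] * kernel[i][j]
                   for i in range(kh) for j in range(kw))

    # Every (r, c) is independent, so parallel hardware could evaluate many
    # of them at once; here they are simply computed one after another.
    return [[output_pixel(r, c) for c in range(out_w)] for r in range(out_h)]


def max_pool_2x2(fmap):
    """2x2 max pooling with stride 2; each output depends on one window."""
    return [[max(fmap[r][c], fmap[r][c + 1],
                 fmap[r + 1][c], fmap[r + 1][c + 1])
             for c in range(0, len(fmap[0]) - 1, 2)]
            for r in range(0, len(fmap) - 1, 2)]


# Hypothetical 5x5 single-channel input and 2x2 kernel (illustration only).
image = [[1, 2, 3, 0, 1],
         [0, 1, 2, 3, 0],
         [1, 0, 1, 2, 3],
         [3, 1, 0, 1, 2],
         [2, 3, 1, 0, 1]]
kernel = [[1, 0],
          [0, 1]]

feature_map = conv2d_valid(image, kernel)  # 4x4 feature map
pooled = max_pool_2x2(feature_map)         # 2x2 pooled map
print(feature_map)
print(pooled)
```

    Because `output_pixel` never reads another output value, the loop over (r, c) can be unrolled in hardware or mapped to GPU threads without changing the result.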
    Our system uses a UART interface to transfer test images from a PC to the FPGA via RS232 and then transmits the prediction results back to the PC for display. With the parallel design, the FPGA can process multiple pixels within the same clock cycle, accelerating the computation of feature maps. This optimization improves the FPGA's performance on large-scale image recognition tasks while preserving accuracy. As a result, concurrent feature map computation has substantial potential for accelerating image recognition tasks based on convolutional neural networks.
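    To get a feel for the UART link, the rough estimate below computes how long one CIFAR-10 image would take to send. CIFAR-10 images are 32x32 pixels with three color channels. The baud rate of 115200, the 8N1 framing, one byte per sample, and the single-channel grayscale variant are assumptions, since the abstract does not state the link settings or the transmitted image format.

```python
def uart_frame_bits(data_bits=8, parity_bits=0, stop_bits=1):
    """Bits on the wire per transmitted byte, including one start bit."""
    return 1 + data_bits + parity_bits + stop_bits


def transfer_seconds(num_bytes, baud_rate, bits_per_frame):
    """Ideal transfer time, ignoring gaps between frames and any protocol overhead."""
    return num_bytes * bits_per_frame / baud_rate


WIDTH = HEIGHT = 32      # CIFAR-10 image size
CHANNELS = 3             # RGB
ASSUMED_BAUD = 115200    # assumption: not stated in the abstract

bits = uart_frame_bits()             # 8N1 framing (assumed): 10 bits per byte
rgb_bytes = WIDTH * HEIGHT * CHANNELS
gray_bytes = WIDTH * HEIGHT          # assumption: one byte per grayscale pixel

rgb_time = transfer_seconds(rgb_bytes, ASSUMED_BAUD, bits)
gray_time = transfer_seconds(gray_bytes, ASSUMED_BAUD, bits)
print(f"RGB: {rgb_bytes} bytes, {rgb_time * 1000:.1f} ms")
print(f"Grayscale: {gray_bytes} bytes, {gray_time * 1000:.1f} ms")
```

    Under these assumed settings the serial transfer takes tens to hundreds of milliseconds per image, so it is worth checking whether reported timings cover the transfer or only the on-chip computation.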
    Although the FPGA's level of parallelism is not on par with that of GPUs, the results of the study indicate that the proposed design achieves a 2.5-times speedup over the GPU and a 27.5-times speedup over the CPU's sequential computation. Despite this acceleration, the CNN implemented on the FPGA maintains recognition accuracy on the CIFAR-10 dataset that is comparable to the accuracy obtained from software-based computations.
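    Since both figures are measured against the same FPGA design, they also imply a GPU-over-CPU ratio. The small worked calculation below assumes that both speedups refer to the same workload and that speedup means baseline time divided by FPGA time; the function name and the unit FPGA time are hypothetical.

```python
FPGA_VS_GPU = 2.5    # reported speedup of the FPGA design over the GPU
FPGA_VS_CPU = 27.5   # reported speedup of the FPGA design over the CPU


def implied_speedup(a_vs_c, a_vs_b):
    """If A is a_vs_c times faster than C and a_vs_b times faster than B,
    then B is a_vs_c / a_vs_b times faster than C."""
    return a_vs_c / a_vs_b


# t_cpu / t_gpu = (t_cpu / t_fpga) / (t_gpu / t_fpga)
gpu_vs_cpu = implied_speedup(FPGA_VS_CPU, FPGA_VS_GPU)

# Relative run times if the FPGA run is taken as 1 time unit (hypothetical).
t_fpga = 1.0
t_gpu = FPGA_VS_GPU * t_fpga
t_cpu = FPGA_VS_CPU * t_fpga
print(gpu_vs_cpu, t_gpu, t_cpu)
```

    Under these assumptions the GPU implementation would be about 11 times faster than the sequential CPU implementation.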

    Abstract (Chinese)
    Abstract
    Acknowledgment (Chinese)
    Acknowledgment
    Contents
    List of Figures
    Chapter 1 Introduction
      1.1 CNN tasks for CIFAR-10
      1.2 Parallel processing of GPU
      1.3 Motivation and Contribution
      1.4 Thesis Outline
    Chapter 2 Related Works
      2.1 Convolutional Neural Network
      2.2 Forward Propagation
      2.3 Backpropagation
      2.4 Grayscale and Fixed-point Transformation
      2.5 FPGA
    Chapter 3 Implementation of the Proposed System
      3.1 Architecture of the Proposed CNN
      3.2 C Implementation
      3.3 CUDA Implementation
      3.4 FPGA Implementation
        3.4.1 TOP Module
        3.4.2 CNN Module
          3.4.2.1 Convolution Module
          3.4.2.2 Max Pooling Module
          3.4.2.3 Fully Connected Module
          3.4.2.4 Memory bits Modules
        3.4.3 UART & Control Module
    Chapter 4 Experiment
      4.1 CIFAR-10
      4.2 Experimental Result
    Chapter 5 Conclusion and Future Work
      5.1 Conclusion
      5.2 Future Work
    References


    Full-text availability: on campus, open immediately; off campus, open immediately.