
Graduate student: 許博竣 (Hsu, Po-Chun)
Thesis title: A FPGA Systolic Array Implementation of CNN for Recognition of Digits (以CNN辨識數字的一心脈陣列FPGA實現)
Advisor: 陳進興 (Chen, Chin-Hsing)
Degree: Master's
Department: Institute of Computer & Communication Engineering, College of Electrical Engineering and Computer Science
Year of publication: 2023
Academic year of graduation: 111 (ROC calendar; the 2022-2023 academic year)
Language: English
Number of pages: 53
Keywords: field programmable gate array (FPGA), systolic array, convolutional neural network (CNN), RS232
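    With the spread of artificial intelligence and the growing complexity of deep learning, the demand for hardware computing power keeps rising, which makes high-performance computing chips essential. To this end, this thesis uses systolic array technology to implement a digit recognition system on a field programmable gate array (FPGA).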
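    This thesis adopts a convolutional neural network (CNN) architecture for digit recognition. The CNN consists of two convolution layers, two max-pooling layers, and two fully connected layers. Because the convolution operations in the convolution layers are regular, they are computed with a systolic array to achieve high-performance computation. On the input side, each image is resized on the PC to 30 × 30 and converted from RGB to grayscale; an RS232 module then receives the multiple records of data transmitted from the PC for digit recognition, and finally the RS232 module sends the CNN recognition result back to the PC for display on the screen.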
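The PC-side preprocessing described above (RGB to grayscale, 30 × 30 resolution, transmission over RS232) can be sketched in Python as follows. This is a hypothetical sketch, not the author's code: the BT.601 luma weights (0.299, 0.587, 0.114), nearest-neighbour resampling, the row-major one-byte-per-pixel layout, and the helper names `rgb_to_gray`, `resize_nearest`, and `preprocess` are all assumptions. Writing the bytes to an actual serial port is left out.

```python
"""PC-side preprocessing sketch: RGB -> grayscale -> 30x30 -> bytes for a serial link.

Assumptions (not taken from the thesis): ITU-R BT.601 luma weights,
nearest-neighbour resampling, and one unsigned byte per pixel in row-major order.
"""
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each 0..255


def rgb_to_gray(p: Pixel) -> int:
    r, g, b = p
    # Weighted sum with BT.601 coefficients, rounded to the nearest integer.
    return round(0.299 * r + 0.587 * g + 0.114 * b)


def resize_nearest(img: List[List[int]], out_h: int, out_w: int) -> List[List[int]]:
    in_h, in_w = len(img), len(img[0])
    # Each output pixel copies the input pixel at the same relative position.
    return [[img[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
            for y in range(out_h)]


def preprocess(rgb: List[List[Pixel]], size: int = 30) -> bytes:
    gray = [[rgb_to_gray(p) for p in row] for row in rgb]
    small = resize_nearest(gray, size, size)
    # Flatten row by row, one byte per pixel (hypothetical wire format).
    return bytes(v for row in small for v in row)


if __name__ == "__main__":
    # 60x60 test image: left half white, right half black.
    img = [[(255, 255, 255) if x < 30 else (0, 0, 0) for x in range(60)] for _ in range(60)]
    payload = preprocess(img)
    print(len(payload))             # 900
    print(payload[0], payload[29])  # 255 0
```

With nearest-neighbour sampling, converting to grayscale before or after resizing gives the same result; the sketch converts first only for simplicity.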
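    Compared with a CNN architecture that uses sequential matrix multiplication, the systolic array design has the advantage of faster processing: its recognition speed is six times that of the sequential design, verifying that it is a high-performance digit recognition system.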

    Today, artificial intelligence (AI) pervades daily life through applications such as license plate recognition, face recognition, conversational robots, and self-driving cars. However, as AI has spread, deep learning models have become increasingly complex and require enormous amounts of computation. As a result, high-performance Application Specific Integrated Circuit (ASIC) chips have become important. In this thesis, systolic array technology is used to implement a digit recognition system on a field programmable gate array (FPGA) in order to improve the system's performance.
    This thesis uses a Convolutional Neural Network (CNN) architecture to implement a digit recognition system. The CNN consists of two convolution layers, two max-pooling layers, and two fully connected layers. Because the convolution operation in the convolution layers is regular, the convolution layers are computed with a systolic array to achieve high-performance computing. On the input side, each image is resized on the PC to a resolution of 30 × 30 and converted to grayscale. An RS232 module receives multiple records of image data transmitted from the PC for digit recognition. Finally, the CNN recognition result is sent back to the PC through the RS232 module and displayed.
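The regularity of convolution mentioned above is what makes a systolic array applicable: a convolution layer can be rewritten as a single matrix product between a matrix of flattened image patches (commonly called im2col) and a matrix of flattened kernels. The Python sketch below checks that equivalence against a direct convolution. It assumes a single input channel, stride 1, and no padding, and all function names are hypothetical; the abstract does not say exactly how the thesis maps its convolutions onto the array.

```python
"""Sketch: lowering a 2-D convolution to one matrix product (im2col).

Assumptions (not taken from the thesis): single input channel, stride 1,
no padding, and 'valid' cross-correlation as used in most CNN frameworks.
"""
from typing import List

Matrix = List[List[float]]


def conv2d_direct(img: Matrix, kernels: List[Matrix]) -> List[Matrix]:
    # Reference implementation: slide each k x k kernel over the image.
    k = len(kernels[0])
    out_h, out_w = len(img) - k + 1, len(img[0]) - k + 1
    return [[[sum(img[y + i][x + j] * ker[i][j] for i in range(k) for j in range(k))
              for x in range(out_w)] for y in range(out_h)] for ker in kernels]


def im2col(img: Matrix, k: int) -> Matrix:
    # One row per output position, holding that k x k patch flattened row-major.
    out_h, out_w = len(img) - k + 1, len(img[0]) - k + 1
    return [[img[y + i][x + j] for i in range(k) for j in range(k)]
            for y in range(out_h) for x in range(out_w)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[r][t] * b[t][c] for t in range(len(b))) for c in range(len(b[0]))]
            for r in range(len(a))]


def conv2d_as_matmul(img: Matrix, kernels: List[Matrix]) -> List[Matrix]:
    k = len(kernels[0])
    out_h, out_w = len(img) - k + 1, len(img[0]) - k + 1
    patches = im2col(img, k)                          # (out_h*out_w) x (k*k)
    weights = [[ker[i][j] for ker in kernels]         # (k*k) x num_kernels
               for i in range(k) for j in range(k)]
    prod = matmul(patches, weights)                   # (out_h*out_w) x num_kernels
    # Column c of the product, read back row by row, is output feature map c.
    return [[[prod[y * out_w + x][c] for x in range(out_w)] for y in range(out_h)]
            for c in range(len(kernels))]


if __name__ == "__main__":
    img = [[r * 5 + c for c in range(5)] for r in range(5)]  # 5x5 test image
    kernels = [[[1, 0, 0], [0, 1, 0], [0, 0, 1]],          # diagonal sum
               [[0, 1, 0], [1, -4, 1], [0, 1, 0]]]         # Laplacian
    print(conv2d_direct(img, kernels) == conv2d_as_matmul(img, kernels))  # True
    print(conv2d_as_matmul(img, kernels)[0][0][0])  # 18
```

Once convolution is in this form, the same matrix-multiplication hardware serves every kernel and every output position, which is the kind of uniform workload a systolic array handles well.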
    Compared with a CNN architecture that uses sequential matrix multiplication, the proposed system processes data faster: its recognition speed is six times that of the sequential-matrix-multiplication design, which shows that it is a high-performance digit recognition system.
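The speed advantage claimed above comes from parallelism: a sequential design performs one multiply-accumulate (MAC) per cycle, while a systolic array keeps many processing elements (PEs) busy as data flows through them. The following Python sketch simulates a weight-stationary array (the dataflow named in Section 2.5 of the outline) cycle by cycle. The grid orientation, the input skew, the cycle model, and the function names are assumptions for illustration, not the thesis's hardware design, and weight preloading is not counted.

```python
"""Cycle-level sketch of a weight-stationary systolic array computing C = A @ W.

Assumptions (illustrative, not the thesis design): a K x N grid of PEs where
PE(k, n) holds W[k][n]; A's rows enter from the left with a one-cycle skew per
array row; partial sums move down one PE per cycle; weight loading is not counted.
"""
from typing import List, Tuple

Matrix = List[List[int]]


def systolic_matmul(a: Matrix, w: Matrix) -> Tuple[Matrix, int]:
    m_rows, k_dim, n_cols = len(a), len(w), len(w[0])
    a_reg = [[0] * n_cols for _ in range(k_dim)]  # activation latched in each PE
    p_reg = [[0] * n_cols for _ in range(k_dim)]  # partial sum latched in each PE
    c = [[0] * n_cols for _ in range(m_rows)]
    cycles = m_rows + k_dim + n_cols - 2
    for t in range(cycles):
        new_a = [[0] * n_cols for _ in range(k_dim)]
        new_p = [[0] * n_cols for _ in range(k_dim)]
        for k in range(k_dim):
            for n in range(n_cols):
                if n == 0:
                    m = t - k  # skewed feed: a[m][k] enters array row k at cycle m + k
                    a_in = a[m][k] if 0 <= m < m_rows else 0
                else:
                    a_in = a_reg[k][n - 1]  # from the left neighbour, one cycle later
                p_in = p_reg[k - 1][n] if k > 0 else 0  # from the PE above
                new_a[k][n] = a_in
                new_p[k][n] = p_in + a_in * w[k][n]  # one MAC per PE per cycle
        a_reg, p_reg = new_a, new_p  # all registers update on the same clock edge
        for n in range(n_cols):
            m = t - (k_dim - 1) - n  # row of C leaving the bottom of column n now
            if 0 <= m < m_rows:
                c[m][n] = p_reg[k_dim - 1][n]
    return c, cycles


def sequential_mac_cycles(m_rows: int, k_dim: int, n_cols: int) -> int:
    # A single MAC unit performing one multiply-accumulate per cycle.
    return m_rows * k_dim * n_cols


if __name__ == "__main__":
    a = [[1, 2], [3, 4], [5, 6]]    # 3 x 2
    w = [[7, 8, 9], [10, 11, 12]]   # 2 x 3
    print(systolic_matmul(a, w))    # ([[27, 30, 33], [61, 68, 75], [95, 106, 117]], 6)
    print(sequential_mac_cycles(3, 2, 3))  # 18
```

For an M × K input and a K × N weight matrix, the simulated array finishes in M + K + N - 2 cycles versus M·K·N cycles for a single MAC unit. This idealized count covers only the matrix product and ignores I/O, pooling, and the fully connected layers, so it is not comparable with the six-fold end-to-end speedup reported for the thesis.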

    Abstract (Chinese)
    Abstract
    Acknowledgment (Chinese)
    Acknowledgment
    Contents
    List of Tables
    List of Figures
    Chapter 1 Introduction
      1.1 Digit recognition method
      1.2 Convolutional Neural Network (CNN) & Systolic array
      1.3 Field Programmable Gate Array (FPGA)
      1.4 Convolutional Neural Network (CNN)
      1.5 Motivation and Contribution
      1.6 Thesis Organization
    Chapter 2 Background Knowledge related to the Proposed System
      2.1 Deep Learning
      2.2 Overview of CNN
        2.2.1 Convolution Layers
        2.2.2 Pooling Layers
        2.2.3 Fully Connected Layers
        2.2.4 Activation Function
        2.2.5 Backpropagation in CNN
      2.3 RGB to Grayscale
      2.4 Systolic Array
      2.5 Weight Stationary systolic array
      2.6 Processing Element (PE)
    Chapter 3 Hardware Implementation of the Proposed System
      3.1 Implementation of the Proposed Algorithm
      3.2 Overview of the Proposed System
        3.2.1 The Cyclone IV E FPGA
        3.2.2 Programming Interface
        3.2.3 RS232 Interface
      3.3 The top Module
      3.4 RS232 Module
        3.4.1 RS232 Receiver Module
        3.4.2 Control_in Module
        3.4.3 RS232 Receiver Module
      3.5 Grayscale Threshold Module
      3.6 CNN Module
      3.7 CNN Transmitter Module
      3.8 CNN Receiver Module
      3.9 Systolic Array & PE Module
        3.9.1 Systolic Array Module
        3.9.2 PE Module
      3.10 FC_Op Module
    Chapter 4 Experimental Results
      4.1 Research equipment and environment
      4.2 Systolic Array for Matrix Multiplication
        4.2.1 Matrix Multiplication
        4.2.2 Systolic Array convolution operation
        4.2.3 Comparing recognition performance
      4.3 Digit Recognition
      4.4 The resources of FPGA used by the Proposed System
    Chapter 5 Conclusion and Future Work
    References

    [1] Bahar Asgari, Ramyad Hadidi and Hyesoon Kim, "MEISSA: multiplying matrices efficiently in a scale systolic architecture," Georgia Institute of Technology, GA, USA, 2020.
    [2] Eunjin Baek, Dongup Kwon and Jangwoo Kim, "A multi-neural network acceleration architecture," ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA), Valencia, Spain, pp. 940-953, 2020.
    [3] Mohammed Bakiri, Christophe Guyeux, Jean-Francois Couchot and Abdelkrim Kamel Oudjida, "Survey on hardware implementation of random number generators on FPGA: theory and experimental analyses," Computer Science Review, pp. 135-153, 2018.
    [4] Yoshua Bengio, Rejean Ducharme, Pascal Vincent and Chirstian Jauvin, "A neural probabilistic language model," Université de Montréal, Montréal, Canada, 2003.
    [5] Onkar Choudhari, Marisha Chopade, Sourabh Chopde, Swarali Dabhadkar and V Ingale, "Hardware accelerator: implementation of CNN on FPGA for digit recognition," 24th International Symposium on VLSI Design and Test (VDAT), Bhubaneswar, India, pp. 1-6, 2020.
    [6] Bin Ding, Huimin Qian and Jun Zhou, "Activation functions and their characteristics in deep neural networks," Chinese Control And Decision Conference (CCDC), Shenyang, China, pp. 1836-1841, 2022.
    [7] Shiv Ram Duby, Satish Kumar Singh and Bidyut Baran Chaudhuri, "Activation functions in deep learning: a comprehensive survey and benchmark," Cornell University, NY, USA, 2021.
    [8] MK Gurucharan, "Basic CNN architecture: explaining 5 layers of convolutional neural network," upGrad, 2022.
    [9] Norman Paul Jouppi, Cliff Young, Nishant Patil, David Patterson and Gaurav Agrawal, "In-datacenter performance analysis of a tensor processing unit," Google, Inc., California, USA, 2017.
    [10] Kaushal Kumar, Ritesh Kumar Mishra and Durgesh Nandan, "Efficient hardware of RGB to gray conversion realized on FPGA and ASIC," Procedia Computer Science, pp. 2008-2015, 2020.
    [11] Yann LeCun, Yoshua Bengio and Geofftry Hinton, "Deep learning," Nature 521, pp. 436-444, 2015.
    [12] Chen-Yi Lee and Mei-Cheng Lu, "An efficient VLSI architecture for full-search block matching algorithms," National Chiao Tung University, Hsinchu, ROC, 1997.
    [13] Keiron O’Shea and Ryan Nash, "An introduction to convolutional neural networks," Cornell University, NY, USA, 2015.
    [14] Ananda Samajdar, Yuhao Zhu, Paul Watmough, Matthew Mattina and Tushar Krishna, "Scale-sim: systolic CNN accelerator simulator," Cornell University, NY, USA, 2019.
    [15] Juergen Schmidhuber, "Deep learning in nural networks: an overview," Cornell University, NY, USA, 2014.
    [16] Ahmad Shawahna, Sadiq Sait and Aiman El-Maleh, "FPGA-based accelerators of deep learning networks for learning and classification: a review," IEEE, vol. 7, pp. 7823-7859, 2019.
    [17] Terasic Technology Inc., "DE2-115 User Manual," Terasic Technology Inc., 2012.
    [18] Zhifei Zhang, "Derivation of backpropagation in convolutional neural network," University of Tennessee, TN, USA, 2016.

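    On campus: publicly available from 2028-08-23
    Off campus: publicly available from 2028-08-23
    The electronic thesis has not yet been authorized for public release; for the print copy, consult the library catalog.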