
Graduate student: 潘柏儒 (Pan, Bo-Ru)
Thesis title: Hardware Accelerator for AI on Chip via Principal Component Analysis (主成分分析用於人工智慧晶片之硬體加速器)
Advisor: 李國君 (Lee, Gwo-Giun)
Degree: Master
Department: Department of Electrical Engineering, College of Electrical Engineering and Computer Science
Year of publication: 2021
Academic year of graduation: 109
Language: English
Pages: 71
Chinese keywords (translated): principal component analysis, Gabor filter, algorithm/architecture co-exploration, hardware accelerator, application-specific integrated circuit
English keywords: principal component analysis, Gabor filter, algorithm architecture co-design, accelerator, ASIC
    Abstract (translated from Chinese): This thesis proposes a hardware architecture for the principal component analysis (PCA) algorithm, targeted primarily at the Gabor filter. PCA concentrates the important information onto the principal eigen-axes, and the symmetry of the Gabor filter's PCA result is exploited to reduce computational complexity. With the recent popularity of edge AI, this work targets small area and low power in the design of the PCA hardware architecture. The design flow is based on algorithm/architecture co-design: complexity analysis of the algorithm, covering the number of operations, degree of parallelism, memory size, and transfer bandwidth, leads to a dataflow model. The proposed architecture is synthesized in a TSMC 90 nm process and operates at a clock rate of 142.8 MHz.

    This thesis proposes a hardware architecture for principal component analysis (PCA). The design goal is to apply PCA to the Gabor filter and to exploit the symmetry of the PCA result to reduce computational complexity. In recent years, edge AI has become a new trend in AI applications, so this thesis aims to develop a PCA hardware architecture with small area and low power. The design flow is based on algorithm/architecture co-design: the number of operations, degree of parallelism, memory configuration, and data transfer are analyzed, and a data flow is then built. The proposed architecture is synthesized with TSMC 90 nm technology and operates at 142.8 MHz.
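    The pipeline the abstract describes — zero-mean, covariance, eigen decomposition, data projection — can be sketched in floating point as follows. This is an illustrative sketch only: the helper name `pca_project` is hypothetical, and the thesis implements these stages as fixed-point hardware modules rather than with NumPy.

```python
import numpy as np

def pca_project(X, k):
    """Sketch of the PCA pipeline: zero-mean the data, form the
    covariance matrix, eigen-decompose it, and project the data
    onto the k principal axes (largest-variance directions)."""
    Xc = X - X.mean(axis=0)                # zero-mean stage
    cov = Xc.T @ Xc / (len(X) - 1)         # covariance matrix stage
    vals, vecs = np.linalg.eigh(cov)       # eigen decomposition stage
    order = np.argsort(vals)[::-1]         # sort by decreasing variance
    W = vecs[:, order[:k]]                 # top-k principal axes
    return Xc @ W                          # data projection stage

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))
Y = pca_project(X, 2)                      # 100 samples, 2 components
```

    The projected components are mutually uncorrelated, which is the property that lets a filter bank such as the Gabor filters be approximated by a few principal axes.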

    Abstract (Chinese) i
    Abstract (English) ii
    Acknowledgements iv
    Table of Contents vi
    List of Tables ix
    List of Figures x
    Chapter 1 Introduction 1
      1.1 Objective 1
      1.2 Motivation 1
      1.3 Organization of the Thesis 2
      1.4 Background Information 2
        1.4.1 Convolutional Neural Network 2
        1.4.2 Two-Dimensional Gabor Filter 4
        1.4.3 Related Works for Deep Learning Accelerator 6
    Chapter 2 Principal Component Analysis Algorithm 10
      2.1 Theoretical Basis of PCA 10
        2.1.1 Covariance Matrix 10
        2.1.2 Eigen Analysis of Covariance Matrix 12
        2.1.3 Projection of Data 13
      2.2 Related Algorithms for Eigen Decomposition 14
        2.2.1 Preferred Method 14
        2.2.2 Jacobi Iterative Algorithm 15
        2.2.3 Coordinate Rotation Digital Computer 20
      2.3 Related Work 24
    Chapter 3 Architecture Design 26
      3.1 Specification 26
      3.2 Data Flow 27
      3.3 Number of Operations 33
        3.3.1 Covariance Matrix 33
        3.3.2 Eigen Decomposition 34
        3.3.3 Data Projection 36
      3.4 Degree of Parallelism 36
        3.4.1 Parallelism of Jacobi Algorithm 37
        3.4.2 Parallelism of Data Projection 38
      3.5 Memory Configuration 38
      3.6 Data Transfer 40
    Chapter 4 Hardware Implementation 42
      4.1 Covariance Matrix Module 44
        4.1.1 Zero Mean 45
        4.1.2 Covariance 47
      4.2 Eigen Decomposition Module 49
      4.3 Matrix Multiplier Module 55
        4.3.1 Vector-Based Multiplication 55
        4.3.2 Systolic Array 57
    Chapter 5 Verification and Experimental Result 61
      5.1 Verification Plan 61
        5.1.1 Verification of C Model 62
        5.1.2 Verification of RTL Model 62
        5.1.3 Verification of Gate-Level Model 63
      5.2 Experimental Result 63
      5.3 Comparison with Related Work 65
    Chapter 6 Conclusion and Future Work 67
      6.1 Conclusion 67
      6.2 Future Work 68
    References 69
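    The eigen-decomposition stage in Chapter 2 uses the Jacobi iterative algorithm, which diagonalizes a symmetric matrix by a sequence of plane rotations. A minimal cyclic-Jacobi sketch, assuming a floating-point software model (the thesis instead realizes each rotation in hardware with CORDIC; `jacobi_eig` is a hypothetical helper name), could look like:

```python
import numpy as np

def jacobi_eig(A, sweeps=10):
    """Cyclic Jacobi eigen-solver for a symmetric matrix A.

    Each (p, q) rotation annihilates one off-diagonal element;
    repeated sweeps drive A toward a diagonal of eigenvalues while
    V accumulates the corresponding eigenvectors as its columns."""
    A = np.array(A, dtype=float)
    n = A.shape[0]
    V = np.eye(n)
    for _ in range(sweeps):
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(A[p, q]) < 1e-12:
                    continue                  # already (near) zero
                # Rotation angle that zeroes A[p, q] after J.T @ A @ J.
                theta = 0.5 * np.arctan2(2 * A[p, q], A[q, q] - A[p, p])
                c, s = np.cos(theta), np.sin(theta)
                J = np.eye(n)
                J[p, p] = J[q, q] = c
                J[p, q], J[q, p] = s, -s
                A = J.T @ A @ J               # similarity transform
                V = V @ J                     # accumulate eigenvectors
    return np.diag(A), V

S = np.array([[4.0, 1.0, 0.5],
              [1.0, 3.0, 0.2],
              [0.5, 0.2, 2.0]])
vals, vecs = jacobi_eig(S)
```

    Because every step is a similarity transform by an orthogonal rotation, the eigenvalues are preserved exactly, and the rotations map naturally onto CORDIC hardware, which is why the Jacobi method suits a small-area accelerator.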

    On campus: available from 2026-10-25
    Off campus: available from 2026-10-25
    The electronic thesis has not yet been authorized for public release; for the print copy, please consult the library catalog.