
Graduate student: 潘柏儒 (Pan, Bo-Ru)
Thesis title: Hardware Accelerator for AI on Chip via Principal Component Analysis (主成分分析用於人工智慧晶片之硬體加速器)
Advisor: 李國君 (Lee, Gwo-Giun)
Degree: Master
Department: Department of Electrical Engineering, College of Electrical Engineering and Computer Science
Year of publication: 2021
Academic year of graduation: 109
Language: English
Pages: 71
Chinese keywords (translated): principal component analysis, Gabor filter, algorithm/architecture co-exploration, hardware accelerator, application-specific integrated circuit
English keywords: principal component analysis, Gabor filter, algorithm architecture co-design, accelerator, ASIC
    Abstract (translated from Chinese): This thesis proposes a hardware architecture for the principal component analysis (PCA) algorithm, targeted primarily at the Gabor filter. PCA concentrates the important information onto the principal eigen-axes, and the symmetry of the Gabor filter's PCA result is exploited to reduce computational complexity. With the recent popularity of edge AI, this work targets small area and low power in the design of the PCA hardware architecture. The design flow is based on algorithm/architecture co-design: complexity analysis of the algorithm, covering the number of operations, degree of parallelism, memory size, and transfer bandwidth, leads to a dataflow model. The proposed architecture is synthesized in a TSMC 90 nm process and operates at a clock rate of 142.8 MHz.

    This thesis proposes a hardware architecture for principal component analysis (PCA). The design goal is to apply PCA to the Gabor filter and to exploit the symmetry of the PCA result to reduce computational complexity. In recent years, edge AI has become a new trend in AI applications, so this thesis aims to develop a PCA hardware architecture with small area and low power. The design flow is based on algorithm/architecture co-design: the number of operations, degree of parallelism, memory configuration, and data transfer are analyzed, and a data flow is then built. The proposed architecture is synthesized with TSMC 90 nm technology and operates at 142.8 MHz.
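    The pipeline the abstract describes — zero-mean, covariance, eigen decomposition, data projection — can be sketched in floating point as follows. This is an illustrative sketch only: the helper name `pca_project` is hypothetical, and the thesis implements these stages as fixed-point hardware modules rather than with NumPy.

```python
import numpy as np

def pca_project(X, k):
    """Sketch of the PCA pipeline: zero-mean the data, form the
    covariance matrix, eigen-decompose it, and project the data
    onto the k principal axes (largest-variance directions)."""
    Xc = X - X.mean(axis=0)                # zero-mean stage
    cov = Xc.T @ Xc / (len(X) - 1)         # covariance matrix stage
    vals, vecs = np.linalg.eigh(cov)       # eigen decomposition stage
    order = np.argsort(vals)[::-1]         # sort by decreasing variance
    W = vecs[:, order[:k]]                 # top-k principal axes
    return Xc @ W                          # data projection stage

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))
Y = pca_project(X, 2)                      # 100 samples, 2 components
```

    The projected components are mutually uncorrelated, which is the property that lets a filter bank such as the Gabor filters be approximated by a few principal axes.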

    Abstract (Chinese) i
    Abstract (English) ii
    Acknowledgements iv
    Table of Contents vi
    List of Tables ix
    List of Figures x
    Chapter 1 Introduction 1
      1.1 Objective 1
      1.2 Motivation 1
      1.3 Organization of the Thesis 2
      1.4 Background Information 2
        1.4.1 Convolutional Neural Network 2
        1.4.2 Two-Dimensional Gabor Filter 4
        1.4.3 Related Works for Deep Learning Accelerator 6
    Chapter 2 Principal Component Analysis Algorithm 10
      2.1 Theoretical Basis of PCA 10
        2.1.1 Covariance Matrix 10
        2.1.2 Eigen Analysis of Covariance Matrix 12
        2.1.3 Projection of Data 13
      2.2 Related Algorithms for Eigen Decomposition 14
        2.2.1 Preferred Method 14
        2.2.2 Jacobi Iterative Algorithm 15
        2.2.3 Coordinate Rotation Digital Computer 20
      2.3 Related Work 24
    Chapter 3 Architecture Design 26
      3.1 Specification 26
      3.2 Data Flow 27
      3.3 Number of Operations 33
        3.3.1 Covariance Matrix 33
        3.3.2 Eigen Decomposition 34
        3.3.3 Data Projection 36
      3.4 Degree of Parallelism 36
        3.4.1 Parallelism of Jacobi Algorithm 37
        3.4.2 Parallelism of Data Projection 38
      3.5 Memory Configuration 38
      3.6 Data Transfer 40
    Chapter 4 Hardware Implementation 42
      4.1 Covariance Matrix Module 44
        4.1.1 Zero Mean 45
        4.1.2 Covariance 47
      4.2 Eigen Decomposition Module 49
      4.3 Matrix Multiplier Module 55
        4.3.1 Vector-Based Multiplication 55
        4.3.2 Systolic Array 57
    Chapter 5 Verification and Experimental Result 61
      5.1 Verification Plan 61
        5.1.1 Verification of C Model 62
        5.1.2 Verification of RTL Model 62
        5.1.3 Verification of Gate-Level Model 63
      5.2 Experimental Result 63
      5.3 Comparison with Related Work 65
    Chapter 6 Conclusion and Future Work 67
      6.1 Conclusion 67
      6.2 Future Work 68
    References 69
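    The eigen-decomposition stage in Chapter 2 uses the Jacobi iterative algorithm, which diagonalizes a symmetric matrix by a sequence of plane rotations. A minimal cyclic-Jacobi sketch, assuming a floating-point software model (the thesis instead realizes each rotation in hardware with CORDIC; `jacobi_eig` is a hypothetical helper name), could look like:

```python
import numpy as np

def jacobi_eig(A, sweeps=10):
    """Cyclic Jacobi eigen-solver for a symmetric matrix A.

    Each (p, q) rotation annihilates one off-diagonal element;
    repeated sweeps drive A toward a diagonal of eigenvalues while
    V accumulates the corresponding eigenvectors as its columns."""
    A = np.array(A, dtype=float)
    n = A.shape[0]
    V = np.eye(n)
    for _ in range(sweeps):
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(A[p, q]) < 1e-12:
                    continue                  # already (near) zero
                # Rotation angle that zeroes A[p, q] after J.T @ A @ J.
                theta = 0.5 * np.arctan2(2 * A[p, q], A[q, q] - A[p, p])
                c, s = np.cos(theta), np.sin(theta)
                J = np.eye(n)
                J[p, p] = J[q, q] = c
                J[p, q], J[q, p] = s, -s
                A = J.T @ A @ J               # similarity transform
                V = V @ J                     # accumulate eigenvectors
    return np.diag(A), V

S = np.array([[4.0, 1.0, 0.5],
              [1.0, 3.0, 0.2],
              [0.5, 0.2, 2.0]])
vals, vecs = jacobi_eig(S)
```

    Because every step is a similarity transform by an orthogonal rotation, the eigenvalues are preserved exactly, and the rotations map naturally onto CORDIC hardware, which is why the Jacobi method suits a small-area accelerator.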

    On campus: available from 2026-10-25
    Off campus: available from 2026-10-25
    The electronic thesis has not yet been authorized for public release; for the print copy, please consult the library catalog.