National Cheng Kung University Electronic Theses and Dissertations System


Author:	Chen, Yu-Hsuan (陳育萱)
Thesis title:	Hardware Accelerator for AI on Chip via Reconfigurable Data Path Design for Symmetrical Filters (利用對稱性濾波器之可重組資料路徑設計用於人工智慧晶片之硬體加速器)
Advisor:	Lee, Gwo-Giun (李國君)
Degree:	Master
Department:	College of Electrical Engineering and Computer Science - Department of Electrical Engineering
Year of publication:	2021
Academic year of graduation:	109 (ROC calendar; 2020-2021)
Language:	English
Number of pages:	70
Keywords:	Reconfigurable, Dataflow Model, Hardware Implementation, Gabor Filter

This thesis implements hardware for computing transformed filters. We use the eigen-transformation approach to convert Gabor filters into transformed filters. Exploiting the symmetry of the transformed filters, we pre-add the input data that are multiplied by the same coefficient, which reduces the number of multiplications. From the mathematical formulation, we build four dataflow models as prototype architectures for the hardware implementation and use the commonality among the four models to optimize the architecture. To increase the amount of hardware that can be shared, we integrate the functions of some of the models into a single reconfigurable model, which reduces the cell-area cost of the implementation. Finally, we compare the strengths and weaknesses of the architectures according to the four metrics of algorithm/architecture co-design: number of operations, degree of parallelism, data transfer, and data storage.
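The pre-add idea in the abstract can be illustrated with a minimal Python sketch. It is not the thesis architecture: the 3x3 kernel and window values, the function names `direct_dot` and `preadd_dot`, and the choice of even point symmetry (each tap equals the tap at the 180-degree rotated position) are assumptions made only for illustration. The sketch counts multiplications to show that adding the two inputs that share a coefficient before multiplying gives the same result with roughly half the multiplications.

```python
def direct_dot(window, kernel):
    """Reference computation: one multiplication per filter tap."""
    flat_x = [v for row in window for v in row]
    flat_h = [c for row in kernel for c in row]
    acc = sum(x * h for x, h in zip(flat_x, flat_h))
    return acc, len(flat_h)


def preadd_dot(window, kernel):
    """Pre-add the two inputs that share a coefficient, then multiply once."""
    flat_x = [v for row in window for v in row]
    flat_h = [c for row in kernel for c in row]
    n = len(flat_h)
    # Even point symmetry (assumed here): tap k equals tap n-1-k.
    if any(flat_h[k] != flat_h[n - 1 - k] for k in range(n)):
        raise ValueError("kernel is not even point-symmetric")
    acc, mults = 0, 0
    for k in range(n // 2):
        acc += flat_h[k] * (flat_x[k] + flat_x[n - 1 - k])
        mults += 1
    if n % 2:  # the centre tap of an odd-sized kernel has no partner
        acc += flat_h[n // 2] * flat_x[n // 2]
        mults += 1
    return acc, mults


# Hypothetical integer data, chosen so the comparison is exact.
kernel = [[1, 2, 3],
          [4, 5, 4],
          [3, 2, 1]]
window = [[9, 8, 7],
          [6, 5, 4],
          [3, 2, 1]]

direct_result, direct_mults = direct_dot(window, kernel)
preadd_result, preadd_mults = preadd_dot(window, kernel)
print(direct_result, direct_mults)   # 125 9
print(preadd_result, preadd_mults)   # 125 5
```

In hardware terms, each pre-add replaces a multiplier with an adder, which is typically much cheaper in area. Other symmetry patterns would pair the taps differently, but the same counting argument applies.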

Abstract (Chinese)
Abstract
Acknowledgments
Table of Contents
List of Tables
List of Figures
Chapter 1	Introduction
1	Objective
2	Motivation
3	Background Information
3.1	Machine Learning
3.1.1	Deep Learning
3.1.2	Convolution Neural Network (CNN)
3.2	Algorithm/Architecture Co-design (AAC)
3.3	Related Work
3.3.1	Google TPU
3.3.2	Eyeriss
3.4	Reconfigurability
4	Contributions of this Thesis
Chapter 2	Applied Methods
1	Gabor Filter
2	Eigen-Transformation Approach
3	Low Rank Approximation to Eigen-Transformation
Chapter 3	Architecture Design
1	Pre-add
2	Pre-add in Symmetrical Properties
3	Transformed Filters in Four Patterns
3.1	Dataflow
3.2	Sub-modules
3.2.1	Even Point Symmetry
3.2.2	Vertical Symmetry
3.2.3	Diagonal Symmetry
3.3	Commonality in Four Patterns
3.4	Architecture with Commonality
3.5	Reconfigurable of Even and Odd Module
3.6	Pipeline Design
4	Feeder Model
5	Complexity Analysis of Implementation
5.1	Number of Operations
5.2	Degree of Parallelism
5.3	Data Transfer
5.4	Data Storage
Chapter 4	Experimental Results
1	Filter Model Implementation
1.1	Original Model of Four Patterns
1.1.1	Accuracy of Transformed Filter
1.1.2	Dataflow Model
1.1.3	Implementation Results
1.1.4	Number of Operations
1.1.5	Degree of Parallelism
1.1.6	Data Transfer
1.1.7	Data Storage
1.2	Model via Commonality
1.2.1	Sharing Input Buffer
1.2.2	Sharing Even Point Symmetry
1.2.3	Commonality Model
1.2.4	Dataflow Model
1.2.5	Implementation Results
1.2.6	Number of Operations
1.2.7	Degree of Parallelism
1.2.8	Data Transfer
1.2.9	Data Storage
1.3	Architecture via Reconfigurable of Even/Odd Model
1.3.1	Dataflow Model
1.3.2	Implementation Results
1.3.3	Number of Operations
1.3.4	Degree of Parallelism
1.3.5	Data Transfer
1.3.6	Data Storage
1.4	Reference Architecture
1.4.1	Dataflow Model
1.4.2	Implementation Results
1.4.3	Number of Operations
1.4.4	Degree of Parallelism
1.4.5	Data Transfer
1.4.6	Data Storage
1.5	Pipeline Design
2	Architecture Comparison
Chapter 5	Conclusion and Future Work
1	Conclusion
2	Future Work
References
                                    


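On campus: available from 2026-10-25
Off campus: available from 2026-10-25. The electronic thesis is not yet authorized for public release; for the print copy, consult the library catalog.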
