Graduate student: Chen, Yu-Hsuan (陳育萱)
Thesis title: Hardware Accelerator for AI on Chip via Reconfigurable Data Path Design for Symmetrical Filters (利用對稱性濾波器之可重組資料路徑設計用於人工智慧晶片之硬體加速器)
Advisor: Lee, Gwo-Giun (李國君)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Electrical Engineering
Year of publication: 2021
Academic year of graduation: 109 (ROC calendar)
Language: English
Number of pages: 70
Keywords: Reconfigurable (可重組), Dataflow Model (資料流模型), Hardware Implementation (硬體實現), Gabor Filter (賈伯濾波器)
  • This thesis implements hardware for computing transformed filters. We use the Eigen-transformation approach to convert Gabor filters into transformed filters. Exploiting the symmetry of the transformed filters, we pre-add the input data that are multiplied by the same coefficient, which reduces the number of multiplications. From the mathematical formulation, we build four dataflow models as prototype architectures for the hardware implementation and use the commonality among the four models to optimize the architecture. To increase the amount of hardware that can be shared, we integrate the functions of some of the models into a single reconfigurable model, which reduces the cell-area cost of the implementation. Finally, we analyze the strengths and weaknesses of the architectures according to the four metrics of algorithm/architecture co-design: number of operations, degree of parallelism, data transfer, and data storage.
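    The pre-add step described in the abstract can be illustrated with a minimal sketch. It assumes a small integer kernel with even point symmetry, that is, kernel[r][c] == kernel[n-1-r][n-1-c]. The function names, the kernel, and the input window are hypothetical values chosen for illustration; they are not taken from the thesis, and the thesis hardware may organize the computation differently.

```python
def direct_response(window, kernel):
    """Plain multiply-accumulate: one multiplication per filter tap."""
    n = len(kernel)
    acc, mults = 0, 0
    for r in range(n):
        for c in range(n):
            acc += window[r][c] * kernel[r][c]
            mults += 1
    return acc, mults


def preadd_response(window, kernel):
    """Pre-add the inputs that share a coefficient, then multiply once.

    Assumption (illustrative): the kernel has even point symmetry,
    kernel[r][c] == kernel[n-1-r][n-1-c].
    """
    n = len(kernel)
    total = n * n
    acc, mults = 0, 0
    # Visit only the first half of the taps; each one is paired with
    # its point-mirrored partner in the second half.
    for idx in range((total + 1) // 2):
        r, c = divmod(idx, n)
        mr, mc = n - 1 - r, n - 1 - c
        if (r, c) == (mr, mc):
            summed = window[r][c]  # centre tap has no partner
        else:
            summed = window[r][c] + window[mr][mc]
        acc += summed * kernel[r][c]
        mults += 1
    return acc, mults


# Hypothetical 3x3 point-symmetric kernel and input window.
KERNEL = [[1, 2, 3],
          [4, 5, 4],
          [3, 2, 1]]
WINDOW = [[9, 8, 7],
          [6, 5, 4],
          [3, 2, 1]]

print(direct_response(WINDOW, KERNEL))  # (125, 9)
print(preadd_response(WINDOW, KERNEL))  # (125, 5)
```

    Both functions return the same filter response, but the pre-add version needs 5 multiplications instead of 9 for this 3x3 case, at the cost of 4 extra additions before the multipliers.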

    Abstract (Chinese)
    Abstract
    Acknowledgements
    Table of Contents
    List of Tables
    List of Figures
    Chapter 1 Introduction
      1.1 Objective
      1.2 Motivation
      1.3 Background Information
        1.3.1 Machine Learning
          1.3.1.1 Deep Learning
          1.3.1.2 Convolution Neural Network (CNN)
        1.3.2 Algorithm/Architecture Co-design (AAC)
        1.3.3 Related Work
          1.3.3.1 Google TPU
          1.3.3.2 Eyeriss
        1.3.4 Reconfigurability
      1.4 Contributions of this Thesis
    Chapter 2 Applied Methods
      2.1 Gabor Filter
      2.2 Eigen-Transformation Approach
      2.3 Low Rank Approximation to Eigen-Transformation
    Chapter 3 Architecture Design
      3.1 Pre-add
      3.2 Pre-add in Symmetrical Properties
      3.3 Transformed Filters in Four Patterns
        3.3.1 Dataflow
        3.3.2 Sub-modules
          3.3.2.1 Even Point Symmetry
          3.3.2.2 Vertical Symmetry
          3.3.2.3 Diagonal Symmetry
        3.3.3 Commonality in Four Patterns
        3.3.4 Architecture with Commonality
        3.3.5 Reconfigurable of Even and Odd Module
        3.3.6 Pipeline Design
      3.4 Feeder Model
      3.5 Complexity Analysis of Implementation
        3.5.1 Number of Operations
        3.5.2 Degree of Parallelism
        3.5.3 Data Transfer
        3.5.4 Data Storage
    Chapter 4 Experimental Results
      4.1 Filter Model Implementation
        4.1.1 Original Model of Four Patterns
          4.1.1.1 Accuracy of Transformed Filter
          4.1.1.2 Dataflow Model
          4.1.1.3 Implementation Results
          4.1.1.4 Number of Operations
          4.1.1.5 Degree of Parallelism
          4.1.1.6 Data Transfer
          4.1.1.7 Data Storage
        4.1.2 Model via Commonality
          4.1.2.1 Sharing Input Buffer
          4.1.2.2 Sharing Even Point Symmetry
          4.1.2.3 Commonality Model
          4.1.2.4 Dataflow Model
          4.1.2.5 Implementation Results
          4.1.2.6 Number of Operations
          4.1.2.7 Degree of Parallelism
          4.1.2.8 Data Transfer
          4.1.2.9 Data Storage
        4.1.3 Architecture via Reconfigurable of Even/Odd Model
          4.1.3.1 Dataflow Model
          4.1.3.2 Implementation Results
          4.1.3.3 Number of Operations
          4.1.3.4 Degree of Parallelism
          4.1.3.5 Data Transfer
          4.1.3.6 Data Storage
        4.1.4 Reference Architecture
          4.1.4.1 Dataflow Model
          4.1.4.2 Implementation Results
          4.1.4.3 Number of Operations
          4.1.4.4 Degree of Parallelism
          4.1.4.5 Data Transfer
          4.1.4.6 Data Storage
        4.1.5 Pipeline Design
      4.2 Architecture Comparison
    Chapter 5 Conclusion and Future Work
      5.1 Conclusion
      5.2 Future Work
    References

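    Full-text availability: on campus, public from 2026-10-25; off campus, public from 2026-10-25.
    The electronic thesis has not yet been authorized for public release; for the print copy, please consult the library catalog.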