
Author: Chen, Wei-Yu (陳威宇)
Title: Dynamic Polymorphic Multi-streaming AI Accelerator (動態多型串流AI加速器)
Advisors: Jou, Jer-Min (周哲民); Kuo, Chih-Hung (郭致宏)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Electrical Engineering
Year of Publication: 2024
Graduation Academic Year: 112 (2023-2024)
Language: Chinese
Pages: 61
Keywords: Dynamic reconfiguration, Polymorphic architecture, Multi-stream parallel architecture, Streaming control, Accelerator
Abstract:

    With the rapid advancement of deep learning, significant progress has been made in areas such as speech recognition, image recognition, and natural language processing. However, alongside the continuous growth in the size and complexity of neural network models, a wide variety of computational models has emerged, posing a severe challenge to the flexibility of accelerators. Efficiently processing large amounts of data and executing complex computations, while adapting to diverse and intricate computational demands, has therefore become an important research topic.
    To address this challenge, this study proposes a Dynamic Polymorphic Multi-streaming AI Accelerator architecture. First, to achieve high performance across computation models with different types and degrees of parallelism, the architecture processes data flows with a parallel streaming approach. Second, to meet the computational demands of diverse, complex streams, the architecture incorporates a polymorphic hardware organization technique whose units can be flexibly combined into compute units of different dimensions. Finally, to let the architecture adjust its hardware configuration to real-time stream demands during computation, it combines dynamic reconfiguration with context fine-tuning, allowing architectural characteristics to change at run time and thereby substantially improving computational efficiency.
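
    As a rough illustration of the three ideas above, the toy Python sketch below models a pool of processing elements (PEs) that can be regrouped into compute units of different dimensions (polymorphism), serves several independent data streams (multi-streaming), and is re-shaped at run time by a per-stream context (dynamic reconfiguration). All names here (Context, PolymorphicArray, reconfigure) are hypothetical illustrations under these assumptions, not the thesis's actual microarchitecture.

```python
# A minimal, hypothetical sketch (not the thesis's actual design) of the
# three ideas in the abstract: PEs regrouped into compute units of
# different dimensions (polymorphism), several independent input streams
# (multi-streaming), and run-time re-shaping driven by a per-stream
# "context" (dynamic reconfiguration).
from dataclasses import dataclass
from typing import Tuple

@dataclass
class Context:
    """Per-stream configuration: the shape the PE pool should take."""
    shape: Tuple[int, ...]   # (64,) -> 1-D vector unit, (8, 8) -> 2-D array
    op: str                  # "dot" or "matmul" in this toy model

class PolymorphicArray:
    def __init__(self, num_pes: int):
        self.num_pes = num_pes
        self.shape: Tuple[int, ...] = (num_pes,)  # default: one 1-D vector unit

    def reconfigure(self, ctx: Context) -> None:
        """Dynamic reconfiguration: regroup the same PEs into a new shape."""
        pes_needed = 1
        for dim in ctx.shape:
            pes_needed *= dim
        if pes_needed > self.num_pes:
            raise ValueError("context requests more PEs than exist")
        self.shape = ctx.shape

    def run(self, a, b):
        """Stand-in for the MAC arrays: a 1-D shape computes a dot product,
        a 2-D shape computes a dense matrix multiply."""
        if len(self.shape) == 1:
            return sum(x * y for x, y in zip(a, b))
        n = len(b)
        return [[sum(a[i][k] * b[k][j] for k in range(n))
                 for j in range(len(b[0]))] for i in range(len(a))]

# Multi-streaming: each stream carries its own context, so the array is
# retuned per stream at run time rather than fixed once per model.
array = PolymorphicArray(num_pes=64)
streams = [
    (Context(shape=(64,), op="dot"),     ([1.0] * 64, [0.5] * 64)),
    (Context(shape=(8, 8), op="matmul"), ([[1.0] * 8] * 8, [[2.0] * 8] * 8)),
]
for ctx, (a, b) in streams:
    array.reconfigure(ctx)               # context-driven re-shaping
    result = array.run(a, b)
    print(f"{ctx.op} on PE shape {array.shape}: done")
```

    In the actual accelerator this regrouping would of course be performed in hardware between (or during) streams; the sketch only shows the control-flow idea of selecting a PE organization from each stream's context.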

    Table of Contents:
    Abstract
    Extended Abstract (in English)
    Acknowledgments
    Table of Contents
    List of Tables
    List of Figures
    Chapter 1 Introduction
      1.1 Preface
      1.2 Research Motivation and Objectives
      1.3 Thesis Organization
    Chapter 2 Background and Related Work
      2.1 Review of Neural Network and Deep Learning Model Architectures
        2.1.1 Deep Neural Networks (DNN)
        2.1.2 Convolutional Neural Networks (CNN)
        2.1.3 Recurrent Neural Networks (RNN)
        2.1.4 Transformer
      2.2 Parallel Processing Techniques
        2.2.1 Multi-stream Parallel Processing
        2.2.2 Systolic Arrays
    Chapter 3 Architecture Overview and Multi-streaming Architecture Design
      3.1 Architecture Overview
      3.2 Multi-streaming Parallel System Design
        3.2.1 Multi-streaming Parallelization Strategy
        3.2.2 Multi-streaming Control Unit Design
      3.3 Multi-level Control Architecture Design
        3.3.1 Pipeline Control Design
        3.3.2 Context Control Architecture Design
    Chapter 4 Polymorphic Architecture Design
      4.1 Classification of Polymorphism
        4.1.1 By Hardware Architecture
        4.1.2 By Polymorphic Logic Unit
        4.1.3 By Timing Characteristics
      4.2 Polymorphic Architecture
        4.2.1 Polymorphic Virtual Table
        4.2.2 Polymorphic Architecture Design
      4.3 Polymorphic Stream Mapping
    Chapter 5 Dynamic Reconfigurability Design
      5.1 Reconfigurability of the Architecture
        5.1.1 Dynamic Reconfiguration
        5.1.2 Dynamic Overlapping-Region Reconfiguration
      5.2 Dynamic Fine-tuning of Context Instructions
      5.3 Challenges of Dynamic Scheduling
    Chapter 6 Experimental Environment and Data Analysis
      6.1 Experimental Environment
      6.2 Benchmarks
      6.3 Impact of Computation Mapping Strategies on Execution Efficiency
        6.3.1 Impact of Tiling, Polymorphic Architecture, and Parallel Modes on Execution Efficiency
        6.3.2 Impact of Parallelization Choice on Execution Efficiency
      6.4 Impact of Dynamic Partial Reconfiguration on Execution Efficiency
      6.5 Combined Experiments on Dynamic Partial Reconfiguration, Multi-streaming, and Polymorphic Architecture
    Chapter 7 Conclusion and Future Work
      7.1 Conclusion
      7.2 Future Work
    References


    Full text available: on campus 2025-07-31; off campus 2025-07-31