| Graduate Student: | 陳威宇 Chen, Wei-Yu |
|---|---|
| Thesis Title: | 動態多型串流AI加速器 Dynamic Polymorphic Multi-streaming AI Accelerator |
| Advisors: | 周哲民 Jou, Jer-Min; 郭致宏 Kuo, Chih-Hung |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Department of Electrical Engineering |
| Publication Year: | 2024 |
| Graduation Academic Year: | 112 (ROC calendar; 2023-24) |
| Language: | Chinese |
| Number of Pages: | 61 |
| Keywords: | Dynamic reconfiguration, Polymorphic architecture, Multi-stream parallel architecture, Streaming control, Accelerator |
With the rapid advancement of deep learning, significant progress has been made in areas such as speech recognition, image recognition, and natural language processing. However, as neural network models keep growing in size and complexity, diverse computational models have also emerged in quick succession, posing a severe challenge to the flexibility of accelerators. How to process large volumes of data efficiently, execute complex computations, and adapt to varied computational patterns has therefore become an important research topic.
To address this challenge, this study proposes a Dynamic Polymorphic Multi-streaming AI Accelerator architecture. First, to achieve high performance across computation models with different types and degrees of parallelism, the architecture processes data flows as parallel streams. Second, to meet the computational demands of diverse, complex streams, it employs a polymorphic hardware organization technique whose processing elements can be flexibly combined into computational units of different dimensions. Finally, to let the architecture adjust its hardware configuration to real-time stream demands during computation, it combines dynamic reconfiguration with context-based fine-tuning, so that architectural characteristics can change at run time and computational efficiency is substantially improved.
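The abstract names three mechanisms: parallel stream processing, polymorphic grouping of processing elements (PEs) into units of different dimensions, and dynamic reconfiguration driven by real-time stream demands. As a rough behavioral illustration only (not the thesis's actual hardware or control logic), the Python sketch below models how a fixed pool of PEs might be regrouped per incoming stream; all names here (`PolymorphicArray`, `StreamDemand`, `reconfigure`) are hypothetical.

```python
# Behavioral sketch, NOT the thesis RTL: a fixed PE pool regrouped at run
# time into a 1-D or 2-D compute unit depending on the incoming stream.
from dataclasses import dataclass


@dataclass
class StreamDemand:
    """Runtime description of an incoming tensor stream (assumed shape)."""
    rows: int  # e.g. matrix rows for a GEMM stream, 1 for a vector stream
    cols: int


class PolymorphicArray:
    def __init__(self, num_pes: int = 16):
        self.num_pes = num_pes
        self.shape = (num_pes, 1)  # default organization: 1-D vector unit

    def reconfigure(self, demand: StreamDemand) -> None:
        """Dynamic reconfiguration: regroup the same PEs into the unit
        dimension that best matches the current stream's parallelism."""
        if demand.rows == 1:
            self.shape = (self.num_pes, 1)       # vector stream -> 1-D unit
        else:
            side = int(self.num_pes ** 0.5)
            self.shape = (side, side)            # matrix stream -> 2-D tile

    def run(self, demand: StreamDemand) -> str:
        self.reconfigure(demand)  # per-stream, context-driven adjustment
        return (f"{self.shape[0]}x{self.shape[1]} unit serving a "
                f"{demand.rows}x{demand.cols} stream")


# Two streams with different parallelism, served by one reconfiguring array.
array = PolymorphicArray(num_pes=16)
print(array.run(StreamDemand(rows=1, cols=128)))   # -> 16x1 unit
print(array.run(StreamDemand(rows=64, cols=64)))   # -> 4x4 unit
```

The point of the sketch is the control decision, not the arithmetic: the same pool of PEs is reshaped between streams rather than dedicating separate fixed-function units to each workload type, which is the flexibility the abstract attributes to the polymorphic organization.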