| Author: | Tsui, Che-Wei (崔哲瑋) |
|---|---|
| Thesis Title: | Reconfigurable Accelerator for Variable Size Transformer (用於可變尺寸變換器之可配置化加速器) |
| Advisor: | Kuo, Chih-Hung (郭致宏) |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science, Department of Electrical Engineering |
| Publication Year: | 2026 |
| Graduation Academic Year: | 114 |
| Language: | Chinese |
| Pages: | 66 |
| Chinese Keywords: | Transformer network, nested-loop instruction set architecture, systolic array, reconfigurable accelerator |
| English Keywords: | Systolic array, Transformer model, Nested-loop instruction set architecture |
Attention models often require a very large number of instructions during computation, too many to buffer entirely in on-chip memory, so the accelerator spends extra time fetching instructions from external memory. To resolve this bottleneck, this work exploits the repetitive computation pattern of attention models and proposes a customized instruction set architecture so that the instruction count no longer grows proportionally with model size. Experimental results show that, compared with an unoptimized design, the proposed scheme saves more than roughly 300× of instruction storage for the ViT-Huge model and delivers about a 1.5× performance improvement.
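The effect of the nested-loop encoding can be illustrated with a minimal Python sketch. All names here (LoopInstruction, MATMUL_TILE, the 64×64 tile size) are hypothetical illustrations, not the thesis's actual instruction format; only the 1280×5120 matrix reflects ViT-Huge FFN dimensions.

```python
from dataclasses import dataclass

@dataclass
class LoopInstruction:
    """One stored descriptor; the hardware expands it into many tile operations."""
    opcode: str      # e.g. "MATMUL_TILE" (hypothetical mnemonic)
    bounds: tuple    # iteration count at each nested-loop level
    strides: tuple   # address stride applied at each loop level

def flat_instruction_count(rows: int, cols: int, tile: int) -> int:
    # Naive strategy: one stored instruction per tile, so the count grows with model size.
    return (rows // tile) * (cols // tile)

# Example: a 1280x5120 weight matrix (ViT-Huge FFN scale) tiled into 64x64 blocks.
print(flat_instruction_count(1280, 5120, 64))   # 1600 stored instructions

# Nested-loop strategy: the same work is described by a single descriptor whose
# loop bounds cover all 20x80 tiles, so instruction storage no longer scales
# with the matrix dimensions.
matmul = LoopInstruction(opcode="MATMUL_TILE", bounds=(20, 80), strides=(64, 64))
print(matmul)                                    # 1 stored descriptor
```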
In addition, this thesis proposes a dedicated memory address generation unit that efficiently handles address management for attention models, allowing the limited on-chip memory to be used more effectively. The unit stores only a small amount of model information and autonomously adjusts the memory allocation during computation to avoid data conflicts, improving the parallelism of computation on a single unified SRAM. Experimental results show that, when the instruction generation strategy is combined with the memory-addressing optimization, the design achieves up to about a 3.4× performance gain over existing accelerator architectures.
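A minimal sketch of this allocation idea follows, assuming hypothetical sizes and names throughout (SRAM_WORDS, allocate, the tile names); it is not the thesis's actual AGU logic. The point is that tensors which must stay live together are placed at non-overlapping offsets in the unified SRAM, so a later computation never overwrites data still needed by the next step.

```python
SRAM_WORDS = 64 * 1024  # assumed unified on-chip SRAM capacity, in words

def allocate(live_tensors: dict[str, int]) -> dict[str, tuple[int, int]]:
    """Assign each live tensor a non-overlapping (base, size) region in the SRAM."""
    regions, offset = {}, 0
    for name, words in live_tensors.items():
        if offset + words > SRAM_WORDS:
            raise MemoryError(f"{name} exceeds on-chip capacity; spill to external DRAM")
        regions[name] = (offset, words)
        offset += words
    return regions

# Example: keep the Q, K, V tiles of one attention head resident together with the
# score tile, so Q*K^T can feed the softmax without an off-chip round trip.
print(allocate({"Q_tile": 4096, "K_tile": 4096, "V_tile": 4096, "S_tile": 1024}))
```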
In this work, we present an instruction-driven accelerator tailored for Transformer-based models. Traditional reconfigurable accelerators rely on customized instructions to define each computational function or operation, causing the instruction count to scale with model size. For large-scale models such as Vision Transformers (ViTs), an unoptimized instruction generation strategy produces an excessive number of instructions that require a large amount of on-chip memory and frequent instruction loading from external memory. To address this challenge, we exploit the repetitive structure inherent in attention-based architectures and introduce a reusable instruction generation strategy inspired by nested for-loops. This approach significantly reduces the total number of instructions. Experimental results demonstrate that our method reduces the instruction count for ViT-Huge by a factor of 308. Additionally, unlike prior designs that allocate separate on-chip SRAMs for different computations, our accelerator adopts a unified on-chip SRAM for input data to avoid low memory utilization. To handle memory allocation under this design and prevent data conflicts between computations, we introduce a dedicated Address Generation Unit (AGU) for efficient memory management of large-matrix tiling and nonlinear operations. This module stores only a small amount of known model information and autonomously schedules on-chip memory allocation to enhance storage utilization and minimize redundant off-chip data accesses. Our AGU module achieves up to a 1.7× improvement in frames per second (FPS) on the ViT-Huge model.