| Graduate Student: | Hong, Hao-Chen (洪浩宸) |
|---|---|
| Thesis Title: | Design and Performance Analysis of a Shape-Reconfigurable Systolic Array Accelerator (形狀可重構式脈動陣列加速器之設計與效能分析) |
| Advisor: | Kuo, Chih-Hung (郭致宏) |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Department of Electrical Engineering |
| Publication Year: | 2026 |
| Academic Year: | 114 |
| Language: | Chinese |
| Pages: | 96 |
| Keywords (Chinese): | 脈動陣列, AI 加速器, 深度學習 |
| Keywords (English): | Systolic array, AI accelerator, deep learning |
Systolic arrays can execute the matrix multiplications in deep neural networks efficiently and have been widely adopted in accelerator designs. However, a fixed-shape systolic array cannot efficiently handle diverse matrix dimensions. When the array shape does not match the matrix size, processing-element (PE) utilization drops and data movement increases, lengthening overall processing latency. This work proposes a Shape-Reconfigurable Systolic Array Accelerator (SRSAA) that can dynamically switch among different array shapes at run time, so that each matrix multiplication can use a better-fitting array shape and achieve higher performance. Experimental results show that, compared with related works, SRSAA improves hardware efficiency by up to 4×. (Translated from the Chinese abstract.)
Systolic arrays have been widely adopted for efficient matrix multiplication in deep neural networks. However, fixed array shapes cannot efficiently accommodate diverse matrix dimensions. When the array shape does not match the matrix dimensions, systolic array utilization decreases or data movement increases, leading to longer processing latency. We propose a Shape-Reconfigurable Systolic Array Accelerator (SRSAA) that dynamically switches array shapes. This shape-reconfigurability enables each matrix multiplication to utilize a better array shape, improving performance. Experimental results demonstrate that the proposed SRSAA achieves up to 4× improvement in hardware efficiency over related works.
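To illustrate the utilization problem the abstract describes, the sketch below computes PE utilization for a single GEMM under a simple output-stationary tiling model. This model, the `utilization` function, and the example dimensions are our own illustrative assumptions, not details taken from the thesis:

```python
import math

def utilization(M, N, rows, cols):
    """Fraction of PE-cycles doing useful work when an M x N output
    matrix is tiled onto a rows x cols output-stationary systolic
    array (each tile occupies the whole array for one pass)."""
    tiles = math.ceil(M / rows) * math.ceil(N / cols)
    return (M * N) / (tiles * rows * cols)

# A GEMM with a short M dimension (e.g. a small batch) leaves most of
# a square 128x128 array idle...
print(utilization(32, 512, 128, 128))  # 0.25
# ...while a 32x512 shape built from the same 16384 PEs is fully used,
# which is the kind of gain shape-reconfigurability targets.
print(utilization(32, 512, 32, 512))   # 1.0
```

Under this toy model, reshaping the same PE budget from 128×128 to 32×512 raises utilization from 25% to 100% for this matrix, which mirrors (but does not reproduce) the mismatch argument made in the abstract.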