| Field | Value |
|---|---|
| Author | 江冠霆 Chiang, Guan-Ting |
| Thesis title | 基於 FPGA 的異質多核 ISP 架構並整合 NPU 加速器 (Integrating an FPGA-Based Heterogeneous Multicore ISP Architecture with NPU Accelerators) |
| Advisor | 侯廷偉 Hou, Ting-Wei |
| Degree | Master |
| Department | College of Engineering, Department of Engineering Science |
| Year of publication | 2025 |
| Academic year of graduation | 113 |
| Language | Chinese |
| Number of pages | 55 |
| Keywords (Chinese) | 儲存內運算, 類神經網路加速器, 硬體整合, 系統軟體開發 |
| Keywords (English) | In-Storage Processing, NPU, Hardware Integration, System Software Development |
This study investigates how an in-storage processing (ISP) architecture can improve the performance of neural network inference. Traditional ISP architectures were designed to relieve the bandwidth bottleneck between servers and secondary storage, and they handle conventional workloads well, but modern AI servers clearly challenge this design. The solution proposed here is to introduce a hardware accelerator for neural network inference, an NPU; two are adopted, the Xilinx DPU and an hls4ml-based design (DPU-HLS4ML). In the performance evaluation, two case studies are used to assess the effect of adding the NPU: quantum state classification, on a dataset of tens of thousands of samples, and image recognition on ImageNet, the most common visual recognition benchmark. The results show that the version with the NPU far outperforms the version without it. The two cases cover both DNN and CNN models, which highlights that deep learning models in general can benefit from the acceleration. Across both cases, introducing the NPU greatly improves performance while using only about half of the FPGA's resources.
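As context for the DPU-HLS4ML accelerator mentioned above, the following Python sketch shows how a trained Keras DNN (for example, a small quantum-state classifier) could be turned into an FPGA inference core with the hls4ml tool flow. This is a minimal illustration, not the exact flow used in the thesis: the model file name, output directory, and FPGA part number are assumptions.

```python
import numpy as np
from tensorflow import keras
import hls4ml

# Load a trained Keras DNN; the file name is a placeholder, not the thesis model.
model = keras.models.load_model("quantum_dnn.h5")

# Derive a per-layer hls4ml configuration (default fixed-point precision).
config = hls4ml.utils.config_from_keras_model(model, granularity="name")

# Convert the network into an HLS project targeting a Zynq UltraScale+ part
# (the part number is an example, e.g. the ZU9EG found on a ZCU102 board).
hls_model = hls4ml.converters.convert_from_keras_model(
    model,
    hls_config=config,
    output_dir="hls4ml_quantum_dnn",
    part="xczu9eg-ffvb1156-2-e",
)

# C-simulate the fixed-point model on the host to sanity-check accuracy.
hls_model.compile()
x_test = np.random.rand(16, model.input_shape[1]).astype("float32")
print(hls_model.predict(x_test))

# Run HLS synthesis to produce the RTL that would become the NPU core.
hls_model.build(csim=False, synth=True)
```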
This work explores the use of In-Storage Processing (ISP) architectures to enhance neural network inference. While traditional ISP addresses bandwidth limitations between servers and secondary storage, it struggles to meet the demands of modern AI workloads. To address this, we integrate Neural Processing Units (NPUs), specifically the Xilinx DPU and a DPU-HLS4ML design, into the ISP framework. Performance is evaluated on two case studies: quantum state classification with tens of thousands of samples, and image recognition with ImageNet. Both deep neural networks (DNNs) and convolutional neural networks (CNNs) are tested. Results show that NPU integration outperforms the baseline without NPUs while occupying only about half of the FPGA's resources. These findings demonstrate that incorporating NPUs significantly accelerates diverse deep learning models in ISP systems, offering a cost-efficient and resource-efficient solution for AI servers.
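To make the Xilinx DPU path concrete, the sketch below shows how inference could be dispatched to the DPU from the embedded Linux side using the Vitis AI runtime (VART) Python API. It is a hedged illustration under assumed names, not the actual integration inside the thesis's ISP architecture: the compiled `resnet50.xmodel` file and the int8 tensor layout are assumptions.

```python
import numpy as np
import vart
import xir

# Load a DPU-compiled model; the .xmodel file name is a placeholder.
graph = xir.Graph.deserialize("resnet50.xmodel")
subgraphs = graph.get_root_subgraph().toposort_child_subgraph()
dpu_subgraph = next(
    s for s in subgraphs
    if s.has_attr("device") and s.get_attr("device").upper() == "DPU"
)

# Create a runner bound to the DPU subgraph and query its tensor shapes.
runner = vart.Runner.create_runner(dpu_subgraph, "run")
in_tensor = runner.get_input_tensors()[0]
out_tensor = runner.get_output_tensors()[0]

# One dummy int8 batch; a real flow would fill this with a preprocessed,
# quantized image read from the in-storage data path.
inputs = [np.zeros(tuple(in_tensor.dims), dtype=np.int8)]
outputs = [np.empty(tuple(out_tensor.dims), dtype=np.int8)]

# Dispatch the job to the DPU asynchronously and wait for completion.
job_id = runner.execute_async(inputs, outputs)
runner.wait(job_id)

print("top-1 class index:", int(np.argmax(outputs[0][0])))
```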