
Graduate Student: Tai, Yuan (戴源)
Thesis Title: Design of a Kernel-agnostic Compute Core for Convolution and GEMM (通用於各內核大小卷積與矩陣運算之計算核心設計)
Advisor: Chen, Chung-Ho (陳中和)
Degree: Master
Department: Department of Electrical Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2024
Graduation Academic Year: 112 (ROC calendar)
Language: Chinese
Number of Pages: 101
Keywords (Chinese, translated): CNN, edge inference accelerator, depth-first computation, 1×1 convolution kernel decomposition, matrix operation mapping
Keywords (English): CNN, Edge Inference Accelerator, Depth-first Computation, 1×1 Convolution Mapping, Matrix-to-Convolution Mapping
Table of Contents:
  Abstract
  Acknowledgements
  Table of Contents
  List of Tables
  List of Figures
  Chapter 1  Introduction
    1.1 Motivation
    1.2 Challenges
    1.3 Contributions
    1.4 Thesis Organization
  Chapter 2  Background and Related Work
    2.1 Convolutional Neural Network (CNN)
      2.1.1 Convolution Layer
      2.1.2 Fully Connected (Dense) Layer
      2.1.3 Pooling Layer
      2.1.4 Depth-wise Separable Convolution
    2.2 Vision Transformer (ViT)
      2.2.1 Pre-Process
      2.2.2 Encoder Stack
      2.2.3 Post-Process
    2.3 Model Quantization
    2.4 CASLab-DLA
      2.4.1 CASLab-DLA Architecture
      2.4.2 Processing Element (PE)
      2.4.3 On-chip Memory
  Chapter 3  Design Methodology
    3.1 Hardware Architecture
      3.1.1 SoC Architecture
      3.1.2 Compute Unit
      3.1.3 Input/Output Address Unit
      3.1.4 Post-Processing Unit
      3.1.5 Memory Configuration
      3.1.6 Dataflow
    3.2 Micro-operation
      3.2.1 Micro-operation Attributes
      3.2.2 Operation Mapping
    3.3 Hardware Cost Model
      3.3.1 Cost Model Parameters
      3.3.2 Analysis of Compute Operations
      3.3.3 Cost Model Design
  Chapter 4  Experimental Results and Performance Analysis
    4.1 Experimental Environment and Model Parameters
      4.1.1 Experimental Environment
      4.1.2 Benchmark Model Parameters
    4.2 Experimental Results
      4.2.1 Analysis of Compute Unit Architecture Variants
      4.2.2 CASLab-DLA Performance Comparison
  Chapter 5  Conclusion and Future Work
    5.1 Conclusion
    5.2 Future Work
  References


Public release date: 2029-03-20
Neither the electronic thesis nor the printed copy has yet been authorized for public access.