
Graduate Student: Kuo, Chen-Wei (郭珍瑋)
Thesis Title: A Dynamically Composable Architecture for Deep Neural Networks Accelerating (深度神經網絡加速之動態組合架構)
Advisor: Jou, Jer-Min (周哲民)
Degree: Master
Department: Department of Electrical Engineering, College of Electrical Engineering and Computer Science
Publication Year: 2020
Graduation Academic Year: 108
Language: Chinese
Number of Pages: 70
Chinese Keywords: 動態組合 (dynamic composition), 捲積神經網絡 (convolutional neural network)
Foreign Keywords: Dynamic composition, Convolutional Neural Network, Bit level
Statistics: Views: 62, Downloads: 3
    Abstract
    Advances in high-performance computer architecture design have been a major driver of the rapid development of convolutional neural networks (CNNs) and generative adversarial networks (GANs). We exploit three properties of CNNs to introduce a novel acceleration architecture: (1) CNNs are mostly a collection of massively parallel multiplications and additions; (2) the bit-width of these operations can be reduced without lowering accuracy [1-5]; and (3) to preserve accuracy, the operand bit-width varies significantly across CNNs and may even need to be adjusted for each layer individually. Consequently, a fixed bit-width accelerator design tends to either (1) use arithmetic units sized for the maximum bit-width, so that the high-order bits of the hardware are used inefficiently, or (2) trade accuracy for hardware utilization, so that the final accuracy drops. To mitigate these drawbacks, we introduce the concepts of bit-width fusion and decomposition as a design direction for CNN accelerators. We also exploit the parallelism of CNNs to accelerate the loop computations in the convolutional layers and to reduce the number of data transfers, thereby raising the throughput of the overall CNN architecture.
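    As an illustration of property (1), the following minimal Python sketch (not taken from the thesis) writes out the convolution-layer loop nest: every output value is an independent sum of multiply-adds, which is the parallelism the proposed architecture exploits. The function name conv2d_naive and its argument layout are assumptions made only for this example.

        import numpy as np

        def conv2d_naive(x, w, stride=1):
            # x: input feature map (C_in, H, W); w: kernels (C_out, C_in, K, K)
            c_out, c_in, k, _ = w.shape
            _, h_in, w_in = x.shape
            h_out = (h_in - k) // stride + 1
            w_out = (w_in - k) // stride + 1
            y = np.zeros((c_out, h_out, w_out))
            for co in range(c_out):            # output channels
                for i in range(h_out):         # output rows
                    for j in range(w_out):     # output columns
                        # each output element is an independent multiply-add reduction
                        patch = x[:, i*stride:i*stride+k, j*stride:j*stride+k]
                        y[co, i, j] = np.sum(patch * w[co])
            return y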
    This thesis exploits the decomposability of arithmetic operations: a multiplication of any bit-width can be computed by combining arithmetic units of a smaller bit-width, which allows us to design an expandable, composable computation structure. On top of this expandable structure, we design three different versions of the composition module. The expandable structure implements operations of different bit-widths more flexibly and keeps the hardware optimally utilized at every bit-width. This design method reduces hardware waste as much as possible while sacrificing neither accuracy nor computational speedup.
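    A minimal sketch of the decomposability this paragraph relies on: an unsigned 4-bit x 4-bit product can be assembled from four 2-bit x 2-bit products, each shifted to its positional weight and summed. The function names below (mul2, mul4_from_mul2) are illustrative only and are not the module names used in the thesis.

        def mul2(a, b):
            # stands in for a 2-bit x 2-bit multiplier unit (operands 0..3)
            return a * b

        def mul4_from_mul2(a, b):
            # split each 4-bit operand into high/low 2-bit halves: a = aH*4 + aL, b = bH*4 + bL
            a_hi, a_lo = a >> 2, a & 0x3
            b_hi, b_lo = b >> 2, b & 0x3
            # a*b = aH*bH*16 + (aH*bL + aL*bH)*4 + aL*bL
            return ((mul2(a_hi, b_hi) << 4)
                    + ((mul2(a_hi, b_lo) + mul2(a_lo, b_hi)) << 2)
                    + mul2(a_lo, b_lo))

        # exhaustive check over all unsigned 4-bit operand pairs
        assert all(mul4_from_mul2(a, b) == a * b for a in range(16) for b in range(16))

    The same identity applies recursively, which is why an expandable 2^n-bit multiplier can be composed from the next-smaller multipliers.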

    SUMMARY
    Advances in high-performance computer architecture design have been a major driver of the rapid evolution of convolution-based Deep Neural Networks (DNNs, hereafter CNNs) and Generative Adversarial Networks (GANs). We leverage the following three algorithmic properties of CNNs to introduce a novel acceleration architecture: (1) CNNs are mostly a collection of massively parallel multiply-adds; (2) the bit-width of these operations can be reduced with no loss in accuracy [1-5]; and (3) to preserve accuracy, the bit-width varies significantly across CNNs and may even be adjusted for each layer individually. We design three versions of the computation acceleration components according to these properties.
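    Properties (2) and (3) can be illustrated with a simple per-layer uniform quantizer; the sketch below is an assumption made for illustration and is not the quantization scheme used in the thesis. Each layer may call it with a different bit-width, which is what motivates a datapath whose operand width can be composed at run time.

        import numpy as np

        def quantize_symmetric(x, bits):
            # map a real-valued tensor to signed integers of the given bit-width
            qmax = 2 ** (bits - 1) - 1
            max_abs = float(np.max(np.abs(x)))
            scale = max_abs / qmax if max_abs > 0 else 1.0
            q = np.clip(np.round(x / scale), -qmax, qmax).astype(np.int32)
            return q, scale  # integer codes plus one scale factor per layer

        # e.g. an 8-bit first layer and a 2-bit deeper layer reuse the same routine
        w1_q, s1 = quantize_symmetric(np.random.randn(16, 3, 3, 3), bits=8)
        w3_q, s3 = quantize_symmetric(np.random.randn(32, 16, 3, 3), bits=2)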

    Table of Contents
    Abstract (Chinese)
    A Dynamically Composable Architecture for Deep Neural Networks Accelerating (Summary)
    Acknowledgements
    Table of Contents
    List of Figures
    List of Tables
    Chapter 1  Introduction
      1.1  Research Background
      1.2  Research Motivation and Objectives
      1.3  Thesis Organization
    Chapter 2  Convolutional Neural Network Background and Related Work
      2.1  Convolutional Neural Network Theory
      2.2  Convolutional Neural Network Architecture
        2.2.1  Forward Propagation (FP)
          2.2.1.1  Convolution Layer (CONV)
          2.2.1.2  Stride (S) & Padding
          2.2.1.3  Activation Function (Act. func.)
          2.2.1.4  Pooling Layer (Pool)
          2.2.1.5  Fully Connected Layer (FC)
        2.2.2  Backward Propagation
          2.2.2.1  Fully Connected Layer (FC)
          2.2.2.2  Pooling Layer (Pool)
          2.2.2.3  Convolution Layer (CONV)
        2.2.3  Weight Update
          2.2.3.1  Fully Connected Layer (FC)
          2.2.3.2  Convolution Layer (CONV)
      2.3  Generative Adversarial Networks
    Chapter 3  Dynamically Composable Multiplier Design
      3.1  Discussion of Dynamic Bit-Width Multiplier Design
      3.2  Expandable 2-Bit Multiplier Design
        3.2.1  Proof of the Expandable 2-Bit Multiplier
      3.3  Expandable 4-Bit Multiplier Design
        3.3.1  Proof of the Expandable 4-Bit Multiplier
      3.4  Expandable 8-Bit Multiplier Design
      3.5  Expandable 2^n-Bit Multiplier Design
      3.6  Expandable 64-Bit Multiplier Design
      3.7  Expansion Interface Module Design
        3.7.1  4-Bit Expansion Interface Module Design (4-bits EIM)
        3.7.2  8-Bit Expansion Interface Module Design (8-bits EIM)
        3.7.3  2^n-Bit Expansion Interface Module Design (2n-bits EIM)
      3.8  New Expansion Interface Module Design
        3.8.1  New 4-Bit Expansion Interface Module Design (New 4-bits EIM)
        3.8.2  New 8-Bit Expansion Interface Module Design (New 8-bits EIM)
        3.8.3  New 2^n-Bit Expansion Interface Module Design (New 2n-bits EIM)
      3.9  Updated Expansion Interface Module Design
        3.9.1  Updated 8-Bit Expansion Interface Module Design (Improve new 8-bits EIM)
      3.10  Comparison of the Three Expansion Interface Module Versions
    Chapter 4  Dynamic Bit-Width Composable Architecture Design
      4.1  Hardware Architecture Diagram
      4.2  Network Controller
      4.3  Data Controller
      4.4  Operation Controller
    Chapter 5  Experimental Environment and Data Analysis
      5.1  Experimental Environment
      5.2  Experimental Environment and Methods
      5.3  Software Simulation
        5.3.1  Building the Convolutional Neural Network in Python
        5.3.2  Simulating Each Layer of the Convolutional Network in C
      5.4  Hardware Experimental Results
        5.4.1  Multiplier Hardware Simulation Results
        5.4.2  Dynamic Bit-Width Composable Architecture Hardware Simulation Results
      5.5  IC Chip Layout
    Chapter 6  Conclusion and Future Work
    References

    [1] S. Zhou, Y. Wu, Z. Ni, X. Zhou, H. Wen, and Y. Zou, "DoReFa-Net: Training low bitwidth convolutional neural networks with low bitwidth gradients," arXiv preprint arXiv:1606.06160, 2016.
    [2] C. Zhu, S. Han, H. Mao, and W. J. Dally, "Trained ternary quantization," arXiv preprint arXiv:1612.01064, 2016.
    [3] F. Li, B. Zhang, and B. Liu, "Ternary weight networks," arXiv preprint arXiv:1605.04711, 2016.
    [4] I. Hubara, M. Courbariaux, D. Soudry, R. El-Yaniv, and Y. Bengio, "Quantized neural networks: Training neural networks with low precision weights and activations," The Journal of Machine Learning Research, vol. 18, no. 1, pp. 6869-6898, 2017.
    [5] A. Mishra, E. Nurvitadhi, J. J. Cook, and D. Marr, "WRPN: wide reduced-precision networks," arXiv preprint arXiv:1709.01134, 2017.
    [6] A. Radford, L. Metz, and S. Chintala, "Unsupervised representation learning with deep convolutional generative adversarial networks," arXiv preprint arXiv:1511.06434, 2015.
    [7] H. Sharma et al., "Bit fusion: Bit-level dynamically composable architecture for accelerating deep neural network," in 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA), 2018: IEEE, pp. 764-775.
    [8] Y.-H. Chen, J. Emer, and V. Sze, "Eyeriss: A spatial architecture for energy-efficient dataflow for convolutional neural networks," ACM SIGARCH Computer Architecture News, vol. 44, no. 3, pp. 367-379, 2016.
    [9] M. Sankaradas et al., "A massively parallel coprocessor for convolutional neural networks," in 2009 20th IEEE International Conference on Application-specific Systems, Architectures and Processors, 2009: IEEE, pp. 53-60.
    [10] V. Sriram, D. Cox, K. H. Tsoi, and W. Luk, "Towards an embedded biologically-inspired machine vision processor," in 2010 International Conference on Field-Programmable Technology, 2010: IEEE, pp. 273-278.
    [11] S. Chakradhar, M. Sankaradas, V. Jakkula, and S. Cadambi, "A dynamically configurable coprocessor for convolutional neural networks," in Proceedings of the 37th annual international symposium on Computer architecture, 2010, pp. 247-257.
    [12] V. Gokhale, J. Jin, A. Dundar, B. Martini, and E. Culurciello, "A 240 G-ops/s mobile coprocessor for deep neural networks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2014, pp. 682-687.
    [13] H.-J. Yoo, S. Park, K. Bong, D. Shin, J. Lee, and S. Choi, "A 1.93 TOPS/W scalable deep learning/inference processor with tetra-parallel MIMD architecture for big data applications," in IEEE International Solid-State Circuits Conference (ISSCC), 2015: IEEE, pp. 80-81.
    [14] L. Cavigelli, D. Gschwend, C. Mayer, S. Willi, B. Muheim, and L. Benini, "Origami: A convolutional network accelerator," in Proceedings of the 25th edition on Great Lakes Symposium on VLSI, 2015, pp. 199-204.
    [15] M. Peemen, A. A. Setio, B. Mesman, and H. Corporaal, "Memory-centric accelerator design for convolutional neural networks," in 2013 IEEE 31st International Conference on Computer Design (ICCD), 2013: IEEE, pp. 13-19.
    [16] S. Gupta, A. Agrawal, K. Gopalakrishnan, and P. Narayanan, "Deep learning with limited numerical precision," in International Conference on Machine Learning, 2015, pp. 1737-1746.
    [17] Z. Du et al., "ShiDianNao: Shifting vision processing closer to the sensor," in Proceedings of the 42nd Annual International Symposium on Computer Architecture, 2015, pp. 92-104.
    [18] C. Zhang, P. Li, G. Sun, Y. Guan, B. Xiao, and J. Cong, "Optimizing FPGA-based accelerator design for deep convolutional neural networks," in Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2015, pp. 161-170.
    [19] T. Chen et al., "DianNao: A small-footprint high-throughput accelerator for ubiquitous machine-learning," ACM SIGARCH Computer Architecture News, vol. 42, no. 1, pp. 269-284, 2014.
    [20] Y. Chen et al., "DaDianNao: A machine-learning supercomputer," in 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture, 2014: IEEE, pp. 609-622.
    [21] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," arXiv preprint arXiv:1409.1556, 2014.
    [22] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in Advances in Neural Information Processing Systems, 2012, pp. 1097-1105.
    [23] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proceedings of the IEEE, vol. 86, no. 11, pp. 2278-2324, 1998.
    [24] "Example: cifar10 cnn," Keras documentation. https://keras.io/zh/examples/cifar10_cnn/ (accessed).

    On campus: open access from 2022-02-01
    Off campus: not publicly available
    The electronic thesis has not yet been authorized for public release; please consult the library catalog for the printed copy.