| Graduate Student: | 郭珍瑋 (Kuo, Chen-Wei) |
|---|---|
| Thesis Title: | 深度神經網絡加速之動態組合架構 (A Dynamically Composable Architecture for Deep Neural Network Acceleration) |
| Advisor: | 周哲民 (Jou, Jer-Min) |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science, Department of Electrical Engineering |
| Year of Publication: | 2020 |
| Graduation Academic Year: | 108 |
| Language: | Chinese |
| Number of Pages: | 70 |
| Chinese Keywords: | 動態組合 (dynamic composition), 捲積神經網絡 (convolutional neural network) |
| Foreign Keywords: | Dynamic composition, Convolutional Neural Network, Bit level |
Advances in high-performance computer architecture design have been a major driving force behind the rapid development of convolutional neural networks (CNNs) and generative adversarial networks (GANs). We leverage three properties of CNNs to introduce a novel acceleration architecture: (1) CNNs are mostly a collection of massively parallel multiplications and additions; (2) the bit-width of these operations can be reduced without any loss of accuracy [1-5]; (3) to preserve accuracy, the required bit-width varies significantly across CNNs and may even need to be tuned separately for each layer. Consequently, a fixed-bit-width accelerator design suffers one of two drawbacks: (1) if its compute units are sized for the maximum bit-width, the high-order bits are poorly utilized in most computations; (2) if accuracy is traded for hardware utilization, the final accuracy drops. To mitigate these drawbacks, we introduce the concepts of bit-width fusion and decomposition as a guiding direction for CNN accelerator design. At the same time, we exploit the parallelism of CNNs to accelerate the loop computations in convolutional layers and to reduce the number of data transfers, thereby raising the throughput of the overall CNN architecture.
Building on the decomposability of arithmetic, we observe that a multiplication at any bit-width can be carried out by a combination of smaller bit-width compute units, which lets us design a scalable, composable computation structure. Within this scalable structure, we design three different versions of the composition module. The scalable structure supports operations at different bit-widths more flexibly and guarantees that the hardware is used optimally at every bit-width. This design method minimizes wasted hardware without sacrificing accuracy or computation speed.
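To make the composition idea concrete, here is a minimal Python sketch in the spirit of Bit Fusion [7]; the function names and the 2-bit slice width are our illustrative choices, not the thesis's actual design. It shows that a wide unsigned multiplication equals a shifted sum of narrow partial products, so the same array of 2-bit multipliers can be fused for 8-bit operands or split into many parallel low-bit-width multiplies.

```python
def split_bits(x, slice_bits, n_slices):
    """Split an unsigned integer into n_slices little-endian
    slices of slice_bits bits each."""
    mask = (1 << slice_bits) - 1
    return [(x >> (i * slice_bits)) & mask for i in range(n_slices)]

def composed_mul(a, b, slice_bits=2, width=8):
    """Compose a width-bit unsigned multiply from slice_bits-wide
    multipliers: a*b = sum_{i,j} a_i * b_j << ((i + j) * slice_bits)."""
    n = width // slice_bits
    a_slices = split_bits(a, slice_bits, n)
    b_slices = split_bits(b, slice_bits, n)
    acc = 0
    for i in range(n):
        for j in range(n):
            # each partial product needs only a slice_bits x slice_bits
            # multiplier, i.e. one small composable compute unit
            acc += (a_slices[i] * b_slices[j]) << ((i + j) * slice_bits)
    return acc

# the same narrow multipliers serve both 8-bit and 4-bit operands
assert composed_mul(173, 94, slice_bits=2, width=8) == 173 * 94
assert composed_mul(13, 9, slice_bits=2, width=4) == 13 * 9
```

Fusing the n² narrow units yields one wide multiplier, while leaving them unfused yields n² independent narrow multipliers; this is the utilization trade-off between maximum-bit-width units and per-layer bit-widths that the abstract describes.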
Advances in high-performance computer architecture design have been a major driver of the rapid evolution of convolution-based deep neural networks (CNNs) and generative adversarial networks (GANs). We leverage the following three algorithmic properties of CNNs to introduce a novel acceleration architecture: (1) CNNs are mostly a collection of massively parallel multiply-adds; (2) the bit-width of these operations can be reduced with no loss in accuracy [1-5]; (3) to preserve accuracy, the bit-width varies significantly across CNNs and may even be adjusted for each layer individually. We design three versions of the computing acceleration components according to these properties.
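As a reference point for property (1), a short sketch follows (our own simplification: single channel, unit stride, no padding) of the loop nest inside a convolutional layer. Every output pixel is an independent chain of multiply-adds, which is exactly the parallelism an accelerator can exploit while tiling the loops to cut data movement:

```python
import numpy as np

def conv2d_naive(ifmap, weights):
    """Direct convolution: one K*K multiply-accumulate chain per output
    pixel, all chains mutually independent."""
    H, W = ifmap.shape
    K, _ = weights.shape
    ofmap = np.zeros((H - K + 1, W - K + 1))
    for y in range(H - K + 1):        # the two output loops carry no
        for x in range(W - K + 1):    # dependencies and parallelize freely
            acc = 0.0
            for ky in range(K):
                for kx in range(K):   # K*K multiply-adds reduce into acc
                    acc += ifmap[y + ky, x + kx] * weights[ky, kx]
            ofmap[y, x] = acc
    return ofmap

out = conv2d_naive(np.arange(25.0).reshape(5, 5), np.ones((3, 3)))
print(out.shape)  # (3, 3)
```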
[1] S. Zhou, Y. Wu, Z. Ni, X. Zhou, H. Wen, and Y. Zou, "DoReFa-Net: Training low bitwidth convolutional neural networks with low bitwidth gradients," arXiv preprint arXiv:1606.06160, 2016.
[2] C. Zhu, S. Han, H. Mao, and W. J. Dally, "Trained ternary quantization," arXiv preprint arXiv:1612.01064, 2016.
[3] F. Li, B. Zhang, and B. Liu, "Ternary weight networks," arXiv preprint arXiv:1605.04711, 2016.
[4] I. Hubara, M. Courbariaux, D. Soudry, R. El-Yaniv, and Y. Bengio, "Quantized neural networks: Training neural networks with low precision weights and activations," The Journal of Machine Learning Research, vol. 18, no. 1, pp. 6869-6898, 2017.
[5] A. Mishra, E. Nurvitadhi, J. J. Cook, and D. Marr, "WRPN: wide reduced-precision networks," arXiv preprint arXiv:1709.01134, 2017.
[6] A. Radford, L. Metz, and S. Chintala, "Unsupervised representation learning with deep convolutional generative adversarial networks," arXiv preprint arXiv:1511.06434, 2015.
[7] H. Sharma et al., "Bit fusion: Bit-level dynamically composable architecture for accelerating deep neural network," in 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA), 2018: IEEE, pp. 764-775.
[8] Y.-H. Chen, J. Emer, and V. Sze, "Eyeriss: A spatial architecture for energy-efficient dataflow for convolutional neural networks," ACM SIGARCH Computer Architecture News, vol. 44, no. 3, pp. 367-379, 2016.
[9] M. Sankaradas et al., "A massively parallel coprocessor for convolutional neural networks," in 2009 20th IEEE International Conference on Application-specific Systems, Architectures and Processors, 2009: IEEE, pp. 53-60.
[10] V. Sriram, D. Cox, K. H. Tsoi, and W. Luk, "Towards an embedded biologically-inspired machine vision processor," in 2010 International Conference on Field-Programmable Technology, 2010: IEEE, pp. 273-278.
[11] S. Chakradhar, M. Sankaradas, V. Jakkula, and S. Cadambi, "A dynamically configurable coprocessor for convolutional neural networks," in Proceedings of the 37th Annual International Symposium on Computer Architecture, 2010, pp. 247-257.
[12] V. Gokhale, J. Jin, A. Dundar, B. Martini, and E. Culurciello, "A 240 G-ops/s mobile coprocessor for deep neural networks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2014, pp. 682-687.
[13] H.-J. Yoo, S. Park, K. Bong, D. Shin, J. Lee, and S. Choi, "A 1.93 TOPS/W scalable deep learning/inference processor with tetra-parallel MIMD architecture for big data applications," in 2015 IEEE International Solid-State Circuits Conference (ISSCC), 2015: IEEE, pp. 80-81.
[14] L. Cavigelli, D. Gschwend, C. Mayer, S. Willi, B. Muheim, and L. Benini, "Origami: A convolutional network accelerator," in Proceedings of the 25th edition on Great Lakes Symposium on VLSI, 2015, pp. 199-204.
[15] M. Peemen, A. A. Setio, B. Mesman, and H. Corporaal, "Memory-centric accelerator design for convolutional neural networks," in 2013 IEEE 31st International Conference on Computer Design (ICCD), 2013: IEEE, pp. 13-19.
[16] S. Gupta, A. Agrawal, K. Gopalakrishnan, and P. Narayanan, "Deep learning with limited numerical precision," in International Conference on Machine Learning, 2015, pp. 1737-1746.
[17] Z. Du et al., "ShiDianNao: Shifting vision processing closer to the sensor," in Proceedings of the 42nd Annual International Symposium on Computer Architecture, 2015, pp. 92-104.
[18] C. Zhang, P. Li, G. Sun, Y. Guan, B. Xiao, and J. Cong, "Optimizing FPGA-based accelerator design for deep convolutional neural networks," in Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2015, pp. 161-170.
[19] T. Chen et al., "DianNao: A small-footprint high-throughput accelerator for ubiquitous machine-learning," ACM SIGARCH Computer Architecture News, vol. 42, no. 1, pp. 269-284, 2014.
[20] Y. Chen et al., "DaDianNao: A machine-learning supercomputer," in 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture, 2014: IEEE, pp. 609-622.
[21] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," arXiv preprint arXiv:1409.1556, 2014.
[22] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in Advances in Neural Information Processing Systems, 2012, pp. 1097-1105.
[23] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proceedings of the IEEE, vol. 86, no. 11, pp. 2278-2324, 1998.
[24] "example cifar10 cnn." https://keras.io/zh/examples/cifar10_cnn/ (accessed.
On-campus access: publicly available from 2022-02-01.