
Graduate Student: Liao, Xiang-Yu (廖祥宇)
Thesis Title: CSR-based Deconvolution Accelerator Design (基於CSR的反捲積加速器設計)
Advisor: Jou, Jer-Min (周哲民)
Degree: Master
Department: Department of Electrical Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2019
Graduation Academic Year: 107 (2018-2019)
Language: Chinese
Number of Pages: 57
Chinese Keywords: 反捲積 (deconvolution), 捲積 (convolution), CSR格式 (CSR format), 稀疏神經網路 (sparse neural network)
English Keywords: Deconvolution, CSR format, Convolution, Sparse neural network
    Modern deep neural networks are machine-learning-based algorithms, and have given rise to architectures such as convolutional neural networks (CNNs) [1] and deep convolutional generative adversarial networks (DCGANs) [2], which are widely applied in artificial intelligence, image classification, and related fields. Deep neural network models achieve remarkable accuracy in these applications, but their massive computation is difficult to execute on resource-constrained embedded hardware. Compressing deep neural networks is therefore necessary: it removes redundant data and lowers both the computational load and the hardware power consumption. However, storing the compressed data in a conventional dense format still wastes memory, so the compressed data are converted to and stored in a sparse compression format, avoiding the storage of redundant parameters. In addition, deconvolution reuses its data heavily, so memory is accessed correspondingly often, which causes unnecessary power consumption and computation latency.
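    The scatter form of deconvolution makes the data reuse described above concrete: every input pixel multiplies the entire kernel and accumulates the result into a strided window of the output, so the same kernel weights, and each input pixel, are touched again and again. The Python sketch below is only an illustrative software model under assumed shapes and names; it is not code from the thesis.

        # Deconvolution (transposed convolution) in its "scatter" form.
        # Each input pixel x[i][j] multiplies the whole K x K kernel and
        # accumulates into a stride-offset window of the output, which is
        # why every kernel weight is reused for every input pixel.
        def deconv2d(x, k, stride=2):
            H, W = len(x), len(x[0])
            K = len(k)
            out_h = (H - 1) * stride + K
            out_w = (W - 1) * stride + K
            y = [[0.0] * out_w for _ in range(out_h)]
            for i in range(H):
                for j in range(W):
                    for a in range(K):        # one MAC per kernel weight,
                        for b in range(K):    # per input pixel
                            y[i * stride + a][j * stride + b] += x[i][j] * k[a][b]
            return y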
    This thesis designs a deconvolution (DCNN) accelerator architecture. The redundant sparse weights in the deep neural network are encoded in the Compressed Row Storage (CSR) format, which preserves only the nonzero weights by storing each weight's row, column, and value, thereby reducing the number of operations performed by the network. In addition, the weights preserved by the CSR method enable a data-reuse design that reduces memory accesses.
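    To make the CSR encoding concrete, the sketch below shows the conventional three-array CSR layout (nonzero values, their column indices, and per-row pointers); the thesis describes keeping each nonzero weight's row, column, and value, and the exact on-chip layout may differ. Names here are illustrative.

        # Encode a dense, pruned weight matrix in Compressed Row Storage:
        # only the nonzero weights are kept, together with index information.
        def dense_to_csr(matrix):
            values, col_idx, row_ptr = [], [], [0]
            for row in matrix:
                for j, w in enumerate(row):
                    if w != 0:                   # skip pruned (zero) weights
                        values.append(w)
                        col_idx.append(j)
                row_ptr.append(len(values))      # end of this row's nonzeros
            return values, col_idx, row_ptr

        # Example: a pruned 3 x 4 weight matrix with four nonzeros.
        W = [[0, 2, 0, 0],
             [1, 0, 0, 3],
             [0, 0, 5, 0]]
        vals, cols, ptrs = dense_to_csr(W)
        # vals = [2, 1, 3, 5]; cols = [1, 0, 3, 2]; ptrs = [0, 1, 3, 4]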
    In our experiments, a Cyclone IV GX EP4CGX150DF31C7 FPGA board is used to demonstrate the acceleration of sparse deconvolution. The experimental results show that this architecture delivers a 2.2x speedup, eliminates roughly 80% of the redundant operations, and reduces memory accesses, which in turn lowers overall power consumption.

    In recent years, CNN accelerators have been proposed to optimize performance and power efficiency. However, there has been little research on the design of deconvolutional neural networks (DCNNs). In the conventional way of computing image deconvolution, the number of multiplications, and hence the computational complexity, is very high. Weight pruning can compress DCNN models by removing redundant parameters from the networks. Although pruning can remove more than 80% of the parameters, it does not by itself improve performance on FPGA. Two major problems cause this unsatisfactory performance. First, mapping DCNNs onto matrix multiplication reduces data-reuse opportunities. Second, the sparsity introduced by pruning makes the computation irregular, which leads to inefficiency when processing computationally intensive DCNNs. Evaluation on a Cyclone IV GX FPGA board shows that our accelerator achieves a 2.2x speedup for sparse deconvolution and eliminates at least 80% of the MAC operations. DRAM access is critical to DCNN performance and is a major contributor to system energy consumption; minimizing the number of external DRAM accesses therefore reduces power consumption.
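    As a rough software model of this idea (the thesis realizes it in FPGA hardware; the function and variable names below are assumptions, not the thesis's design), a CSR-encoded kernel lets the inner loops visit only nonzero weights, so the MACs for pruned weights are never issued:

        # Sparse deconvolution with a CSR-encoded K x K kernel given as the
        # triple (values, col_idx, row_ptr). Only the nonzero kernel weights
        # are visited, so pruned weights cost no MAC operations at all.
        def sparse_deconv2d(x, csr_kernel, stride=2):
            values, col_idx, row_ptr = csr_kernel
            K = len(row_ptr) - 1                  # kernel height/width
            H, W = len(x), len(x[0])
            out_h = (H - 1) * stride + K
            out_w = (W - 1) * stride + K
            y = [[0.0] * out_w for _ in range(out_h)]
            for i in range(H):
                for j in range(W):
                    for a in range(K):            # kernel row
                        for p in range(row_ptr[a], row_ptr[a + 1]):
                            b = col_idx[p]        # column of a nonzero weight
                            y[i * stride + a][j * stride + b] += x[i][j] * values[p]
            return y

    With roughly 80% of the weights pruned, the innermost loop runs about five times fewer iterations than the dense version, which mirrors the reported reduction in MAC operations.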

    Table of Contents
    Front matter: 摘要 (Chinese Abstract), SUMMARY, INTRODUCTION, CONCLUSION, Acknowledgments, Table of Contents, List of Tables, List of Figures
    Chapter 1  Introduction
        1.1  Research Background
        1.2  Research Motivation and Objectives
        1.3  Thesis Organization
    Chapter 2  Background and Related Work
        2.1  Basic Concepts of Neural Networks
            2.1.1  Basic Building Blocks of Neural Networks
            2.1.2  Deep Neural Networks
            2.1.3  Convolutional Neural Networks
            2.1.4  Pooling Layer
            2.1.5  Fully Connected Layer
        2.2  Introduction to Transposed Convolution (Deconvolution)
            2.2.1  Applications of Deconvolutional Neural Networks
            2.2.2  Fully Convolutional Networks (FCN)
        2.3  Deep Neural Network Compression (Model Compression)
            2.3.1  Model Compression
            2.3.2  Compressed Row Storage (CSR) Format
            2.3.3  Coordinate Format (COO)
            2.3.4  Compression Ratio Comparison
    Chapter 3  Analysis of the Deconvolution Acceleration Algorithm
        3.1  Analysis of Neural Network Inference Algorithms
        3.2  Convolutional Layer Computation
        3.3  Deconvolutional Layer
        3.4  Changes to the Network Model under Conventional Pruning
        3.5  Impact of Memory Access on Performance
    Chapter 4  CSR-based Deconvolution Architecture
        4.1  Hardware Architecture
        4.2  Sparse Matrix Parameter Processing
        4.3  Hardware Implementation
        4.4  Memory Access Optimization
        4.5  Processing Unit Hardware Architecture
    Chapter 5  Experimental Results and Discussion
        5.1  System Architecture
    Chapter 6  Conclusion and Future Work
    References

    [1] A. Krizhevsky, I. Sutskever, and G. Hinton. ImageNet classification with deep convolutional neural networks. In NIPS, 2012.
    [2] A. Radford, L. Metz, and S. Chintala. Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434, 2015.
    [3] V. Papyan, Y. Romano, and M. Elad. Convolutional neural networks analyzed via convolutional sparse coding. Journal of Machine Learning Research, 18(83):1–52, 2017.
    [4] L. Wang, A. Schwing, and S. Lazebnik. Diverse and accurate image description using a variational auto-encoder with an additive Gaussian encoding space. In NIPS, 2017.
    [5] H. Li, A. Kadav, I. Durdanovic, H. Samet, and H. P. Graf. Pruning filters for efficient ConvNets. arXiv preprint arXiv:1608.08710, 2016.
    [6] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.
    [7] F. Rosenblatt. The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review, 65(6):386–408, 1958.
    [8] V. Nair and G. E. Hinton. Rectified linear units improve restricted Boltzmann machines. In ICML, 2010.
    [9] R. S. Michalski, J. G. Carbonell, and T. M. Mitchell. Machine Learning: An Artificial Intelligence Approach. Springer Science & Business Media, 2013.
    [10] H. W. Eves. Foundations and Fundamental Concepts of Mathematics. Courier Corporation, 1997.
    [11] P. Murugan. Feed forward and backward run in deep convolution neural network. arXiv preprint, 2017.
    [12] D. H. Hubel and T. N. Wiesel. Receptive fields of single neurones in the cat's striate cortex. The Journal of Physiology, 1959.
    [13] M. Zeiler, D. Krishnan, G. Taylor, and R. Fergus. Deconvolutional networks. In CVPR, 2010.
    [14] M. D. Zeiler, G. W. Taylor, and R. Fergus. Adaptive deconvolutional networks for mid and high level feature learning. In ICCV, 2011.
    [15] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. In ICLR, 2015.
    [16] H. Noh, S. Hong, and B. Han. Learning deconvolution network for semantic segmentation. In ICCV, 2015.
    [17] M. D. Zeiler and R. Fergus. Visualizing and understanding convolutional networks. In ECCV, 2014.
    [18] A. Radford, L. Metz, and S. Chintala. Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434, 2015.
    [19] X. Xie, D. Du, Q. Li, Y. Liang, W. T. Tang, Z. L. Ong, M. Lu, H. P. Huynh, and R. S. M. Goh. Exploiting sparsity to accelerate fully connected layers of CNN-based applications on mobile SoCs. ACM Transactions on Embedded Computing Systems, forthcoming.
    [20] N. Bell. Sparse Matrix Representations & Iterative Solvers, 2010.
    [21] B. Liu, M. Wang, H. Foroosh, M. Tappen, and M. Pensky. Sparse convolutional neural networks. In CVPR, 2015.
    [22] X. Chen, P. Xie, and L. Chi. An efficient SIMD compression format for sparse matrix-vector multiplication. 2018.
    [23] X. Xie, D. Du, Q. Li, Y. Liang, W. T. Tang, Z. L. Ong, M. Lu, H. P. Huynh, and R. S. M. Goh. Exploiting sparsity to accelerate fully connected layers of CNN-based applications on mobile SoCs. ACM Transactions on Embedded Computing Systems, forthcoming.
    [24] Y. LeCun. A theoretical framework for back-propagation. In Proceedings of the 1988 Connectionist Models Summer School (D. Touretzky et al., eds.), 1988.
    [25] J. Yu, A. Lukefahr, D. Palframan, et al. Scalpel: Customizing DNN pruning to the underlying hardware parallelism. In ISCA, 2017.
    [26] M. Alwani, H. Chen, M. Ferdman, and P. Milder. Fused-layer CNN accelerators. In Proceedings of the International Symposium on Microarchitecture (MICRO), 2016.
    [27] X. Zhang et al. A design methodology for efficient implementation of deconvolutional neural networks on an FPGA. arXiv preprint, 2017.
    [28] H. Wang, M. Shao, and Y. Liu. Enhanced efficiency 3D convolution based on optimal FPGA accelerator. IEEE Access, vol. 5, 2017.
    [29] M. Zhang, L. Li, H. Wang, Y. Liu, H. Qin, and W. Zhao. Optimized compression for implementing convolutional neural networks on FPGA. 2019.
    [30] L. Zhu, R. Deng, M. Maire, Z. Deng, G. Mori, and P. Tan. Sparsely aggregated convolutional networks. In ECCV, pp. 186–201, 2018.
    [31] S. Liu et al. Optimizing CNN-based segmentation with deeply customized convolutional and deconvolutional architectures on FPGA. ACM Transactions on Reconfigurable Technology and Systems (TRETS), 2018.
    [32] H. Fan, S. Liu, M. Ferianc, H.-C. Ng, Z. Que, et al. A real-time object detection accelerator with compressed SSDLite on FPGA. In International Conference on Field-Programmable Technology (FPT), 2018.

    Full-text availability: On campus: open access immediately. Off campus: not available. The electronic thesis has not been authorized for public release; for the print copy, please consult the library catalog.