| Author: | 張清宇 Zhang, Qin-Yu |
|---|---|
| Thesis title: | 降低類神經網路之運算複雜度的方法 (Method of Reducing the Computational Complexity of Artificial Neural Network) |
| Advisor: | 卿文龍 Chin, Wen-Long |
| Degree: | Master |
| Department: | College of Engineering - Department of Engineering Science |
| Year of publication: | 2019 |
| Academic year of graduation: | 107 |
| Language: | Chinese |
| Number of pages: | 58 |
| Keywords (Chinese): | 深度學習 (deep learning), 卷積神經網路 (convolutional neural network), 能量效率 (energy efficiency) |
| Keywords (English): | Deep neural network, convolutional neural network, energy efficiency |
| Usage: | Views: 112; Downloads: 3 |
In recent years, deep learning has delivered strong performance and numerous breakthroughs in both academia and industry. However, state-of-the-art deep learning models are computationally expensive and consume large amounts of storage. At the same time, many applications in areas such as mobile platforms, wearable devices, autonomous robots, and Internet of Things (IoT) devices have a strong demand for deep learning, so deploying deep models efficiently on such low-power devices has become a difficult problem.
This thesis therefore proposes a practical method that substantially reduces the computational complexity of convolutional neural networks in hardware. It uses a reduced number of bits to predict the sign of the input activation before the Rectified Linear Unit (ReLU); when a negative value is predicted correctly, the bits that would otherwise be spent on the full computation can be saved, which achieves the goal of lowering complexity. The method is implemented on AlexNet, and two sign-prediction schemes are examined: a parallel-input mode and a serial-input mode. The parallel-input mode reduces the computational complexity by up to 51.42%, and the serial-input mode by up to 65.92%. In a recognition test of 1000 images, the floating-point AlexNet architecture correctly recognized 626 images, the parallel-input algorithm 625 images, and the serial-input algorithm 626 images.
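To make the parallel-input mode concrete, the following is a minimal sketch in Python/NumPy, assuming the predictor forms a low-precision dot product from only the most significant bits of each signed fixed-point operand and skips the full-precision multiply-accumulate when that product is negative. The function names and the `keep_bits` / `word_bits` parameters are illustrative assumptions, not the exact design in the thesis.

```python
import numpy as np

def predict_negative_parallel(weights, activations, keep_bits=4, word_bits=8):
    """Hypothetical predictor (parallel-input mode): keep only the top
    `keep_bits` of each signed fixed-point operand and test whether the
    resulting low-precision dot product is negative."""
    w = np.asarray(weights, dtype=np.int64)
    a = np.asarray(activations, dtype=np.int64)
    shift = word_bits - keep_bits
    # Arithmetic right shift preserves the sign of each operand.
    return int(np.dot(w >> shift, a >> shift)) < 0

def relu_neuron_parallel(weights, activations, keep_bits=4, word_bits=8):
    """ReLU(w . a): skip the full-precision multiply-accumulate whenever
    the low-precision predictor says the pre-activation is negative,
    because the ReLU would map it to zero anyway."""
    w = np.asarray(weights, dtype=np.int64)
    a = np.asarray(activations, dtype=np.int64)
    if predict_negative_parallel(w, a, keep_bits, word_bits):
        return 0
    return max(int(np.dot(w, a)), 0)
```

In this sketch a misprediction can only force to zero a neuron whose true pre-activation was positive, which is consistent with the small accuracy change reported above (625 versus 626 correctly recognized images out of 1000).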
This thesis proposes a method that significantly reduces the computational complexity of convolutional neural networks in hardware by using fewer bits to predict the sign of the input activation of the Rectified Linear Unit (ReLU). If a negative value is predicted correctly, the bits required for the full operation can be saved, achieving the goal of reduced complexity. The method is implemented on the popular AlexNet and explores two ways of predicting the sign bit: a parallel input mode and a serial input mode. In the parallel input mode the computational complexity can be reduced by up to 51.42%, and in the serial input mode by up to 65.92%. In a 1000-image recognition test, AlexNet's floating-point architecture achieved 626 successful identifications, the parallel input algorithm 625, and the serial input algorithm 626.
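The serial-input mode can be sketched in the same spirit as a bit-serial dot product, assuming unsigned activations (the outputs of a preceding ReLU) are fed one bit plane at a time, most significant bit first, and the remaining cycles are skipped once the partial sum can no longer become positive. The early-termination bound below is an assumption for illustration, not the exact criterion used in the thesis.

```python
import numpy as np

def relu_neuron_bit_serial(weights, activations, act_bits=8):
    """Hypothetical bit-serial evaluation (serial-input mode): unsigned
    activations are consumed one bit plane at a time, MSB first, and the
    loop stops early once the pre-activation is guaranteed to be negative."""
    w = np.asarray(weights, dtype=np.int64)
    a = np.asarray(activations, dtype=np.int64)

    # Largest amount the positive weights can still add per remaining bit.
    pos_weight_sum = int(w[w > 0].sum())

    partial = 0
    for b in range(act_bits - 1, -1, -1):
        bit_plane = (a >> b) & 1
        partial += int(np.dot(w, bit_plane)) << b

        # Upper bound on the contribution of the yet-unprocessed bit planes.
        remaining_max = pos_weight_sum * ((1 << b) - 1)
        if partial + remaining_max < 0:
            return 0                  # sign settled: ReLU output is zero
    return max(partial, 0)
```

Because the sign frequently settles after only the first few bit planes, more work can be skipped than in the parallel variant, in line with the larger reduction reported in the abstract (65.92% versus 51.42%).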