| Student: | 曾微中 Tseng, Wei-Chung |
|---|---|
| Thesis Title: | 深度卷積網路之逐層定點數量化方法與實作YOLOv3推論引擎 Layer-wise Fixed Point Quantization for Deep Convolutional Neural Networks and Implementation of YOLOv3 Inference Engine |
| Advisor: | 陳中和 Chen, Chung-Ho |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Institute of Computer & Communication Engineering |
| Year of Publication: | 2019 |
| Graduation Academic Year: | 107 |
| Language: | Chinese |
| Number of Pages: | 70 |
| Chinese Keywords: | 類神經網路加速、卷積運算、網路量化、前向傳播、終端AI (neural network acceleration, convolution operations, network quantization, forward propagation, edge AI) |
| Foreign Keywords: | Edge Device, Machine Learning, CNN quantization, CNN optimization |
Advanced deep convolutional neural networks have achieved great success in many fields, but because they typically demand enormous computational resources, they cannot be deployed on mobile edge devices. For example, running SSD_Mobilenet, an object detection network optimized for edge devices, with TensorFlow on a Raspberry Pi 3 takes roughly 25 seconds per image. For deeper DNN models such as Faster R-CNN with ResNet-101, the weights alone require nearly 600 MB, a memory footprint so large that the model cannot even run on a Raspberry Pi 3.
To fit within limited hardware resources and achieve lower computation latency, approaches such as optimizing the network structure and quantizing network computation attempt to address these problems. Structure optimization modifies the network architecture to reduce computation and shrink the model, as in MobileNet and SqueezeNet. Network quantization shrinks the model weights and accelerates DNN computation, but it usually requires dedicated hardware that supports the corresponding quantized storage format and quantized arithmetic, such as EIE and Eyeriss.
This thesis proposes a network quantization method and an early-stage hardware design architecture, MDFI (Micro Darknet For Inference). MDFI is a forward-propagation (inference) DNN framework written in pure C that mainly supports object detection network models. It uses no dynamic libraries such as Protocol Buffers and keeps the executable under 280 kB, making it suitable for mobile edge devices. Because it depends on no dynamic libraries, its computational behavior can serve as a reference for hardware design and as an early-stage ESL description model.
The quantization scheme allows the DNN to perform inference (forward propagation) with fixed-point numbers, which is more efficient than general-purpose floating point and can also suppress overfitting present in the original model; in the AlexNet/ImageNet Top-1 and Top-5 tests it improves accuracy by 0.5% and 0.1%, respectively. It is further estimated that a hardware acceleration unit adopting this method can save more than 90% of the power consumption of a floating-point design.
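The abstract does not show how the layer-wise fixed-point conversion might look in code. As a minimal C sketch, assuming a signed 16-bit Q-format whose per-layer fractional bit count is derived from each layer's weight range (the struct and function names here are illustrative, not MDFI's actual API):

```c
#include <math.h>
#include <stdint.h>
#include <stdlib.h>

/* Illustrative layer-wise quantization: pick the fractional bit count
 * from the layer's largest weight magnitude, then convert the float
 * weights to signed 16-bit fixed point with round-to-nearest and
 * saturation.  Names and Q-format are assumptions, not MDFI's API. */
typedef struct {
    int16_t *q;    /* quantized weights of one layer        */
    int      frac; /* fractional bits chosen for this layer */
} quant_layer_t;

static quant_layer_t quantize_layer(const float *w, size_t n)
{
    quant_layer_t out = {0};
    float max_abs = 0.0f;

    for (size_t i = 0; i < n; ++i) {
        float a = fabsf(w[i]);
        if (a > max_abs)
            max_abs = a;
    }

    /* Integer bits needed for max_abs; whatever remains of the 15
     * magnitude bits becomes the fractional part. */
    int int_bits = (max_abs > 0.0f) ? (int)ceilf(log2f(max_abs + 1.0f)) : 0;
    out.frac = 15 - int_bits;
    if (out.frac < 0)  out.frac = 0;
    if (out.frac > 15) out.frac = 15;

    out.q = malloc(n * sizeof(int16_t));
    float scale = (float)(1 << out.frac);

    for (size_t i = 0; i < n; ++i) {
        long v = lroundf(w[i] * scale);
        if (v >  32767) v =  32767;   /* saturate to int16 range */
        if (v < -32768) v = -32768;
        out.q[i] = (int16_t)v;
    }
    return out;
}
```

Choosing the fractional bit count per layer rather than globally lets each layer keep as much precision as its own dynamic range allows, which is the essence of a layer-wise scheme.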
With the increasing popularity of mobile devices and the effectiveness of deep learning-based algorithms, people are trying to deploy deep learning models on mobile devices. However, this is limited by computational complexity and software overhead.
We propose an efficient inference framework for resource-limited devices that is about 1000 times smaller than TensorFlow in code size, together with a layer-wise quantization scheme that allows inference to be computed with fixed-point arithmetic. The fixed-point quantization scheme is more efficient than floating-point arithmetic: in a coarse-grained evaluation it reduces power consumption to about 8% of the floating-point baseline, shrinks the model size to 25%~40% of the original, and keeps the Top-5 accuracy loss under 1% for AlexNet on ImageNet.
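As a companion sketch to the scheme described above, the following hypothetical C routine shows how inference arithmetic can stay entirely in fixed point: 16-bit activations and weights are multiplied into a 32-bit accumulator, then shifted back to the output layer's Q-format. The bit widths, names, and rounding behavior are assumptions for illustration, not the thesis's implementation.

```c
#include <stdint.h>

/* Illustrative fixed-point multiply-accumulate for one output value:
 * int16 activations and weights, 32-bit accumulation, then a shift
 * that rescales the sum from Q(x_frac + w_frac) to the output layer's
 * Q(out_frac).  All parameters are per-layer in a layer-wise scheme. */
static int16_t fixed_dot(const int16_t *x, const int16_t *w, int n,
                         int x_frac, int w_frac, int out_frac)
{
    int32_t acc = 0;

    for (int i = 0; i < n; ++i)
        acc += (int32_t)x[i] * (int32_t)w[i];

    int shift = x_frac + w_frac - out_frac;   /* requantization shift */
    int32_t y = (shift >= 0) ? (acc >> shift) : (acc << -shift);

    if (y >  32767) y =  32767;               /* saturate to int16    */
    if (y < -32768) y = -32768;
    return (int16_t)y;
}
```

Accumulating in 32 bits before the single requantization shift keeps intermediate precision cheap in hardware, which is one reason fixed-point datapaths can be estimated to consume far less power than floating-point units.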