| Student: | 曾微中 Tseng, Wei-Chung |
|---|---|
| Thesis Title: | 深度卷積網路之逐層定點數量化方法與實作YOLOv3推論引擎 Layer-wise Fixed Point Quantization for Deep Convolutional Neural Networks and Implementation of YOLOv3 Inference Engine |
| Advisor: | 陳中和 Chen, Chung-Ho |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Institute of Computer & Communication Engineering |
| Year of Publication: | 2019 |
| Graduation Academic Year: | 107 |
| Language: | Chinese |
| Number of Pages: | 70 |
| Chinese Keywords: | 類神經網路加速、卷積運算、網路量化、前向傳播、終端AI (neural network acceleration, convolution operations, network quantization, forward propagation, edge AI) |
| Foreign Keywords: | Edge Device, Machine Learning, CNN quantization, CNN optimization |
Advanced deep convolutional neural networks have achieved great success in many fields, but because they typically demand enormous computational resources, they cannot be deployed on mobile edge devices. For example, running SSD_Mobilenet, an object detection network optimized for edge devices, with TensorFlow on a Raspberry Pi 3 takes roughly 25 seconds per image. For deeper DNN models such as Faster R-CNN with ResNet-101, the weights alone require nearly 600 MB, a memory footprint so large that the model cannot even run on a Raspberry Pi 3.
To fit within limited hardware resources and achieve lower computation latency, approaches such as optimizing the network structure and quantizing network computation attempt to address these problems. Structure optimization modifies the network architecture to reduce computation and shrink the model, as in MobileNet and SqueezeNet. Network quantization shrinks the model weights and accelerates DNN computation, but it usually requires dedicated hardware that supports the corresponding quantized storage format and quantized arithmetic, such as EIE and Eyeriss.
This thesis proposes a network quantization method and an early-stage hardware design architecture, MDFI (Micro Darknet For Inference). MDFI is a forward-propagation (inference) DNN framework written in pure C that mainly supports object detection network models. It uses no dynamic libraries such as Protocol Buffers and keeps the executable under 280 kB, making it suitable for mobile edge devices. Because it depends on no dynamic libraries, its computational behavior can serve as a reference for hardware design and as an early-stage ESL description model.
The quantization scheme allows the DNN to perform inference (forward propagation) with fixed-point numbers, which is more efficient than general-purpose floating point and can also suppress overfitting present in the original model; in the AlexNet/ImageNet Top-1 and Top-5 tests it improves accuracy by 0.5% and 0.1%, respectively. It is further estimated that a hardware acceleration unit adopting this method can save more than 90% of the power consumption of a floating-point design.
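The abstract does not show how the layer-wise fixed-point conversion might look in code. As a minimal C sketch, assuming a signed 16-bit Q-format whose per-layer fractional bit count is derived from each layer's weight range (the struct and function names here are illustrative, not MDFI's actual API):

```c
#include <math.h>
#include <stdint.h>
#include <stdlib.h>

/* Illustrative layer-wise quantization: pick the fractional bit count
 * from the layer's largest weight magnitude, then convert the float
 * weights to signed 16-bit fixed point with round-to-nearest and
 * saturation.  Names and Q-format are assumptions, not MDFI's API. */
typedef struct {
    int16_t *q;    /* quantized weights of one layer        */
    int      frac; /* fractional bits chosen for this layer */
} quant_layer_t;

static quant_layer_t quantize_layer(const float *w, size_t n)
{
    quant_layer_t out = {0};
    float max_abs = 0.0f;

    for (size_t i = 0; i < n; ++i) {
        float a = fabsf(w[i]);
        if (a > max_abs)
            max_abs = a;
    }

    /* Integer bits needed for max_abs; whatever remains of the 15
     * magnitude bits becomes the fractional part. */
    int int_bits = (max_abs > 0.0f) ? (int)ceilf(log2f(max_abs + 1.0f)) : 0;
    out.frac = 15 - int_bits;
    if (out.frac < 0)  out.frac = 0;
    if (out.frac > 15) out.frac = 15;

    out.q = malloc(n * sizeof(int16_t));
    float scale = (float)(1 << out.frac);

    for (size_t i = 0; i < n; ++i) {
        long v = lroundf(w[i] * scale);
        if (v >  32767) v =  32767;   /* saturate to int16 range */
        if (v < -32768) v = -32768;
        out.q[i] = (int16_t)v;
    }
    return out;
}
```

Choosing the fractional bit count per layer rather than globally lets each layer keep as much precision as its own dynamic range allows, which is the essence of a layer-wise scheme.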
With the increasing popularity of mobile devices and the effectiveness of deep learning-based algorithms, people are trying to deploy deep learning models on mobile devices. However, this is limited by computational complexity and software overhead.
We propose an efficient inference framework for resource-limited devices that is about 1000 times smaller than TensorFlow in code size, together with a layer-wise quantization scheme that allows inference to be computed with fixed-point arithmetic. The fixed-point quantization scheme is more efficient than floating-point arithmetic: in a coarse-grained evaluation it reduces power consumption to about 8% of the floating-point baseline, shrinks the model size to 25%~40% of the original, and keeps the Top-5 accuracy loss under 1% for AlexNet on ImageNet.
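As a companion sketch to the scheme described above, the following hypothetical C routine shows how inference arithmetic can stay entirely in fixed point: 16-bit activations and weights are multiplied into a 32-bit accumulator, then shifted back to the output layer's Q-format. The bit widths, names, and rounding behavior are assumptions for illustration, not the thesis's implementation.

```c
#include <stdint.h>

/* Illustrative fixed-point multiply-accumulate for one output value:
 * int16 activations and weights, 32-bit accumulation, then a shift
 * that rescales the sum from Q(x_frac + w_frac) to the output layer's
 * Q(out_frac).  All parameters are per-layer in a layer-wise scheme. */
static int16_t fixed_dot(const int16_t *x, const int16_t *w, int n,
                         int x_frac, int w_frac, int out_frac)
{
    int32_t acc = 0;

    for (int i = 0; i < n; ++i)
        acc += (int32_t)x[i] * (int32_t)w[i];

    int shift = x_frac + w_frac - out_frac;   /* requantization shift */
    int32_t y = (shift >= 0) ? (acc >> shift) : (acc << -shift);

    if (y >  32767) y =  32767;               /* saturate to int16    */
    if (y < -32768) y = -32768;
    return (int16_t)y;
}
```

Accumulating in 32 bits before the single requantization shift keeps intermediate precision cheap in hardware, which is one reason fixed-point datapaths can be estimated to consume far less power than floating-point units.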