| Graduate student: | 蘇郁翔 Su, Yu-Xiang |
|---|---|
| Thesis title: | 移植Tensorflow至CASLAB-GPUSIM模擬平台與矩陣函式庫優化 Porting Tensorflow to CASLAB-GPUSIM and Optimization of Matrix Multiplication Library |
| Advisor: | 陳中和 Chen, Chung-Ho |
| Degree: | 碩士 Master |
| Department: | 電機資訊學院 - 電腦與通信工程研究所 Institute of Computer & Communication Engineering |
| Year of publication: | 2018 |
| Graduation academic year: | 107 |
| Language: | Chinese |
| Number of pages: | 75 |
| Chinese keywords: | 終端裝置、通用繪圖處理器、矩陣乘法、機器學習 |
| English keywords: | Edge Device, GPGPU, Matrix Multiplication, Machine Learning |
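With the rapid growth of cloud computing, machine learning applications are gradually extending to edge devices. To support both the development stage of edge hardware and the performance analysis of edge applications, this thesis integrates the machine learning framework Tensorflow with the OpenCL Runtime developed in our laboratory and successfully ports the Tensorflow Runtime to our laboratory's CASLAB-GPUSIM simulation platform. A series of system verifications is then carried out with test programs written in Tensorflow, thereby simulating machine learning application scenarios on edge devices.
Beyond building the edge machine learning simulation platform, this thesis argues that, in solutions that use a general-purpose graphics processing unit (GPGPU) as the edge accelerator, linear algebra libraries have not been adapted to this application scenario or to the available computing resources. Matrix multiplication matters most here, because it is the basic operation underlying the convolutional layers and fully-connected layers of convolutional neural network models. This thesis therefore proposes an optimization of the matrix multiplication algorithm in the CLBlast library: reducing the library's pre-processing for the computation patterns of edge machine learning applications, so as to reduce the overall execution time of the matrix multiplication library.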
With the rapid development of cloud computing, machine learning applications have gradually expanded to edge devices. To analyze the performance of edge applications in the early development stage of edge hardware, we integrate Tensorflow with our GPGPU simulator, CASLAB-GPUSIM.
In addition to building the edge device simulation platform, we propose a matrix multiplication library for machine learning applications on edge devices that use a GPGPU as the acceleration solution. According to our experimental results, we achieve an average speedup of 5.6 times in the fully-connected layers of our benchmarks, which include an MNIST model, LeNet-5, and MobileNet.
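The abstract attributes the speedup to reduced pre-processing in the matrix multiplication library for fully-connected layers. The following is a minimal sketch of why that can matter for small edge-sized shapes. It assumes the pre-processing is zero-padding of the operands to a multiple of a tile size, which the abstract does not state. All function names are hypothetical, and the code is plain Python for illustration rather than the thesis's OpenCL implementation or the CLBlast API.

```python
def matmul(a, b):
    """Naive product of two row-major matrices stored as lists of lists."""
    inner, cols = len(b), len(b[0])
    assert all(len(row) == inner for row in a), "inner dimensions must match"
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def pad_to_multiple(m, tile):
    """Zero-pad a matrix so both dimensions are multiples of `tile`.

    This stands in for the assumed pre-processing step.
    """
    rows, cols = len(m), len(m[0])
    padded_rows = -(-rows // tile) * tile  # ceiling division
    padded_cols = -(-cols // tile) * tile
    out = [[0.0] * padded_cols for _ in range(padded_rows)]
    for i in range(rows):
        out[i][:cols] = m[i]
    return out


def fully_connected_padded(x, w, tile):
    """Pad both operands, multiply, then crop the result to its real shape."""
    full = matmul(pad_to_multiple(x, tile), pad_to_multiple(w, tile))
    return [row[:len(w[0])] for row in full[:len(x)]]


def fully_connected_direct(x, w):
    """Multiply the operands as they are, with no pre-processing pass."""
    return matmul(x, w)


def element_count(m):
    """Number of stored elements, a rough proxy for data-movement cost."""
    return len(m) * len(m[0])


# A batch of one input with three features, and a 3x2 weight matrix.
x = [[1.0, 2.0, 3.0]]
w = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
tile = 4

direct = fully_connected_direct(x, w)
padded = fully_connected_padded(x, w, tile)
touched_direct = element_count(x) + element_count(w)
touched_padded = (element_count(pad_to_multiple(x, tile))
                  + element_count(pad_to_multiple(w, tile)))
```

Both paths produce the same layer output, but the padded path reads and writes 32 operand elements where the direct path touches 9. With a batch size of one, which is typical of edge inference, padding of this kind can outweigh the useful work.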