| Graduate student: | 陳惇介 Chen, Dun-Jie |
|---|---|
| Thesis title: | CASLab-GPU OpenCL LLVM編譯器實作與優化 (LLVM-based OpenCL Compiler for CASLab-GPU) |
| Advisor: | 陳中和 Chen, Chung-Ho |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Institute of Computer & Communication Engineering |
| Year of publication: | 2019 |
| Academic year of graduation: | 108 |
| Language: | Chinese |
| Number of pages: | 100 |
| Chinese keywords: | 終端裝置 (edge device), 通用繪圖處理器 (GPGPU), 編譯器 (compiler), 編譯最佳化 (compiler optimization) |
| English keywords: | Compiler, Compiler optimization, Edge device, GPGPU, LLVM |
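As machine learning applications flourish, increasingly complex model architectures and rising demands for data privacy and real-time response are moving computation from the cloud toward edge devices. So that the CASLab-GPU developed in our laboratory can be paired with CPUs of different architectures to form edge computing devices, and so that developers have a convenient open execution environment, CASLab-GPU adopts the open standard OpenCL as its programming language. However, the previously used AMD OpenCL offline compiler (AMD CLOC) is a closed-source project whose compiler is available only on x86 platforms, which is a major obstacle to porting CASLab-GPU to other platforms. In addition, the original compilation flow (AMD CLOC + Finalizer) is inefficient, which creates a performance bottleneck for a language such as OpenCL that is sensitive to compilation time.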
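This thesis therefore designs a CASLab-GPU OpenCL offline compiler based on the open-source LLVM Compiler Infrastructure Project and integrates it into the OpenCL runtime and TensorFlow runtime previously developed in our laboratory. Applications written in OpenCL and neural network models executed on TensorFlow are run on CASLab-GPUSim to simulate real application scenarios and to verify the correctness of the whole compilation flow.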
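To improve the execution efficiency of CASLab-GPU, the compiler designed in this thesis applies platform-specific optimizations that target the ISA and execution architecture of CASLab-GPU: branch optimization, load/store optimization, and instruction optimization. These allow the CASLab-GPUSim platform to run OpenCL applications and TensorFlow neural network models that are closer to real-world workloads. With the improved OpenCL compilation flow and the OpenCL compiler designed in this thesis, neural network inference executed on TensorFlow achieves a 15% overall performance improvement. Beyond the gain in hardware execution performance, the OpenCL runtime running on the CPU achieves a performance improvement of up to 85%.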
With the increasing popularity of machine learning applications, their computing model has gradually extended from cloud computing to edge computing. To provide an edge-computing platform built on CASLab-GPU, the GPGPU hardware designed by CASLab, we have implemented a software development environment that includes an OpenCL runtime, an HSA runtime, and compilation tools.
To support the software development environment for the CASLab-GPU platform, this thesis implements an OpenCL compiler with optimization methods that greatly increase execution efficiency on CASLab-GPU. This new compiler replaces the original closed-source AMD CLOC compiler used by CASLab-GPU. According to our experimental results, we achieve an average speedup of 7.6 in OpenCL runtime execution and an execution speedup of 1.4 across various OpenCL benchmarks, including LeNet-5, a TensorFlow CNN model.
On campus: publicly available from 2024-11-01