
Graduate student: 徐鏞
Hsu, Yung
Thesis title: 符合HSA中介語言並支援三維繪圖與通用運算之繪圖處理器設計平台
An HSAIL Conformed GPU Design Platform for General Purpose Computing and 3D Rendering Applications
Advisor: 陳中和
Chen, Chung-Ho
Degree: Master
Department: College of Electrical Engineering and Computer Science - Institute of Computer & Communication Engineering
Year of publication: 2016
Academic year of graduation: 104 (ROC calendar)
Language: English
Number of pages: 87
Keywords (Chinese): 繪圖處理器, 異質架構系統, 平行運算, 繪圖管線
Keywords (English): GPU, heterogeneous system architecture, parallel computing, rendering pipeline
    繪圖處理器具有強大的平行運算能力,因此不僅使用在三維計算機繪圖,也被用於一般任務。本論文提出一系統層級的繪圖處理器設計平台,可同時支援三維繪圖與通用目的運算。此平台之目的在於,幫助處理器架構設計者在早期設計階段進行軟硬體的開發與驗證。此平台具有基於現代繪圖處理器之硬體架構的模擬器。該模擬器包含可程式化且具有客製指令集架構的單一指令多執行緒處理器、針對繪圖管線所設計的特定模組,以及記憶體系統。此繪圖處理器針對高效能運算以及異質運算而設計,並符合異質架構系統的運行模式與其中介語言。本平台亦提供一特殊的編譯流程與工具鏈,用於編譯OpenGL著色程式與OpenCL內核至HSA中介語言以及客製的二進位指令集。本論文發展了一模擬框架,使設計平台得以運行OpenCL與OpenGL應用程式,該框架實作OpenCL與OpenGL應用程式介面與其執行期函式庫、模擬器的驅動程式,以及客製的內文與視窗管理函式庫。數個OpenCL與OpenGL基準測試程式已被移植至此平台,開發者可剖析其程式行為並評估效能議題。

    Graphics processing units (GPUs) offer powerful parallel computing capability, so they are used not only for 3D graphics applications but also for general-purpose tasks. This work proposes a system-level GPU design platform that supports both 3D rendering and general-purpose computing applications. The goal of the platform is to help processor architects explore and verify hardware and software in the early design stage. The platform includes a simulator that models the hardware architecture of a modern GPU, including programmable Single Instruction Multiple Thread (SIMT) processors with a customized instruction set architecture, dedicated modules for the rendering pipeline, and the memory system. The GPU design targets high-performance and heterogeneous computing, and it conforms to the Heterogeneous System Architecture (HSA) execution model and the HSA Intermediate Language (HSAIL). The platform also provides a dedicated compilation flow and tool chain that compile OpenGL shader programs and OpenCL kernels to HSAIL and to our custom binary instruction set. To run OpenCL and OpenGL applications on the platform, we also develop a simulation framework comprising implementations of the OpenCL and OpenGL APIs and runtime libraries, a driver for the simulator, and a customized context and window management library. Several OpenCL and OpenGL benchmarks have been ported to the platform, so developers can profile program behavior and evaluate performance issues for both kinds of application.
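    As a rough illustration of the SIMT execution style named in the abstract, the sketch below runs one if/else over a single warp: each branch path is issued once for the whole warp, and a per-lane active mask decides which lanes commit a result. This is a generic textbook-style model, not the simulator described in the thesis; the warp width `WARP_SIZE` and the helpers `simt_select` and `issued_passes` are hypothetical names introduced only for this example.

```python
from typing import Callable, List

WARP_SIZE = 8  # hypothetical warp width, chosen only for this illustration


def simt_select(
    values: List[int],
    predicate: Callable[[int], bool],
    then_op: Callable[[int], int],
    else_op: Callable[[int], int],
) -> List[int]:
    """Evaluate an if/else over one warp using a per-lane active mask."""
    assert len(values) == WARP_SIZE
    out = list(values)
    mask = [predicate(v) for v in values]  # per-lane branch outcome

    # Pass 1: the "then" path; only lanes whose mask bit is set are active.
    for lane in range(WARP_SIZE):
        if mask[lane]:
            out[lane] = then_op(values[lane])

    # Pass 2: the "else" path; the complementary lanes are active.
    for lane in range(WARP_SIZE):
        if not mask[lane]:
            out[lane] = else_op(values[lane])

    return out


def issued_passes(mask: List[bool]) -> int:
    """Count how many branch paths must be issued for this mask.

    A path is issued only if at least one lane takes it, so a warp whose
    lanes all agree needs one pass and a divergent warp needs two.
    """
    return int(any(mask)) + int(not all(mask))


if __name__ == "__main__":
    warp = list(range(WARP_SIZE))
    print(simt_select(warp, lambda v: v % 2 == 0, lambda v: v * 2, lambda v: v + 100))
    print(issued_passes([v % 2 == 0 for v in warp]))
```

    The pass count is the point of the sketch: when lanes disagree on a branch, the warp pays for both paths, which is the cost that control-divergence handling in a SIMT design tries to reduce.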

    Abstract (Chinese)
    Abstract
    Acknowledgment
    Table of Contents
    List of Figures
    List of Tables
    Chapter 1 Introduction
      1.1 Motivation
      1.2 Contribution
      1.3 Organization
    Chapter 2 Backgrounds
      2.1 Computer Graphics
        2.1.1 OpenGL
        2.1.2 The Rendering Pipeline
        2.1.3 Programmable Shader
      2.2 GPU
        2.2.1 General Purpose Computing on Graphics Processing Units (GPGPUs)
        2.2.2 OpenCL Framework
        2.2.3 Task Scheduling and Control Divergence
      2.3 Heterogeneous System Architecture (HSA)
        2.3.1 HSA Execution Model
        2.3.2 HSA Intermediate Language
        2.3.3 Heterogeneous Queuing and Uniform Memory Access
    Chapter 3 Related Work
      3.1 ATTILA
      3.2 GPGPU-Sim
      3.3 TEAPOT
    Chapter 4 GPU Architecture Design
      4.1 GPU System Architecture
      4.2 Streaming Multi-Processor
        4.2.1 Instruction Set Architecture
        4.2.2 Extension Instruction Set
        4.2.3 SIMT Processor
      4.3 Fixed Function Units for Rendering
        4.3.1 Geometry Unit
        4.3.2 Rasterizer Unit
        4.3.3 Per-fragment Operation Unit
      4.4 Texture Unit
        4.4.1 Texture Unit Architecture
        4.4.2 Address Generation Processor
        4.4.3 Filter Processor
      4.5 Memory System
        4.5.1 Memory Segments
        4.5.2 Memory Hierarchy Model
    Chapter 5 Compilation Flow of Shader and Computing Kernel
      5.1 Overview of the Compilation Flow
      5.2 GLSL Shader Compilation
        5.2.1 Translator
        5.2.2 Scalarizer
        5.2.3 Syntax Conversion
        5.2.4 Kernel Synthesis
      5.3 Finalizer
    Chapter 6 Simulation Framework
      6.1 Framework Overview
      6.2 Application Layer
      6.3 Runtime Libraries Layer
      6.4 Driver Layer
      6.5 Context and Window Management
        6.5.1 X Window System
        6.5.2 OpenGL Simulation Toolkit for CASLAB (GLSC)
    Chapter 7 Benchmarks and Evaluation
      7.1 Benchmarks
      7.2 Experiment Environment
      7.3 Experiment Result
        7.3.1 Shader Workload Profiling
        7.3.2 Instruction Breakdown
        7.3.3 Memory Access Profiling
    Chapter 8 Conclusion
    References


    Full-text availability: on campus from 2021-02-15; off campus from 2021-02-15