Graduate student: 林煌程
Lin, Huang-Cheng
Thesis title: 應用CUDA及OpenGL於有限元素分析
Development of an Integrated CUDA / OpenGL Finite Element Method (FEM) Analysis Tool
Advisor: 李汶樺
Matthew R. Smith
Degree: Master
Department: College of Engineering - Department of Mechanical Engineering
Year of publication: 2015
Academic year of graduation: 103 (ROC calendar)
Language: English
Number of pages: 96
Keywords (Chinese): 有限元素法, 圖形處理器, 平行計算, 開放圖形庫, 共軛梯度法, 線性系統
Keywords (English): Finite Element Method (FEM), Graphics Processing Units (GPU), CUDA, OpenGL, Conjugate Gradient, Linear Systems
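  • Using central processing units (CPUs) and graphics processing units (GPUs) together to compute complex engineering problems has become a trend in modern computer science. This study uses OpenGL and GPU parallel computing to solve finite element problems. The approach has three main parts: (1) a Rayleigh-Ritz based finite element method is used to derive the displacements and stress distribution of a finite element structure in three-dimensional space; (2) the conjugate gradient method, combined with a parallel programming language, a Jacobi preconditioner, and the compressed sparse row storage format, is used to accelerate the solution of the linear system, and its performance is discussed; (3) OpenGL is used to render the spatial geometry so that boundary conditions can be selected, after which the problem is solved with the parallel algorithm. Linear tetrahedral elements are used to analyze the deformation of a simply supported beam in three dimensions and the stress distribution in a bicycle disc brake rotor under load, and the speedups obtained for these problems on different processors are compared. The single CPU core used in this study is an Intel i3-2120, and the GPUs used are the Nvidia Tesla C2075 and GTX Titan. The results show a maximum parallel speedup of 11.51x for the simply supported beam problem, which used 33214 finite elements, and a maximum parallel speedup of 10.43x for the disc brake problem.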
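    The second component of the abstract (conjugate gradient with a Jacobi preconditioner and compressed sparse row storage) can be illustrated with a minimal serial sketch. This is not the thesis's CUDA implementation: the function names, the tolerance, and the 3 x 3 test matrix are assumptions chosen for illustration, and the loops that a GPU version would distribute across threads are written here as plain Python.

```python
import math


def csr_matvec(values, col_idx, row_ptr, x):
    """Return y = A x for a matrix A stored in CSR form."""
    y = []
    for i in range(len(row_ptr) - 1):
        # Row i owns the non-zeros values[row_ptr[i]:row_ptr[i + 1]].
        y.append(sum(values[k] * x[col_idx[k]]
                     for k in range(row_ptr[i], row_ptr[i + 1])))
    return y


def csr_diagonal(values, col_idx, row_ptr):
    """Extract the main diagonal, which is all a Jacobi preconditioner needs."""
    diag = [0.0] * (len(row_ptr) - 1)
    for i in range(len(diag)):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            if col_idx[k] == i:
                diag[i] = values[k]
    return diag


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def jacobi_pcg(values, col_idx, row_ptr, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive-definite A; return (x, iterations)."""
    # Assumes every diagonal entry is non-zero (true for SPD matrices).
    inv_diag = [1.0 / d for d in csr_diagonal(values, col_idx, row_ptr)]
    x = [0.0] * len(b)
    r = list(b)                                  # residual for the guess x = 0
    z = [m * ri for m, ri in zip(inv_diag, r)]   # z = M^-1 r, M = diag(A)
    p = list(z)
    rz = dot(r, z)
    b_norm = math.sqrt(dot(b, b)) or 1.0
    for iteration in range(max_iter):
        if math.sqrt(dot(r, r)) <= tol * b_norm:
            return x, iteration
        ap = csr_matvec(values, col_idx, row_ptr, p)
        alpha = rz / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        z = [m * ri for m, ri in zip(inv_diag, r)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


# Illustrative SPD system: A = [[4, -1, 0], [-1, 4, -1], [0, -1, 4]],
# with b chosen so that the exact solution is [1, 2, 3].
values = [4.0, -1.0, -1.0, 4.0, -1.0, -1.0, 4.0]
col_idx = [0, 1, 0, 1, 2, 1, 2]
row_ptr = [0, 2, 5, 7]
b = [2.0, 4.0, 10.0]
solution, iterations = jacobi_pcg(values, col_idx, row_ptr, b)
```

    The sparse matrix-vector product, the dot products, and the vector updates are the steps a GPU implementation would typically map to kernels; how the thesis divides that work is not stated in this record.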

    This thesis presents an integrated CUDA / OpenGL Finite Element Method (FEM) analysis tool that performs real-time computation of finite element problems. The tool has three key parts: (a) formulation of the displacement and stress fields using a Rayleigh-Ritz based FEM approach, (b) parallel solution of the resulting linear system of equations using the Conjugate Gradient (CG) method, accelerated with custom-written CUDA kernels, and (c) presentation of geometry and boundary conditions using hardware-accelerated graphics rendering through OpenGL. For simplicity, the FEM solution is based on linear tetrahedral elements (the three-dimensional counterpart of Constant Strain Triangles, or CSTs), though the solution can be extended to higher order without modification to the core solver kernels. Nvidia's Compute Unified Device Architecture (CUDA) is used to parallelize the various components of the CG calculation using several Graphics Processing Units (GPUs). The best reported speedup relative to a single CPU core is 11.51x for a simple benchmark problem using 33214 finite elements. The tool is then applied to a simple case study: the design of a bicycle frame supporting a disc brake. For this case study, the speedup of 10.43x allows students and engineers to evaluate designs quickly, shortening design turnaround times.
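    To make the "constant strain" property of linear tetrahedral elements concrete, the sketch below computes the shape-function gradients of a four-node tetrahedron and the resulting element strain. It is an illustration under stated assumptions, not code from the thesis: the helper names, the unit tetrahedron, and the imposed displacement field u = (0.01 x, 0, 0) are all hypothetical.

```python
def sub(a, b):
    return [a[0] - b[0], a[1] - b[1], a[2] - b[2]]


def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def dot3(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def tet_shape_gradients(nodes):
    """Return (volume, grads) where grads[i] is the constant gradient of N_i."""
    e1, e2, e3 = (sub(nodes[i], nodes[0]) for i in (1, 2, 3))
    det = dot3(e1, cross(e2, e3))          # six times the signed volume
    if det == 0:
        raise ValueError("degenerate tetrahedron")
    g1 = [c / det for c in cross(e2, e3)]
    g2 = [c / det for c in cross(e3, e1)]
    g3 = [c / det for c in cross(e1, e2)]
    # The shape functions sum to one, so their gradients sum to zero.
    g0 = [-(a + b + c) for a, b, c in zip(g1, g2, g3)]
    return det / 6.0, [g0, g1, g2, g3]


def tet_strain(nodes, displacements):
    """Return [exx, eyy, ezz, gxy, gyz, gzx], uniform over the element."""
    _, grads = tet_shape_gradients(nodes)
    # h[a][b] = d(u_a)/d(x_b), interpolated from the four nodal displacements.
    h = [[sum(displacements[i][a] * grads[i][b] for i in range(4))
          for b in range(3)] for a in range(3)]
    return [h[0][0], h[1][1], h[2][2],
            h[0][1] + h[1][0], h[1][2] + h[2][1], h[2][0] + h[0][2]]


# Hypothetical unit tetrahedron stretched by 1% along x.
nodes = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
displacements = [[0.01 * n[0], 0.0, 0.0] for n in nodes]
volume, grads = tet_shape_gradients(nodes)
strain = tet_strain(nodes, displacements)
```

    Because the gradients do not depend on position inside the element, the strain (and, for a linear elastic material, the stress) is a single value per element, which is why each element's stiffness matrix can be formed without numerical integration.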

    Nomenclature
    List of Figures / Tables
    Chapter 1 - Introduction to the Finite Element Method
        1.1 Background and Motivation
        1.2 Shape functions for 1-D elements
        1.3 Shape functions for 3-D Tetrahedral Elements
        1.4 Stiffness matrix for 3-D Tetrahedral Elements
    Chapter 2 - Parallel computing using Graphics Processing Units (GPU)
        2.1 Theory of Parallel Computation
        2.2 CUDA Program Structure
        2.3 Memory Management in CUDA
            Allocating Memory in CUDA
            Host-Device Memory Transfer in CUDA
        2.4 CUDA Threads
            One-Dimensional Blocks with One-Dimensional threads
            One-Dimensional Blocks with Two-Dimensional threads
            Two-Dimensional Blocks with Two-Dimensional threads
            Two-Dimensional Blocks with Three-Dimensional threads
    Chapter 3 - Sparse Linear Systems
        3.1 System Singularity and Condition Number
        3.2 Direct Solution Techniques
            Gaussian Elimination
            LU decomposition
        3.3 Iterative Solution Techniques
            Jacobi Iteration
            Conjugate Gradient (CG) Method
        3.4 Sparse Matrix Storage Techniques
            Coordinate Storage (COO) Format
            The Compressed Sparse Row (CSR) Format
    Chapter 4 - Open Graphics Library (OpenGL)
        4.1 OpenGL Graphics Pipeline
        4.2 OpenGL 2.0 Primitives
        4.3 OpenGL 2.0 Lighting
        4.4 GL Utility Library (GLUT)
    Chapter 5 - Methodology
        5.1 The Rayleigh-Ritz Method
        5.2 GPU Parallelization
            Memory Management
            Construction of the Global Stiffness Matrix
            The Preconditioned Conjugate Gradient Method Implementation
    Chapter 6 - Benchmark and Case-Study Results
        6.1 Benchmark 1 - Tensile Beam
        6.2 Benchmark 2 - Bending Cantilever
        6.3 Case Study - Disc Brake Design
    Chapter 7 - Parallel Computing Performance and Discussion
    Chapter 8 - Conclusion
    References
    Figures
    Tables


    Full text: on campus, available from 2016-02-17; off campus, available from 2016-02-17.