National Cheng Kung University Electronic Theses and Dissertations System


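Graduate student:	Lin, Huang-Cheng (林煌程)
Thesis title:	Development of an Integrated CUDA / OpenGL Finite Element Method (FEM) Analysis Tool (應用CUDA及OpenGL於有限元素分析)
Advisor:	Matthew R. Smith (李汶樺)
Degree:	Master
Department:	College of Engineering - Department of Mechanical Engineering
Year of publication:	2015
Academic year of graduation:	103 (ROC calendar, 2014-2015)
Language:	English
Number of pages:	96
Keywords (translated from Chinese):	finite element method, graphics processing unit, parallel computing, Open Graphics Library, conjugate gradient method, linear systems
Keywords (English):	Finite Element Method (FEM), Graphics Processing Units (GPU), CUDA, OpenGL, Conjugate Gradient, Linear Systems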

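Using central processing units (CPUs) and graphics processing units (GPUs) together to compute complex engineering problems has become a trend in modern computer science. This study uses OpenGL together with GPU parallel computing to solve finite element problems. The approach has three parts: (1) a finite element method based on the Rayleigh-Ritz method is used to derive the displacements and stress distribution of a finite element structure in three-dimensional space; (2) the linear system is solved with the conjugate gradient method, accelerated using a parallel programming language, a Jacobi preconditioner, and a compressed sparse matrix storage format, and its performance is discussed; (3) OpenGL is used to draw the spatial geometry on which boundary conditions are selected, after which the problem is solved with the parallel algorithm. Linear tetrahedral elements are used to analyze the three-dimensional deformation of a simply supported beam and the stress distribution in a bicycle disc brake rotor under load, and the speedups obtained for these problems on different processors are compared. The single-core CPU used in this study is an Intel i3-2120, and the GPUs are an Nvidia Tesla C2075 and a GTX Titan. The results show a maximum parallel speedup of 11.51x for the simply supported beam problem, using 33214 finite elements, and a maximum parallel speedup of 10.43x for the disc brake problem.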
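The solver described in part (2) combines three ingredients: compressed sparse row (CSR) storage, a Jacobi (diagonal) preconditioner, and the conjugate gradient iteration. The following is a minimal serial sketch of how these fit together, not the thesis implementation, which uses custom CUDA kernels. All function names and the small 3x3 test system are illustrative assumptions.

```python
# Illustrative serial sketch; names and test data are assumptions, not from the thesis.

def csr_matvec(values, col_idx, row_ptr, x):
    """Compute y = A x for a matrix A stored in CSR form."""
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        # Nonzeros of row i live in values[row_ptr[i]:row_ptr[i + 1]].
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += values[k] * x[col_idx[k]]
    return y


def csr_diagonal(values, col_idx, row_ptr):
    """Extract the diagonal of a CSR matrix (used as the Jacobi preconditioner)."""
    n = len(row_ptr) - 1
    diag = [0.0] * n
    for i in range(n):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            if col_idx[k] == i:
                diag[i] = values[k]
    return diag


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def jacobi_pcg(values, col_idx, row_ptr, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive definite A; returns (x, iterations)."""
    inv_diag = [1.0 / d for d in csr_diagonal(values, col_idx, row_ptr)]
    x = [0.0] * len(b)
    r = list(b)                                   # residual for the initial guess x = 0
    z = [m * ri for m, ri in zip(inv_diag, r)]    # preconditioned residual
    p = list(z)
    rz = dot(r, z)
    for it in range(max_iter):
        if dot(r, r) ** 0.5 <= tol:
            return x, it
        Ap = csr_matvec(values, col_idx, row_ptr, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * ai for ri, ai in zip(r, Ap)]
        z = [m * ri for m, ri in zip(inv_diag, r)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


# CSR form of the SPD matrix [[4, 1, 0], [1, 3, 1], [0, 1, 2]].
values = [4.0, 1.0, 1.0, 3.0, 1.0, 1.0, 2.0]
col_idx = [0, 1, 0, 1, 2, 1, 2]
row_ptr = [0, 2, 5, 7]
b = csr_matvec(values, col_idx, row_ptr, [1.0, 2.0, 3.0])
x_sol, n_iter = jacobi_pcg(values, col_idx, row_ptr, b)
```

In a GPU version, the sparse matrix-vector product, the dot products, and the vector updates are the natural candidates for separate kernels, since each is a data-parallel loop over rows or vector entries.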

This thesis presents the development of an integrated CUDA / OpenGL Finite Element Method (FEM) analysis tool that performs real-time computation of finite element problems. The tool has three key parts: (a) the formulation of the displacement and stress fields using a Rayleigh-Ritz based FEM approach, (b) parallel solution of the resulting linear system of equations using the Conjugate Gradient (CG) method, accelerated with custom-written CUDA kernels, and (c) the presentation of geometry and boundary conditions using hardware-accelerated graphics rendering through OpenGL. For simplicity, the FEM solution is based on linear tetrahedral elements (the 3-D analogue of Constant Strain Triangles, or CSTs), though the solution can be extended to higher order without modification to the core solver kernels. Nvidia’s Compute Unified Device Architecture (CUDA) is used to parallelize the various components of the CG calculation using several Graphics Processing Units (GPUs). The best reported speedup compared to a single CPU core is 11.51x, for a simple benchmark problem using 33214 finite elements. The tool is then applied to a simple case study: the design of a bicycle frame supporting a disc brake. For this case study, the performance increase of 10.43x allows students and engineers to evaluate designs quickly, shortening design turnaround times.
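To illustrate why a linear tetrahedral element carries a constant strain, the sketch below computes the shape-function gradients of a 4-node tetrahedron from its nodal coordinates and then evaluates the strain tensor from nodal displacements. It is a hedged illustration under standard textbook assumptions (small strains, linear shape functions). The function names and the example geometry are hypothetical and not taken from the thesis.

```python
# Illustrative sketch; function names and example data are assumptions.

def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def inv3(m):
    """Inverse of a 3x3 matrix via the adjugate (cofactor transpose)."""
    d = det3(m)
    inv = [[0.0] * 3 for _ in range(3)]
    for i in range(3):
        for j in range(3):
            # inv[i][j] = cofactor(j, i) / det: delete row j and column i.
            rows = [r for r in range(3) if r != j]
            cols = [c for c in range(3) if c != i]
            minor = (m[rows[0]][cols[0]] * m[rows[1]][cols[1]]
                     - m[rows[0]][cols[1]] * m[rows[1]][cols[0]])
            inv[i][j] = (-1) ** (i + j) * minor / d
    return inv


def tet_shape_gradients(nodes):
    """Return (signed volume, gradients) for a 4-node linear tetrahedron.

    gradients[i] is the gradient of shape function N_i; it does not depend
    on position because N_i is linear in x, y, z.
    """
    x1 = nodes[0]
    # Columns of J are the edge vectors from node 1 to nodes 2, 3 and 4.
    J = [[nodes[k + 1][a] - x1[a] for k in range(3)] for a in range(3)]
    volume = det3(J) / 6.0          # sign depends on node ordering
    Jinv = inv3(J)
    # Rows of J^-1 are the gradients of N_2..N_4; N_1 = 1 - N_2 - N_3 - N_4.
    grad_n1 = [-(Jinv[0][a] + Jinv[1][a] + Jinv[2][a]) for a in range(3)]
    return volume, [grad_n1, Jinv[0], Jinv[1], Jinv[2]]


def tet_strain(nodes, disp):
    """Small-strain tensor (3x3, symmetric) from nodal displacements."""
    _, grads = tet_shape_gradients(nodes)
    # Displacement gradient G[a][b] = d u_a / d x_b, constant over the element.
    G = [[sum(disp[i][a] * grads[i][b] for i in range(4)) for b in range(3)]
         for a in range(3)]
    return [[0.5 * (G[a][b] + G[b][a]) for b in range(3)] for a in range(3)]


# Example: stretch a tetrahedron by 1% along x, i.e. u = (0.01 * x, 0, 0).
nodes = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 3.0, 0.0), (0.0, 0.0, 4.0)]
disp = [(0.01 * n[0], 0.0, 0.0) for n in nodes]
volume, grads = tet_shape_gradients(nodes)
strain = tet_strain(nodes, disp)
```

Because the gradients are fixed per element, the element stiffness matrix reduces to a product of constant matrices scaled by the element volume, which is part of what makes this element type simple to assemble.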

Nomenclature
List of Figures / Tables
Chapter 1 - Introduction to the Finite Element Method
1.1 Background and Motivation
1.2 Shape functions for 1-D elements
1.3 Shape functions for 3-D Tetrahedral Elements
1.4 Stiffness matrix for 3-D Tetrahedral Elements
Chapter 2 - Parallel computing using Graphics Processing Units (GPU)
2.1 Theory of Parallel Computation
2.2 CUDA Program Structure
2.3 Memory Management in CUDA
Allocating Memory in CUDA
Host-Device Memory Transfer in CUDA
2.4 CUDA Threads
One-Dimensional Blocks with One-Dimensional threads
One-Dimensional Blocks with Two-Dimensional threads
Two-Dimensional Blocks with Two-Dimensional threads
Two-Dimensional Blocks with Three-Dimensional threads
Chapter 3 - Sparse Linear Systems
3.1 System Singularity and Condition Number
3.2 Direct Solution Techniques
Gaussian Elimination
LU decomposition
3.3 Iterative Solution Techniques
Jacobi Iteration
Conjugate Gradient (CG) Method
3.4 Sparse Matrix Storage Techniques
Coordinate Storage (COO) Format
The Compressed Sparse Row (CSR) Format
Chapter 4 - Open Graphics Library (OpenGL)
4.1 OpenGL Graphics Pipeline
4.2 OpenGL 2.0 Primitives
4.3 OpenGL 2.0 Lighting
4.4 GL Utility Library (GLUT)
Chapter 5 - Methodology
5.1 The Rayleigh-Ritz Method
5.2 GPU Parallelization
Memory Management
Construction of the Global Stiffness Matrix
The Preconditioned Conjugate Gradient Method Implementation
Chapter 6 - Benchmark and Case-Study Results
6.1 Benchmark 1 - Tensile Beam
6.2 Benchmark 2 - Bending Cantilever
6.3 Case Study - Disc Brake Design
Chapter 7 - Parallel Computing Performance and Discussion
Chapter 8 - Conclusion
References
Figures
Tables

                                    


On campus: publicly available from 2016-02-17
Off campus: publicly available from 2016-02-17
