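| Graduate student: | 林煌程 Lin, Huang-Cheng |
|---|---|
| Thesis title: | 應用CUDA及OpenGL於有限元素分析 Development of an Integrated CUDA / OpenGL Finite Element Method (FEM) Analysis Tool |
| Advisor: | 李汶樺 Matthew R. Smith |
| Degree: | 碩士 Master |
| Department: | 工學院 - 機械工程學系 College of Engineering, Department of Mechanical Engineering |
| Year of publication: | 2015 |
| Academic year of graduation: | 103 (ROC calendar) |
| Language: | English |
| Number of pages: | 96 |
| Keywords (Chinese): | 有限元素法 (finite element method), 圖形處理器 (graphics processing unit), 平行計算 (parallel computing), 開放圖形庫 (OpenGL), 共軛梯度法 (conjugate gradient method), 線性系統 (linear systems) |
| Keywords (English): | Finite Element Method (FEM), Graphics Processing Units (GPU), CUDA, OpenGL, Conjugate Gradient, Linear Systems |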
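Using CPUs and GPUs together to solve complex engineering problems has become a trend in modern computer science. This study uses OpenGL and GPU-based parallel computing to solve finite element problems. The method has three main parts: (1) a Rayleigh-Ritz based finite element method is used to derive the displacement and stress distribution of a finite element structure in three-dimensional space; (2) the resulting linear system is solved with the conjugate gradient method, accelerated by a parallel programming language together with a Jacobi preconditioner and a compressed sparse matrix storage format, and the performance is discussed; (3) OpenGL is used to render the geometry so that boundary conditions can be selected, after which the problem is solved with the parallel algorithm. Linear tetrahedral elements are used to analyze the deformation of a simply supported beam in three dimensions and the stress distribution in a bicycle disc brake rotor under load, and the speedups obtained on different processors are compared. The single CPU core used in this study is an Intel i3-2120, and the GPUs are an Nvidia Tesla C2075 and a GTX Titan. The results show a maximum parallel speedup of 11.51x for the simply supported beam problem, which used 33214 finite elements, and a maximum parallel speedup of 10.43x for the disc brake rotor problem.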
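The solver step named in this abstract (conjugate gradient with a Jacobi preconditioner and compressed sparse storage) can be sketched serially as follows. This is a minimal illustration, not the thesis code: it assumes the compressed sparse row (CSR) layout, runs on the CPU in plain Python rather than in CUDA kernels, and all function names (`csr_matvec`, `csr_diagonal`, `jacobi_pcg`) and the 3x3 test matrix are hypothetical. In a GPU version, the sparse matrix-vector product, the dot products, and the vector updates would be the natural candidates for parallel kernels.

```python
# Serial sketch of Jacobi-preconditioned conjugate gradient on a CSR matrix.
# Assumption: the matrix is symmetric positive definite with a nonzero diagonal.

def csr_matvec(values, col_idx, row_ptr, x):
    """Return y = A x for a matrix A stored in CSR form."""
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        s = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            s += values[k] * x[col_idx[k]]
        y[i] = s
    return y

def csr_diagonal(values, col_idx, row_ptr):
    """Extract the diagonal of A (used as the Jacobi preconditioner)."""
    n = len(row_ptr) - 1
    d = [0.0] * n
    for i in range(n):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            if col_idx[k] == i:
                d[i] = values[k]
    return d

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def jacobi_pcg(values, col_idx, row_ptr, b, tol=1e-10, max_iter=1000):
    """Solve A x = b; returns (x, iterations performed)."""
    n = len(b)
    inv_diag = [1.0 / d for d in csr_diagonal(values, col_idx, row_ptr)]
    x = [0.0] * n
    r = list(b)                                   # residual for x = 0
    z = [inv_diag[i] * r[i] for i in range(n)]    # preconditioned residual
    p = list(z)
    rz = dot(r, z)
    b_norm = dot(b, b) ** 0.5 or 1.0
    for it in range(max_iter):
        if dot(r, r) ** 0.5 <= tol * b_norm:
            return x, it
        Ap = csr_matvec(values, col_idx, row_ptr, p)
        alpha = rz / dot(p, Ap)
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * Ap[i] for i in range(n)]
        z = [inv_diag[i] * r[i] for i in range(n)]
        rz_new = dot(r, z)
        p = [z[i] + (rz_new / rz) * p[i] for i in range(n)]
        rz = rz_new
    return x, max_iter

# Hypothetical test system: A = [[4,-1,0],[-1,4,-1],[0,-1,4]], exact x = [1,2,3].
values = [4.0, -1.0, -1.0, 4.0, -1.0, -1.0, 4.0]
col_idx = [0, 1, 0, 1, 2, 1, 2]
row_ptr = [0, 2, 5, 7]
b = [2.0, 4.0, 10.0]
x, iters = jacobi_pcg(values, col_idx, row_ptr, b)
print(x, iters)
```

CSR stores only the nonzero entries, which suits FEM stiffness matrices where each row couples a node only to its mesh neighbours. The Jacobi preconditioner needs just the diagonal, so applying it is an element-wise multiply.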
The development of an integrated CUDA / OpenGL Finite Element Method (FEM) analysis tool which performs real-time computation of finite element problems is presented. The analysis tool can be broken down into three key parts: (a) the formulation of the displacement and stress field using a Rayleigh-Ritz based FEM approach, (b) parallel solution of the resulting linear system of equations using the Conjugate Gradient (CG) method, accelerated using custom-written CUDA kernels, and (c) the presentation of geometry and boundary conditions using hardware-accelerated graphics rendering through OpenGL. For simplicity, the FEM solution employed is based on linear tetrahedral elements (the three-dimensional analogue of Constant Strain Triangles, or CSTs), though the solution can be extended to higher order without modification to the core solver kernels. Nvidia's Compute Unified Device Architecture (CUDA) is applied to parallelize the various components of the CG calculation on several Graphics Processing Units (GPUs). The best reported speedup when compared to a single CPU core is 11.51x for a simple benchmark problem using 33214 finite elements. The tool is then applied to a simple case study for the design of a bicycle frame supporting a disc brake. For the case study presented, the performance increase of 10.43x allows students and engineers to evaluate designs quickly, shortening design turnaround times.
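The constant-strain property of the linear tetrahedral element mentioned above can be illustrated with a short sketch. It is not taken from the thesis: the function names (`tet_shape_gradients`, `tet_strain`), the node ordering, and the unit tetrahedron used as an example are assumptions made for illustration. Because the shape functions are linear in x, y and z, their gradients are constant over the element, and so is the strain computed from the nodal displacements.

```python
# Sketch: shape-function gradients and strain of a 4-node linear tetrahedron.
# Strain is returned in engineering notation:
# [eps_xx, eps_yy, eps_zz, gamma_xy, gamma_yz, gamma_zx].

def sub(a, b):
    return [a[k] - b[k] for k in range(3)]

def dot(a, b):
    return sum(a[k] * b[k] for k in range(3))

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def tet_shape_gradients(nodes):
    """Return (volume, gradients) for node coordinates [x1, x2, x3, x4]."""
    x1, x2, x3, x4 = nodes
    a, b, c = sub(x2, x1), sub(x3, x1), sub(x4, x1)   # edge vectors from node 1
    det = dot(a, cross(b, c))
    if det == 0:
        raise ValueError("degenerate tetrahedron")
    # Each gradient satisfies (x_i - x_1) . grad N_j = 1 if i == j else 0.
    g2 = [v / det for v in cross(b, c)]
    g3 = [v / det for v in cross(c, a)]
    g4 = [v / det for v in cross(a, b)]
    g1 = [-(g2[k] + g3[k] + g4[k]) for k in range(3)]  # gradients sum to zero
    return abs(det) / 6.0, [g1, g2, g3, g4]

def tet_strain(grads, disp):
    """Constant element strain from nodal displacements [(u, v, w), ...]."""
    pairs = list(zip(grads, disp))
    exx = sum(g[0] * u[0] for g, u in pairs)
    eyy = sum(g[1] * u[1] for g, u in pairs)
    ezz = sum(g[2] * u[2] for g, u in pairs)
    gxy = sum(g[1] * u[0] + g[0] * u[1] for g, u in pairs)
    gyz = sum(g[2] * u[1] + g[1] * u[2] for g, u in pairs)
    gzx = sum(g[0] * u[2] + g[2] * u[0] for g, u in pairs)
    return [exx, eyy, ezz, gxy, gyz, gzx]

# Hypothetical example: unit tetrahedron stretched by 1% along x (u = 0.01 * x).
nodes = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
disp = [[0.0, 0.0, 0.0], [0.01, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
volume, grads = tet_shape_gradients(nodes)
strain = tet_strain(grads, disp)
print(volume, strain)
```

Since the strain does not vary inside the element, the element stiffness matrix reduces to a constant integrand times the element volume, which keeps assembly inexpensive at the cost of needing a finer mesh for accuracy.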