National Cheng Kung University Electronic Theses and Dissertations System

Graduate student:	MANN, Allan (曼亞倫)
Thesis title:	Static and Dynamic Finite Element Analysis Using Parallel Programming on GPU with OpenCL (應用開放計算語言及圖形處理器於結構有限元素分析)
Advisor:	CHOI, Siu-Tong (崔兆棠)
Co-advisor:	Matthew SMITH (李汶樺)
Degree:	Master
Department:	College of Engineering - Department of Aeronautics & Astronautics
Year of publication:	2014
Academic year of graduation:	102 (ROC calendar)
Language:	English
Number of pages:	128
Keywords:	GPU, OpenCL, Finite Element, Static analysis, Dynamic analysis

Modern engineering systems are becoming increasingly complex, and simulating their behavior accurately requires more and more degrees of freedom. Because conventional computing capabilities are limited, a new approach is needed for simulating large problems. Graphics Processing Units (GPUs) were originally designed to offload graphical display work, but as their capabilities have improved, their use for general-purpose programming (GPGPU) has attracted attention. Because their parallel architecture allows tasks to run concurrently, GPUs can be used to solve large systems of equations. Implementing the Finite Element Method on a GPU architecture is fairly straightforward because the method deals with linear equations or ordinary differential equations, and when a problem has a large number of degrees of freedom, GPU capabilities can significantly decrease the computation time. Two specific examples are transient analysis and static analysis, both of which reach higher performance with parallel programming. With appropriate numerical methods and techniques for matrix-vector operations, such as the Conjugate Gradient method, the solution of a linear system is found 11 times faster on an AMD HD 7970 than on an Intel i7. With an appropriate preconditioning matrix in the Preconditioned Conjugate Gradient (PCG) method, the solution is found 14 times faster for a problem involving 50000 degrees of freedom. Transient analysis of 3D problems shows that, using the Newmark integration method with about 50000 degrees of freedom, the solution is found 18 times faster on the AMD HD 7970. Such levels of performance are unattainable with current processors (CPUs), even the most powerful ones.
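As a rough illustration of the Preconditioned Conjugate Gradient method named in the abstract, the following is a minimal serial sketch in plain Python using a Jacobi (diagonal) preconditioner, one of the preconditioners listed in the thesis outline. It is not the thesis's OpenCL code: the function names (`matvec`, `dot`, `pcg_jacobi`), the dense list-of-lists storage, and the small test matrix are all assumptions made for this example. In a GPU implementation, the matrix-vector product and the dot products are the operations that would typically be parallelized.

```python
def matvec(A, x):
    """Dense matrix-vector product (hypothetical helper; a real solver would use sparse storage)."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def dot(u, v):
    """Inner product of two vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


def pcg_jacobi(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive definite A with Jacobi-preconditioned CG.

    Returns the approximate solution and the number of iterations performed.
    """
    n = len(b)
    inv_diag = [1.0 / A[i][i] for i in range(n)]  # Jacobi preconditioner: M^-1 = diag(A)^-1
    x = [0.0] * n
    r = list(b)                                   # residual r = b - A x, with x = 0
    b_norm = dot(b, b) ** 0.5
    if b_norm == 0.0:
        return x, 0
    z = [d * ri for d, ri in zip(inv_diag, r)]    # preconditioned residual z = M^-1 r
    p = list(z)                                   # first search direction
    rz = dot(r, z)
    for k in range(1, max_iter + 1):
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)                   # step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        if dot(r, r) ** 0.5 <= tol * b_norm:      # relative residual test
            return x, k
        z = [d * ri for d, ri in zip(inv_diag, r)]
        rz_new = dot(r, z)
        beta = rz_new / rz                        # keeps the new direction A-conjugate
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


if __name__ == "__main__":
    # Small symmetric positive definite system, chosen only for illustration.
    A = [[4.0, -1.0, 0.0], [-1.0, 3.0, -1.0], [0.0, -1.0, 2.0]]
    b = [3.0, 1.0, 1.0]
    x, iters = pcg_jacobi(A, b)
    print(x, iters)
```

Replacing `inv_diag` with all ones reduces the sketch to the plain Conjugate Gradient method, which corresponds to the identity preconditioner listed in the outline.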

1 Introduction and Motivations
1.1 Background
1.2 Literature Survey
1.3 Specific Objectives of this Thesis
2 Finite Element Method in Engineering
2.1 The Rayleigh-Ritz Method in Finite Element Approach
2.2 Shape Functions in Finite Element Analysis
2.3 Stiffness Matrix Computation using Shape Functions
2.4 Transient Analysis of a Structure
2.4.1 Lumped Mass Matrix
2.4.2 Consistent Mass Matrix
2.4.3 Damping Estimation
2.4.4 Eigenvalues and Eigenvectors in Transient Analysis
3 Finite Element Method Using Parallel Computation of Data
3.1 Main Differences between CPU and GPU
3.2 Notes about Hardwares
3.3 Parallel Efficiency
3.4 Brief Introduction to Open CL
3.4.1 Architecture
3.4.2 Thread Indexing
3.4.3 Thread Synchronization
4 Numerical Methods of Computation and Optimization Techniques
4.1 The Conjugate Gradient Method
4.2 The Newmark Method
4.3 The Preconditioned Conjugate Gradient Method
4.3.1 Incomplete Cholesky Factorization
4.3.2 Jacobi Factorization
4.3.3 LU Factorization
4.3.4 Inverse Matrix
4.3.5 Identity Matrix
4.4 The Compressed Sparse Row Method
5 Implementation on GPU Architecture
5.1 Generalities about Writing a GPU Program
5.2 The Preconditioned Conjugate Gradient Method and Parallel Computing
5.3 Transient Response Analysis Using Parallel Computing
6 Results and Discussion
6.1 Error Comparison of Different Finite Elements
6.2 Solution of Static Equation of Mechanics
6.2.1 2D Problem: Plate
6.2.2 2D Problem: Spanner
6.2.3 3D Problem: Slender Beam
6.3 Solution of Dynamic Equation of Mechanics
6.3.1 Transient Analysis of a Short Beam (Ref [22])
6.3.2 Eigenfrequencies and Mode Shapes
6.3.3 Discussion of the Mode Shapes Analysis
7 Conclusion and Recommendations for Future Work
7.1 Validity of the Results
7.1.1 Analytical Deflection of a Beam
7.1.2 Choice of the Mass Matrix
7.1.3 Conclusions on the Static Solver
7.1.4 Conclusions on the Precondition for the PCG Method
7.1.5 Conclusions on OpenCL Performance
7.1.6 Conclusions on the Dynamic Solver
7.1.7 Conclusions on the research project
7.2 Speed-up Analysis
7.3 Potential Improvements
Bibliography
Appendices
A Error Comparison for Different Boundary Conditions
B Convergence Results of Different Type of Precondition
C Algorithm for Implementing the CG Method on GPU
D Algorithm for Implementing the PCG Method on GPU
E Algorithm for Implementing the Newmark Method on GPU
F Static Analysis Using Comsol
F.1 Static Analysis of a Plate, Spanner and Slender Beam
G Modes Shapes Results using Comsol for Benchmark Problems
G.1 Modes Shapes Results using Comsol for Benchmark 1
G.2 Modes Shapes Results using Comsol for Benchmark 2
G.3 Modes Shapes Results using Comsol for Rao (Ref [1])
G.4 Modes Shapes Results using Comsol for A Fixed-Fixed Beam Problem
H Transient Analysis with Comsol Multiphysics
I OpenCL Code for Static and Dynamic Analysis

[1] S. S. Rao, The Finite Element Method in Engineering, Fifth Edition, Waltham, MA: Elsevier/Butterworth-Heinemann, 2011
[2] L. J. Segerlind, Applied Finite Element Analysis, Michigan State University, 1976
[3] N. Kikuchi, Finite Element Methods in Mechanics, Press Syndicate of the University of Cambridge, 1986
[4] A. Munshi, B. Gaster, T. G. Mattson, J. Fung and D. Ginsburg, OpenCL Programming Guide, Addison-Wesley Professional, 2012
[5] J. Sanders and E. Kandrot, CUDA by Example: An Introduction to General-Purpose GPU Programming, Addison-Wesley Professional, 16 July 2010
[6] D. Weber, J. Bender, M. Schnoes, A. Stork and D. Fellner, Efficient GPU data structures and methods to solve sparse linear systems in dynamics applications, Computer Graphics Forum, Wiley Online Library, 2012
[7] C. Cecka, A. Lew and E. Darve, Assembly of finite element methods on graphics processors, International Journal for Numerical Methods in Engineering, vol. 85, no. 5, pp. 640-669, 2011
[8] R. Ma, S. Sirouspour, B. Mahdavikhah, B. Moody, K. Elizeh, A. Kinsman and N. Nicolici, A parallel computing platform for real-time haptic interaction with deformable bodies, IEEE Transactions on Haptics, vol. 3, no. 3, pp. 211-223, July/September 2010
[9] B. Mahdavikhah, R. Ma, S. Sirouspour and N. Nicolici, Haptic rendering of deformable objects using a multiple FPGA parallel computing architecture, FPGA '10: Proceedings of the 18th Annual ACM/SIGDA International Symposium on Field Programmable Gate Arrays, 2010
[10] A. Cevahir, A. Nukada and S. Matsuoka, High performance conjugate gradient solver on multi-GPU clusters using hypergraph partitioning, Computer Science - Research and Development, vol. 25, no. 1/2, pp. 83-91, 2010
[11] S. Georgescu and H. Okuda, Conjugate gradients on multiple GPUs, International Journal for Numerical Methods in Fluids, vol. 64, no. 10-12, pp. 1254-1273, 2010
[12] M. Ament, G. Knittel, D. Weiskopf and W. Strasser, A parallel preconditioned conjugate gradient solver for the Poisson problem on a multi-GPU platform, Parallel, Distributed and Network-Based Processing (PDP), 2010 18th Euromicro International Conference on, IEEE, pp. 583-592, 2010
[13] Y. Liu, W. Zhou and Q. Yang, A distributed memory parallel element-by-element scheme based on Jacobi-conditioned conjugate gradient for 3D finite element analysis, Finite Elements in Analysis and Design, vol. 43, no. 6, pp. 494-503, 2007
[14] J. Allard, H. Courtecuisse and F. Faure, Implicit FEM solver on GPU for interactive deformation simulation, GPU Computing Gems Jade Edition, Morgan Kaufmann Publishers, San Francisco, CA, USA, ch. 21, pp. 281-294, 2012
[15] Origin 8 User Guide, OriginLab Corporation, 2007
[16] Introduction to Comsol Multiphysics, Version 4.4, December 2013
[17] K. J. Bathe and E. L. Wilson, Numerical Methods in Finite Element Analysis, US Edition, 1975
[18] N. Bell and M. Garland, Efficient Sparse Matrix-Vector Multiplication on CUDA, NVIDIA Technical Report NVR-2008-004, Dec. 2008
[19] A. I. Khan and B. H. V. Topping, Parallel finite element analysis using Jacobi-conditioned conjugate gradient algorithm, Advances in Engineering Software, Elsevier, March/April 1996
[20] J. S. Kowalik and S. P. Kumar, An efficient parallel block conjugate gradient method for linear equations, Proc. 1982 International Conference on Parallel Processing, ICPR 93, pp. 47-52, August 1982
[21] B. Gaster, Heterogeneous Computing with OpenCL, Waltham, MA: Morgan Kaufmann, 2012
[22] E. H. Dill, The Finite Element Method for Mechanics of Solids with ANSYS Applications, CRC Press, Taylor and Francis Group, 2012

Made public: 2016-06-27