
Graduate student: MANN, Allan (曼亞倫)
Thesis title: Static and Dynamic Finite Element Analysis Using Parallel Programming on GPU with OpenCL (應用開放計算語言及圖形處理器於結構有限元素分析)
Advisor: CHOI, Siu-Tong (崔兆棠)
Co-advisor: Matthew SMITH (李汶樺)
Degree: Master
Department: College of Engineering - Department of Aeronautics & Astronautics
Year of publication: 2014
Academic year of graduation: 102 (ROC calendar)
Language: English
Number of pages: 128
Keywords (English): GPU, OpenCL, Finite Element, Static analysis, Dynamic analysis
  • Modern engineering systems are becoming increasingly complex, and more and more degrees of freedom are needed to simulate their behavior accurately. Because computing capabilities are limited, a new approach is needed for simulating large problems. Graphics Processing Units (GPUs) were originally designed to offload graphics display from the CPU, but as their capabilities have improved, their use for general-purpose computing (GPGPU) has attracted attention. Because their parallel architecture allows tasks to run concurrently, GPUs can be used to solve large systems of equations. Implementing the Finite Element Method on a GPU architecture is fairly straightforward because the method deals with linear equations or ordinary differential equations, and when a problem has a large number of degrees of freedom, the GPU can significantly decrease the computation time. Two specific examples are transient analysis and static analysis, both of which reach higher performance with parallel programming. Using appropriate numerical methods and matrix-vector techniques such as the Conjugate Gradient method, the solution of a linear system is found 11 times faster on an AMD HD 7970 than on an Intel i7. With an appropriate preconditioning matrix in the Preconditioned Conjugate Gradient (PCG) method, the solution is found 14 times faster for a problem with 50000 degrees of freedom. Transient analysis of 3D problems shows that, using the Newmark integration method with about 50000 degrees of freedom, the solution is found 18 times faster on the AMD HD 7970. Such levels of performance are unattainable with current processors (CPUs), even the most powerful ones.
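    For readers unfamiliar with the PCG method named in the abstract, the following is a minimal CPU sketch in pure Python using a Jacobi (diagonal) preconditioner. It is not the thesis's OpenCL code: the function names (`pcg_jacobi`, `spring_chain_stiffness`), the dense matrix storage, and the small spring-chain test system are all assumptions chosen for illustration. On a GPU, the matrix-vector product and the vector updates are the steps that would typically be spread across work-items.

```python
# Minimal CPU sketch of Jacobi-preconditioned conjugate gradient (PCG).
# Illustrative only: names and the test system are assumptions, not thesis code.

def matvec(A, x):
    """Dense matrix-vector product (one row per output entry)."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(u_i * v_i for u_i, v_i in zip(u, v))


def pcg_jacobi(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive definite A.

    Returns (x, iterations_used).
    """
    n = len(b)
    inv_diag = [1.0 / A[i][i] for i in range(n)]  # Jacobi preconditioner M^-1
    x = [0.0] * n
    r = list(b)                                   # r = b - A x with x = 0
    z = [inv_diag[i] * r[i] for i in range(n)]    # z = M^-1 r
    p = list(z)
    rz = dot(r, z)
    b_norm = dot(b, b) ** 0.5 or 1.0              # avoid dividing by zero if b = 0
    for k in range(max_iter):
        if dot(r, r) ** 0.5 <= tol * b_norm:      # relative residual test
            return x, k
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * Ap[i] for i in range(n)]
        z = [inv_diag[i] * r[i] for i in range(n)]
        rz_new = dot(r, z)
        p = [z[i] + (rz_new / rz) * p[i] for i in range(n)]
        rz = rz_new
    return x, max_iter


def spring_chain_stiffness(n, k=1.0):
    """Stiffness matrix of n equal springs in series, fixed at one end."""
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        K[i][i] = 2.0 * k if i < n - 1 else k
        if i > 0:
            K[i][i - 1] = K[i - 1][i] = -k
    return K


if __name__ == "__main__":
    K = spring_chain_stiffness(4)
    u, iters = pcg_jacobi(K, [0.0, 0.0, 0.0, 1.0])  # unit load at the free end
    print(u, iters)
```

    For the four-spring chain with a unit load at the free end, the exact displacements are 1, 2, 3 and 4, which gives a quick sanity check on the solver. A production solver would store the stiffness matrix in a sparse format such as compressed sparse row rather than as dense lists.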

    1 Introduction and Motivations
        1.1 Background
        1.2 Literature Survey
        1.3 Specific Objectives of this Thesis
    2 Finite Element Method in Engineering
        2.1 The Rayleigh-Ritz Method in Finite Element Approach
        2.2 Shape Functions in Finite Element Analysis
        2.3 Stiffness Matrix Computation using Shape Functions
        2.4 Transient Analysis of a Structure
            2.4.1 Lumped Mass Matrix
            2.4.2 Consistent Mass Matrix
            2.4.3 Damping Estimation
            2.4.4 Eigenvalues and Eigenvectors in Transient Analysis
    3 Finite Element Method Using Parallel Computation of Data
        3.1 Main Differences between CPU and GPU
        3.2 Notes about Hardwares
        3.3 Parallel Efficiency
        3.4 Brief Introduction to OpenCL
            3.4.1 Architecture
            3.4.2 Thread Indexing
            3.4.3 Thread Synchronization
    4 Numerical Methods of Computation and Optimization Techniques
        4.1 The Conjugate Gradient Method
        4.2 The Newmark Method
        4.3 The Preconditioned Conjugate Gradient Method
            4.3.1 Incomplete Cholesky Factorization
            4.3.2 Jacobi Factorization
            4.3.3 LU Factorization
            4.3.4 Inverse Matrix
            4.3.5 Identity Matrix
        4.4 The Compressed Sparse Row Method
    5 Implementation on GPU Architecture
        5.1 Generalities about Writing a GPU Program
        5.2 The Preconditioned Conjugate Gradient Method and Parallel Computing
        5.3 Transient Response Analysis Using Parallel Computing
    6 Results and Discussion
        6.1 Error Comparison of Different Finite Elements
        6.2 Solution of Static Equation of Mechanics
            6.2.1 2D Problem: Plate
            6.2.2 2D Problem: Spanner
            6.2.3 3D Problem: Slender Beam
        6.3 Solution of Dynamic Equation of Mechanics
            6.3.1 Transient Analysis of a Short Beam (Ref [22])
            6.3.2 Eigenfrequencies and Mode Shapes
            6.3.3 Discussion of the Mode Shapes Analysis
    7 Conclusion and Recommendations for Future Work
        7.1 Validity of the Results
            7.1.1 Analytical Deflection of a Beam
            7.1.2 Choice of the Mass Matrix
            7.1.3 Conclusions on the Static Solver
            7.1.4 Conclusions on the Precondition for the PCG Method
            7.1.5 Conclusions on OpenCL Performance
            7.1.6 Conclusions on the Dynamic Solver
            7.1.7 Conclusions on the research project
        7.2 Speed-up Analysis
        7.3 Potential Improvements
    Bibliography
    Appendices
        A Error Comparison for Different Boundary Conditions
        B Convergence Results of Different Type of Precondition
        C Algorithm for Implementing the CG Method on GPU
        D Algorithm for Implementing the PCG Method on GPU
        E Algorithm for Implementing the Newmark Method on GPU
        F Static Analysis Using Comsol
            F.1 Static Analysis of a Plate, Spanner and Slender Beam
        G Modes Shapes Results using Comsol for Benchmark Problems
            G.1 Modes Shapes Results using Comsol for Benchmark 1
            G.2 Modes Shapes Results using Comsol for Benchmark 2
            G.3 Modes Shapes Results using Comsol for Rao (Ref [1])
            G.4 Modes Shapes Results using Comsol for A Fixed-Fixed Beam Problem
        H Transient Analysis with Comsol Multiphysics
        I OpenCL Code for Static and Dynamic Analysis

    [1] S. S. Rao, The Finite element method in engineering: Fifth Edition, Waltham MA: Elsevier/Butterworth Heinemann, 2011
    [2] L. J. Segerlind, Applied Finite element analysis, Michigan State University, 1976
    [3] N. Kikuchi, Finite element methods in mechanics, Press syndicate of the university of Cambridge, 1986
    [4] A. Munshi, B. Gaster, T. G. Mattson, J. Fung and D. Ginsburg, OpenCL Programming Guide, Addison-Wesley Professional, 2012
    [5] J. Sanders and E. Kandrot, CUDA BY EXAMPLE: An introduction to General-Purpose GPU programming, Addison-Wesley Professional, 16/07/2010
    [6] D. Weber, J. Bender, M. Schnoes, A. Stork and D. Fellner, Efficient GPU data structures and methods to solve sparse linear systems in dynamics applications, Computer Graphics Forum, Wiley Online Library, 2012
    [7] C. Cecka, A. Lew and E. Darve, Assembly of Finite element methods on graphics processors, International journal for numerical methods in engineering, vol. 85, no. 5, pp. 640-669, 2011
    [8] R. Ma, S. Sirouspour, B. Mahdavikhah, B. Moody, K. Elizeh, A. Kinsman and N. Nicolici, A parallel computing platform for real-time haptic interaction with deformable bodies, IEEE Transactions on Haptics, vol. 3, no. 3, pp. 211-223, July/September 2010
    [9] B. Mahdavikhah, R. Ma, S. Sirouspour and N. Nicolici, Haptic rendering of deformable objects using a multiple FPGA parallel computing architecture, FPGA '10: Proceedings of the 18th annual ACM/SIGDA international symposium on Field programmable gate arrays, 2010
    [10] A. Cevahir, A. Nukada and S. Matsuoka, High performance conjugate gradient solver on multi-GPU clusters using hypergraph partitioning, Computer Science-Research and Development, vol. 25, no. 1/2, pp. 83-91, 2010
    [11] S. Georgescu and H. Okuda, Conjugate gradients on multiple GPUs, International Journal for Numerical Methods in Fluids, vol. 64, no. 10-12, pp. 1254-1273, 2010
    [12] M. Ament, G. Knittel, D. Weiskopf and W. Strasser, A parallel preconditioned conjugate gradient solver for the poisson problem on a multi-GPU platform, Parallel, Distributed and Network-Based Processing (PDP), 2010 18th Euromicro International Conference on IEEE, pp. 583-592, 2010
    [13] Y. Liu, W. Zhou and Q. Yang, A distributed memory parallel element-by-element scheme based on jacobi-conditioned conjugate gradient for 3D finite element analysis, Finite Elements in Analysis and Design, vol. 43, no. 6, pp. 494-503, 2007
    [14] J. Allard, H. Courtecuisse and F. Faure, Implicit FEM solver on GPU for interactive deformation simulation, GPU Computing Gems Jade Edition, Morgan Kaufmann Publishers, San Francisco, CA, USA, ch. 21, pp. 281-294, 2012
    [15] Origin 8 User Guide, OriginLab Corporation, 2007
    [16] Introduction to Comsol Multiphysics, Version 4.4, December 2013
    [17] K. J. Bathe and E. L. Wilson, Numerical methods in finite element analysis, US Edition, 1975
    [18] N. Bell and M. Garland, Efficient Sparse Matrix-Vector Multiplication on CUDA, NVIDIA Technical Report NVR-2008-004, Dec. 2008
    [19] A. I. Khan and B. H. V. Topping, Parallel finite element analysis using Jacobi-conditioned conjugate gradient algorithm, Advances in Engineering Software, Elsevier, March/April 1996
    [20] J. S. Kowalik and S. P. Kumar, An efficient parallel block conjugate gradient method for linear equations, Proc. 1982 International Conference on Parallel Processing, ICPR 93, pp. 47-52, August 1982
    [21] B. Gaster, Heterogeneous computing with OpenCL, Waltham MA: Morgan Kaufmann, 2012
    [22] E. H. Dill, The Finite Element Method for Mechanics of Solids with ANSYS Applications, CRC Press, Taylor and Francis Group, 2012

    Download availability, on campus: public from 2016-06-27
    Download availability, off campus: public from 2016-06-27