| 研究生: |
陳加恩 Chen, Ja-En |
|---|---|
| 論文名稱: |
應用通訊避免演算法執行GMRES數值解法於計算力學之分析 Application of GMRES ( General Minimized RESidual ) method for computational mechanics applications using communication avoiding algorithms |
| 指導教授: |
曾建洲
Tseng, Chien-Chou |
| 學位類別: |
碩士 Master |
| 系所名稱: |
工學院 - 機械工程學系 Department of Mechanical Engineering |
| 論文出版年: | 2019 |
| 畢業學年度: | 107 |
| 語文別: | 英文 |
| 論文頁數: | 178 |
| 中文關鍵詞: | 廣義最小殘值法(GMRES) 、通訊避免演算法 、MPI-CUDA叢集電腦 、Open MPI通用訊息傳遞介面 、圖形顯示卡(GPU) |
| 外文關鍵詞: | General Minimal Residual method (GMRES), Communication Avoiding algorithm, MPI-CUDA cluster, Open MPI, Graphical processing Unit (GPU) |
| 相關次數: | 點閱:120 下載:6 |
| 分享至: |
| 查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報 |
本篇論文的目標是要分析適用於計算力學求解之平行化廣義最小殘值(GMRES)法,論文中所寫的程式架構為使用Open MPI做為使用C語言的平行介面。在本研究中共測試了三種計算力學中的係數矩陣以作為平行解法的測試,分別有二維穩態熱傳之係數矩陣、使用SIMPLE法求解穴流問題中Navier-Stokes方程時所求解的速度係數矩陣、使用有限元素法在固體力學中的剛性係數矩陣。
二維穩態熱傳為Laplace方程差分離散格式。使用SIMPLE求解二維穩態穴流場分佈其中共需解三個線性系統,分別求解兩個維度的速度和流場壓力。在研究中測試了將求解x方向速度的線性系統交由GMRES求解。其中有限元素法使用常應變三角形元素模擬中間有圓孔洞的薄平板以及T字型帶有倒圓角之薄板的案例。在模擬中計算其應力集中係數,且每個三角形元素的von Mises應力都會被計算並且從中找出最大值作為近似的最大應力。另外,為了改進使用一般桌上型電腦的硬體架構所限制的解法最大運算量,論文中亦發展了由MATLAB能直接產出對應於有限元素法之剛性係數矩陣的CSR壓縮形式之方法。
共軛梯度法(CG)在論文作為廣義最小殘值法在誤差收斂速度上的對照。另外使用不同迭代重置間隔的廣義最小殘值法也互相進行了比較,以進一步了解較佳的間隔次數。在GMRES的平行化中,為了降低不同平行線程之間的通訊所造成之延遲,在論文中也探討了建構並使用牛頓基底的另一種GMRES對平行效率的影響。
The goal of this research is the development of a distributed parallel implementation of the general minimal residual (GMRES) solver with the target application of computational mechanics. The program of the solver is aimed to be a two-layered parallel structure, it could have the potential to be implemented on the usage of multi-GPU. The first layer employs OpenMPI (or simply MPI) for management of communications between distributed computing nodes. The second layer of parallelization is the use of a local GPU device for heterogeneous parallel computation using Nvidia’s CUDA. For the sake of simplicity, the investigated Finite Element Method is a stiffness-method based approach for solving the system Kx = F where the stiffness matrix K is computed using a Rayleigh-Ritz method based on a linear shape function, i.e. a Constant Strain Triangle (CST) formulation. The data from the stiffness matrix is stored using a Compressed Sparse Row (CSR) method for eliminating the need to store zero values in the sparse matrix. In this study, this stiffness method is first generated using MATLAB and later exported to the GPU for solution.
The accuracy of the GMRES implementation is tested through several benchmarks. 2D heat transfer being discretized from the Laplace’s equation is tested as benchmark for Newton basis GMRES. The lid-driven cavity solved using SIMPLE is also used to be the benchmark. Finally, the solver is applied to the computation of stress concentration factors for several flat plates using multiple Nvidia GPU devices distributed across an MPI cluster. First, the stress concentration factors of a flat plate with round circular hole located at middle and a T-shape flat plate with round fillet are computed for various loading conditions and geometries. With each different radius, the von Mises stress of each element is calculated and used to create a stress concentration relationship.
The convergence performance of conjugate gradient method (CG) is compared with GMRES since our stiffness matrices are symmetric. GMRES with different restart is also investigated to find the best number for restart. To further investigate the possibility of decreasing the overhead cause by communication between threads, Newton basis GMRES and its communication avoiding algorithm is also studied.
[1]J. Dongarra, "An Overview of High Performance Computing and Challenges for the Future," in VECPAR, 2008, p. 1.
[2]D. B. Kirk and W. H. Wen-Mei, Programming massively parallel processors: a hands-on approach: Morgan kaufmann, 2016.
[3]J. Bolz, I. Farmer, E. Grinspun, and P. Schröoder, "Sparse matrix solvers on the GPU: conjugate gradients and multigrid," ACM transactions on graphics (TOG), vol. 22, pp. 917-924, 2003.
[4]E. J. Kelmelis, J. R. Humphrey, J. P. Durbano, and F. E. Ortiz, "Accelerated modeling and simulation with a desktop supercomputer," in Enabling Technologies for Simulation Science X, 2006, p. 62270N.
[5]A. A. Khokhar, V. K. Prasanna, M. E. Shaaban, and C.-L. Wang, "Heterogeneous computing: Challenges and opportunities," Computer, vol. 26, pp. 18-27, 1993.
[6]J. L. Gustafson, "Reevaluating Amdahl's law," Communications of the ACM, vol. 31, pp. 532-533, 1988.
[7]J. Sanders and E. Kandrot, CUDA by example: an introduction to general-purpose GPU programming: Addison-Wesley Professional, 2010.
[8]P. G. Ciarlet, B. Miara, and J.-M. Thomas, Introduction to numerical linear algebra and optimisation: Cambridge University Press, 1989.
[9]Z.-C. Li, C.-S. Chien, and H.-T. Huang, "Effective condition number for finite difference method," Journal of computational and applied mathematics, vol. 198, pp. 208-235, 2007.
[10]A. K. Cline, C. B. Moler, G. W. Stewart, and J. H. Wilkinson, "An estimate for the condition number of a matrix," SIAM Journal on Numerical Analysis, vol. 16, pp. 368-375, 1979.
[11]J. H. Wilkinson, The algebraic eigenvalue problem vol. 87: Clarendon Press Oxford, 1965.
[12]N. Jamil, "A comparison of direct and indirect solvers for linear systems of equations," International Journal of Emerging Sciences, vol. 2, pp. 310-321, 2012.
[13]Y. Saad, "Communication complexity of the Gaussian elimination algorithm on multiprocessors," Linear Algebra and its Applications, vol. 77, pp. 315-340, 1986.
[14]N. Galoppo, N. K. Govindaraju, M. Henson, and D. Manocha, "LU-GPU: Efficient algorithms for solving dense linear systems on graphics hardware," in Proceedings of the 2005 ACM/IEEE conference on Supercomputing, 2005, p. 3.
[15]D. Coppersmith and S. Winograd, "Matrix multiplication via arithmetic progressions," in Proceedings of the nineteenth annual ACM symposium on Theory of computing, 1987, pp. 1-6.
[16]H. C. Elman, "Iterative methods for large, sparse, nonsymmetric systems of linear equations," Yale University New Haven, Conn, 1982.
[17]A. Chronopoulos and C. W. Gear, "s-Step iterative methods for symmetric linear systems," Journal of Computational and Applied Mathematics, vol. 25, pp. 153-168, 1989.
[18]G. H. Golub and C. F. Van Loan, Matrix computations vol. 3: JHU Press, 2012.
[19]Y. Saad and M. H. Schultz, "GMRES: A generalized minimal residual algorithm for solving nonsymmetric linear systems," SIAM Journal on scientific and statistical computing, vol. 7, pp. 856-869, 1986.
[20]V. Frayssé, L. Giraud, S. Gratton, and J. Langou, "Algorithm 842: A set of GMRES routines for real and complex arithmetics on high performance computers," ACM Transactions on Mathematical Software (TOMS), vol. 31, pp. 228-238, 2005.
[21]I. C. Ipsen and C. D. Meyer, "The idea behind Krylov methods," American Mathematical Monthly, pp. 889-899, 1998.
[22]P. N. Brown, "A theoretical comparison of the Arnoldi and GMRES algorithms," SIAM Journal on Scientific and Statistical Computing, vol. 12, pp. 58-78, 1991.
[23]Z. Bai, D. Hu, and L. Reichel, "A Newton basis GMRES implementation," IMA Journal of Numerical Analysis, vol. 14, pp. 563-581, 1994.
[24]D. Calvetti, J. Petersen, and L. Reichel, "A parallel implementation of the GMRES method," L. Reichel, A. Ruttan and RS Varga, de Gruyter, Berlin, pp. 31-45, 1993.
[25]E. Gabriel, G. E. Fagg, G. Bosilca, T. Angskun, J. J. Dongarra, J. M. Squyres, et al., "Open MPI: Goals, concept, and design of a next generation MPI implementation," in European Parallel Virtual Machine/Message Passing Interface Users’ Group Meeting, 2004, pp. 97-104.
[26]J. Erhel, "A parallel GMRES version for general sparse matrices," Electronic Transactions on Numerical Analysis, vol. 3, pp. 160-176, 1995.
[27]H. Huang, L. Wang, E.-J. Lee, and P. Chen, "An MPI-CUDA implementation and optimization for parallel sparse equations and least squares (LSQR)," Procedia Computer Science, vol. 9, pp. 76-85, 2012.
[28]A. Buluç, J. T. Fineman, M. Frigo, J. R. Gilbert, and C. E. Leiserson, "Parallel sparse matrix-vector and matrix-transpose-vector multiplication using compressed sparse blocks," in Proceedings of the twenty-first annual symposium on Parallelism in algorithms and architectures, 2009, pp. 233-244.
[29]K. He, S. X.-D. Tan, H. Zhao, X.-X. Liu, H. Wang, and G. Shi, "Parallel GMRES solver for fast analysis of large linear dynamic systems on GPU platforms," INTEGRATION, the VLSI journal, vol. 52, pp. 10-22, 2016.
[30]R. D. Cook, Concepts and applications of finite element analysis: John Wiley & Sons, 2007.
[31]J. Fish and T. Belytschko, "A first course in finite elements," 2007.
[32]H. Bang and Y. W. Kwon, The finite element method using MATLAB: CRC press, 2000.
[33]J. Langou, "Computing the R of the QR factorization of tall and skinny matrices using MPI_Reduce," arXiv preprint arXiv:1002.4250, 2010.
[34]C. Lipson and R. C. Juvinall, Handbook of stress and strength: design and material applications: Macmillan, 1963.