成功大學博碩士論文系統

簡易檢索 / 詳目顯示

回結果列表

研究生：	陳加恩 Chen, Ja-En
論文名稱：	應用通訊避免演算法執行GMRES數值解法於計算力學之分析 Application of GMRES ( General Minimized RESidual ) method for computational mechanics applications using communication avoiding algorithms
指導教授：	曾建洲 Tseng, Chien-Chou
學位類別：	碩士 Master
系所名稱：	工學院 - 機械工程學系 Department of Mechanical Engineering
論文出版年：	2019
畢業學年度：	107
語文別：	英文
論文頁數：	178
中文關鍵詞：	廣義最小殘值法(GMRES) 、通訊避免演算法、MPI-CUDA叢集電腦、Open MPI通用訊息傳遞介面、圖形顯示卡(GPU)
外文關鍵詞：	General Minimal Residual method (GMRES), Communication Avoiding algorithm, MPI-CUDA cluster, Open MPI, Graphical processing Unit (GPU)
相關次數：	點閱：250 下載：6
分享至:	分享至facebook 分享至twitter

查詢本校圖書館目錄查詢臺灣博碩士論文知識加值系統勘誤回報

本篇論文的目標是要分析適用於計算力學求解之平行化廣義最小殘值(GMRES)法，論文中所寫的程式架構為使用Open MPI做為使用C語言的平行介面。在本研究中共測試了三種計算力學中的係數矩陣以作為平行解法的測試，分別有二維穩態熱傳之係數矩陣、使用SIMPLE法求解穴流問題中Navier-Stokes方程時所求解的速度係數矩陣、使用有限元素法在固體力學中的剛性係數矩陣。
二維穩態熱傳為Laplace方程差分離散格式。使用SIMPLE求解二維穩態穴流場分佈其中共需解三個線性系統，分別求解兩個維度的速度和流場壓力。在研究中測試了將求解x方向速度的線性系統交由GMRES求解。其中有限元素法使用常應變三角形元素模擬中間有圓孔洞的薄平板以及T字型帶有倒圓角之薄板的案例。在模擬中計算其應力集中係數，且每個三角形元素的von Mises應力都會被計算並且從中找出最大值作為近似的最大應力。另外，為了改進使用一般桌上型電腦的硬體架構所限制的解法最大運算量，論文中亦發展了由MATLAB能直接產出對應於有限元素法之剛性係數矩陣的CSR壓縮形式之方法。
共軛梯度法(CG)在論文作為廣義最小殘值法在誤差收斂速度上的對照。另外使用不同迭代重置間隔的廣義最小殘值法也互相進行了比較，以進一步了解較佳的間隔次數。在GMRES的平行化中，為了降低不同平行線程之間的通訊所造成之延遲，在論文中也探討了建構並使用牛頓基底的另一種GMRES對平行效率的影響。

The goal of this research is the development of a distributed parallel implementation of the general minimal residual (GMRES) solver with the target application of computational mechanics. The program of the solver is aimed to be a two-layered parallel structure, it could have the potential to be implemented on the usage of multi-GPU. The first layer employs OpenMPI (or simply MPI) for management of communications between distributed computing nodes. The second layer of parallelization is the use of a local GPU device for heterogeneous parallel computation using Nvidia’s CUDA. For the sake of simplicity, the investigated Finite Element Method is a stiffness-method based approach for solving the system Kx = F where the stiffness matrix K is computed using a Rayleigh-Ritz method based on a linear shape function, i.e. a Constant Strain Triangle (CST) formulation. The data from the stiffness matrix is stored using a Compressed Sparse Row (CSR) method for eliminating the need to store zero values in the sparse matrix. In this study, this stiffness method is first generated using MATLAB and later exported to the GPU for solution.
The accuracy of the GMRES implementation is tested through several benchmarks. 2D heat transfer being discretized from the Laplace’s equation is tested as benchmark for Newton basis GMRES. The lid-driven cavity solved using SIMPLE is also used to be the benchmark. Finally, the solver is applied to the computation of stress concentration factors for several flat plates using multiple Nvidia GPU devices distributed across an MPI cluster. First, the stress concentration factors of a flat plate with round circular hole located at middle and a T-shape flat plate with round fillet are computed for various loading conditions and geometries. With each different radius, the von Mises stress of each element is calculated and used to create a stress concentration relationship.
The convergence performance of conjugate gradient method (CG) is compared with GMRES since our stiffness matrices are symmetric. GMRES with different restart is also investigated to find the best number for restart. To further investigate the possibility of decreasing the overhead cause by communication between threads, Newton basis GMRES and its communication avoiding algorithm is also studied.

摘要	i
Abstract	iii
誌謝	vi
Contents	vii
List of Tables	xii
List of Figures	xiii
Nomenclature	xx
Chapter 1	Introduction	1
1	Background survey	1
2	Motivation	2
Chapter 2	Theory of Parallel computing	4
1	Efficiency Theory of parallel computing	4
1.1	Amdahl's Law	6
1.2	Gustafson–Barsis's Law	8
2	Program Structure with the usage of CUDA	9
2.1	Memory in GPU	10
2.2	Memory allocation in CUDA	10
2.3	Message passing in CUDA	11
Chapter 3	Sparse Linear Systems	13
1	Challenges of solving real linear systems	13
1.1	Condition Number	14
1.2	Direct Solvers for Linear Systems	17
2	Iterative Solvers for Large Linear System	21
2.1	Jacobi Iteration	21
2.2	Conjugate Gradient Method (CG)	23
2.3	General Minimized Residual Method (GMRES)	25
2.4	Newton-Basis GMRES	30
3	Storage Methods for Sparse Matrix Storage	31
3.1	Coordinate Storage Format (COO)	32
3.2	Compressed Sparse Row Format (CSR)	32
Chapter 4	Message Passing Interface (MPI)	34
1	Message Passing Interface (MPI)	34
1.1	Message Passing Interface (MPI) API	34
1.2	Designing the size of message passing	35
1.3	MPI Environment (MPE) and the Jumpshot analyzing tool	37
2	Beowulf Cluster using GPU	38
2.1	Environment settings	38
2.2	The Structure of the Cluster	39
3	Basic Parallel Vector Operations	39
3.1	Data Separation of CSR Form Arrays	39
3.2	The Parallel Vector-Vector Dot Product	39
3.3	The Parallel Matrix-Vector Dot Product in CSR Form	40
Chapter 5	Finite Element Methodology	42
1	Finite Element Procedure	42
1.1	The Rayleigh-Ritz Method	42
1.2	2D Constant Strain Triangular (CST) Element	43
1.3	Linear Elastics	45
1.4	Strong Form and Weak Form	48
2	CSR Form for Finite Element	51
2.1	Assembly of Local Stiffness Matrix	51
2.2	Assembly of Global Stiffness Matrix in CSR Form	52
2.3	Reduced Global Stiffness Matrix	55
Chapter 6	Methodology	57
1	CUDA-MPI Algorithm Description	57
2	Data Structure	57
3	Host-Host Communications	58
4	Host-Device Communications	59
5	GMRES Parallelization using multiple CUDA devices	59
5.1	Arnoldi Iteration Part One with Matrix-Vector Production	60
5.2	Arnoldi Iteration Part Two with Vector-Vector Production	62
5.3	Arnoldi Iteration Part Three	63
5.4	Givens Rotation	63
5.5	Backward Substitution of the Least Square Problem	64
6	Newton Basis GMRES Parallelization using MPI	64
6.1	The First Iteration of Newton Basis GMRES	64
6.2	Construction of the Newton Basis	64
6.3	QR Factorization of the Basis	65
Chapter 7	Benchmarks and Results	66
1	Introduction of Heat Transfer Benchmark	66
2	Introduction of the Lid-driven Cavity Benchmark	66
3	Introduction of FEM Benchmarks	66
3.1	Simple Beam with Axial Load	67
3.2	Cantilever Beam with Point Load at Free-end	67
3.3	A Circular Hole at the middle of a Flat Plate	68
3.4	Flat Plate with Fillet	68
4	FEM usage in Results	69
4.1	Results with Meshes and von Mises Stress	69
5	Speed of Convergence with respect of Iterations	71
5.1	Jacobi Preconditioner	71
5.2	Restart	72
5.3	Comparison with CG	73
6	Parallel Performance using Multi-GPUs	73
7	Speedup of Communication Avoiding Algorithm	75
7.1	Speedup comparison between MGS and Newton Basis GMRES Running with different number of CPU core	75
7.2	Analyzation of the bottleneck of the parallel process	75
Chapter 8	Conclusion	77
References	79
Table	83
Figures	87
                                    

[1]J. Dongarra, "An Overview of High Performance Computing and Challenges for the Future," in VECPAR, 2008, p. 1.
[2]D. B. Kirk and W. H. Wen-Mei, Programming massively parallel processors: a hands-on approach: Morgan kaufmann, 2016.
[3]J. Bolz, I. Farmer, E. Grinspun, and P. Schröoder, "Sparse matrix solvers on the GPU: conjugate gradients and multigrid," ACM transactions on graphics (TOG), vol. 22, pp. 917-924, 2003.
[4]E. J. Kelmelis, J. R. Humphrey, J. P. Durbano, and F. E. Ortiz, "Accelerated modeling and simulation with a desktop supercomputer," in Enabling Technologies for Simulation Science X, 2006, p. 62270N.
[5]A. A. Khokhar, V. K. Prasanna, M. E. Shaaban, and C.-L. Wang, "Heterogeneous computing: Challenges and opportunities," Computer, vol. 26, pp. 18-27, 1993.
[6]J. L. Gustafson, "Reevaluating Amdahl's law," Communications of the ACM, vol. 31, pp. 532-533, 1988.
[7]J. Sanders and E. Kandrot, CUDA by example: an introduction to general-purpose GPU programming: Addison-Wesley Professional, 2010.
[8]P. G. Ciarlet, B. Miara, and J.-M. Thomas, Introduction to numerical linear algebra and optimisation: Cambridge University Press, 1989.
[9]Z.-C. Li, C.-S. Chien, and H.-T. Huang, "Effective condition number for finite difference method," Journal of computational and applied mathematics, vol. 198, pp. 208-235, 2007.
[10]A. K. Cline, C. B. Moler, G. W. Stewart, and J. H. Wilkinson, "An estimate for the condition number of a matrix," SIAM Journal on Numerical Analysis, vol. 16, pp. 368-375, 1979.
[11]J. H. Wilkinson, The algebraic eigenvalue problem vol. 87: Clarendon Press Oxford, 1965.
[12]N. Jamil, "A comparison of direct and indirect solvers for linear systems of equations," International Journal of Emerging Sciences, vol. 2, pp. 310-321, 2012.
[13]Y. Saad, "Communication complexity of the Gaussian elimination algorithm on multiprocessors," Linear Algebra and its Applications, vol. 77, pp. 315-340, 1986.
[14]N. Galoppo, N. K. Govindaraju, M. Henson, and D. Manocha, "LU-GPU: Efficient algorithms for solving dense linear systems on graphics hardware," in Proceedings of the 2005 ACM/IEEE conference on Supercomputing, 2005, p. 3.
[15]D. Coppersmith and S. Winograd, "Matrix multiplication via arithmetic progressions," in Proceedings of the nineteenth annual ACM symposium on Theory of computing, 1987, pp. 1-6.
[16]H. C. Elman, "Iterative methods for large, sparse, nonsymmetric systems of linear equations," Yale University New Haven, Conn, 1982.
[17]A. Chronopoulos and C. W. Gear, "s-Step iterative methods for symmetric linear systems," Journal of Computational and Applied Mathematics, vol. 25, pp. 153-168, 1989.
[18]G. H. Golub and C. F. Van Loan, Matrix computations vol. 3: JHU Press, 2012.
[19]Y. Saad and M. H. Schultz, "GMRES: A generalized minimal residual algorithm for solving nonsymmetric linear systems," SIAM Journal on scientific and statistical computing, vol. 7, pp. 856-869, 1986.
[20]V. Frayssé, L. Giraud, S. Gratton, and J. Langou, "Algorithm 842: A set of GMRES routines for real and complex arithmetics on high performance computers," ACM Transactions on Mathematical Software (TOMS), vol. 31, pp. 228-238, 2005.
[21]I. C. Ipsen and C. D. Meyer, "The idea behind Krylov methods," American Mathematical Monthly, pp. 889-899, 1998.
[22]P. N. Brown, "A theoretical comparison of the Arnoldi and GMRES algorithms," SIAM Journal on Scientific and Statistical Computing, vol. 12, pp. 58-78, 1991.
[23]Z. Bai, D. Hu, and L. Reichel, "A Newton basis GMRES implementation," IMA Journal of Numerical Analysis, vol. 14, pp. 563-581, 1994.
[24]D. Calvetti, J. Petersen, and L. Reichel, "A parallel implementation of the GMRES method," L. Reichel, A. Ruttan and RS Varga, de Gruyter, Berlin, pp. 31-45, 1993.
[25]E. Gabriel, G. E. Fagg, G. Bosilca, T. Angskun, J. J. Dongarra, J. M. Squyres, et al., "Open MPI: Goals, concept, and design of a next generation MPI implementation," in European Parallel Virtual Machine/Message Passing Interface Users’ Group Meeting, 2004, pp. 97-104.
[26]J. Erhel, "A parallel GMRES version for general sparse matrices," Electronic Transactions on Numerical Analysis, vol. 3, pp. 160-176, 1995.
[27]H. Huang, L. Wang, E.-J. Lee, and P. Chen, "An MPI-CUDA implementation and optimization for parallel sparse equations and least squares (LSQR)," Procedia Computer Science, vol. 9, pp. 76-85, 2012.
[28]A. Buluç, J. T. Fineman, M. Frigo, J. R. Gilbert, and C. E. Leiserson, "Parallel sparse matrix-vector and matrix-transpose-vector multiplication using compressed sparse blocks," in Proceedings of the twenty-first annual symposium on Parallelism in algorithms and architectures, 2009, pp. 233-244.
[29]K. He, S. X.-D. Tan, H. Zhao, X.-X. Liu, H. Wang, and G. Shi, "Parallel GMRES solver for fast analysis of large linear dynamic systems on GPU platforms," INTEGRATION, the VLSI journal, vol. 52, pp. 10-22, 2016.
[30]R. D. Cook, Concepts and applications of finite element analysis: John Wiley & Sons, 2007.
[31]J. Fish and T. Belytschko, "A first course in finite elements," 2007.
[32]H. Bang and Y. W. Kwon, The finite element method using MATLAB: CRC press, 2000.
[33]J. Langou, "Computing the R of the QR factorization of tall and skinny matrices using MPI_Reduce," arXiv preprint arXiv:1002.4250, 2010.
[34]C. Lipson and R. C. Juvinall, Handbook of stress and strength: design and material applications: Macmillan, 1963.

校外：立即公開

簡易檢索 / 詳目顯示

相關論文