| Graduate student: | 陳武勇 Chen, Wu-Yung |
|---|---|
| Thesis title: | 使用圖形處理器於B-Spline有限元素分析 Using graphics processor on B-Spline finite element analysis |
| Advisor: | 何旭彬 Ho, Shi-Pin |
| Degree: | Master |
| Department: | College of Engineering - Department of Mechanical Engineering |
| Year of publication: | 2007 |
| Academic year of graduation: | 95 |
| Language: | Chinese |
| Number of pages: | 66 |
| Keywords (translated from Chinese): | Graphics processor, Jacobi preconditioned conjugate gradient method, Preconditioner |
| Keywords (English): | Precondition conjugate gradient method, Graphics processor, GeForce 8800 GTX, B-Spline |
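This thesis investigates how to use a graphics processor for B-Spline finite element analysis and analyzes the resulting performance. Basic linear algebra operations are examined: the vector inner product, vector addition and multiplication, full matrix-vector multiplication, full matrix-matrix multiplication, and sparse matrix-vector multiplication. Each operation is run with different numbers of threads and compared with the functions provided by CUBLAS (Basic Linear Algebra Subprograms), and the overall performance is found to be better. For sparse matrix-vector multiplication, I preprocessed the data of the nonzero elements of the sparse matrix so that the computation suits the way the graphics processor works best, which improves its performance. Finally, an iterative method, the Jacobi preconditioned conjugate gradient method, is used to solve a B-Spline finite element problem. The finite element results show that the single-precision floating-point arithmetic of the graphics processor still cannot fully converge to the result obtained with the double-precision floating-point arithmetic of the central processing unit; the error is about 0.04~2 %.
The graphics processor used in this study is the GeForce 8800 GTX graphics computing device, which Nvidia introduced at the beginning of this year. It contains 128 processing units that operate on the SIMD (Single Instruction Multiple Data) parallel model, its peak computing speed reaches about 350 Gflops, and its data transfer bandwidth is 86.5 GB/s. With the new CUDA architecture, the GeForce 8800 GTX integrates the computing capability of all processing units and makes it independently available, so that using the 128 processing units is subject to fewer restrictions than on traditional graphics processors.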
This thesis discusses how to use a graphics processor for B-Spline finite element analysis and evaluates the resulting performance. The basic linear algebra operations on the graphics processor are studied: the vector inner product, vector-vector addition and multiplication, full matrix-vector multiplication, full matrix-matrix multiplication, and sparse matrix-vector multiplication. These operations are executed with different numbers of threads and compared with the CUBLAS (Basic Linear Algebra Subprograms) functions, and the performance is better overall. In addition, the nonzero elements of the sparse matrix are sorted to improve the performance of sparse matrix-vector multiplication. Finally, I used an iterative method, the Jacobi preconditioned conjugate gradient method, to solve a B-Spline finite element problem. The results of the finite element problem show that the single-precision floating-point arithmetic of the graphics processor cannot fully converge to the result obtained with the double-precision floating-point arithmetic of the central processing unit. The errors are about 0.04~2 %.
The graphics processor used is the GeForce 8800 GTX, which Nvidia introduced at the beginning of this year. The GeForce 8800 GTX includes 128 stream processors and operates on the SIMD (Single Instruction Multiple Data) parallel model. Its peak performance reaches about 350 Gflops, and its data transfer bandwidth is 86.5 GB/s. The GeForce 8800 GTX has a new architecture called CUDA (Compute Unified Device Architecture), which integrates the computing capability of all stream processors and makes it independently available. Compared with traditional graphics processors, the GeForce 8800 GTX imposes fewer limitations on the use of its 128 stream processors.
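The solver named in the abstract can be illustrated with a small CPU-side sketch. This is not the thesis code and not CUDA. It is a plain Python illustration of Jacobi preconditioned conjugate gradient built on a sparse matrix-vector product in CSR storage, run once in double precision and once with every arithmetic result rounded to single precision. The CSR layout, the tridiagonal test matrix, the tolerances, and all function names (`f32`, `f64`, `build_tridiagonal_csr`, `csr_matvec`, `jacobi_pcg`) are assumptions made for illustration; the thesis's B-Spline stiffness matrix and its sparse storage scheme are not given in this record.

```python
import struct

def f32(x):
    """Round a Python float (IEEE double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def f64(x):
    """Double precision: Python floats are already doubles, so return x unchanged."""
    return x

def build_tridiagonal_csr(n):
    """Hypothetical SPD test matrix in CSR form: diagonal 3, 4 or 5, off-diagonals -1."""
    indptr, indices, data = [0], [], []
    for i in range(n):
        if i > 0:
            indices.append(i - 1); data.append(-1.0)
        indices.append(i); data.append(3.0 + (i % 3))
        if i < n - 1:
            indices.append(i + 1); data.append(-1.0)
        indptr.append(len(indices))
    return indptr, indices, data

def dot(u, v, rnd):
    """Inner product; each multiply and each add is rounded by rnd."""
    acc = 0.0
    for a, b in zip(u, v):
        acc = rnd(acc + rnd(a * b))
    return acc

def csr_matvec(n, indptr, indices, data, x, rnd):
    """Sparse matrix-vector product y = A x over the stored nonzeros only."""
    y = [0.0] * n
    for i in range(n):
        acc = 0.0
        for k in range(indptr[i], indptr[i + 1]):
            acc = rnd(acc + rnd(data[k] * x[indices[k]]))
        y[i] = acc
    return y

def jacobi_pcg(n, indptr, indices, data, b, rnd=f64, tol=1e-6, max_iter=500):
    """Solve A x = b for SPD A with conjugate gradient and M = diag(A)."""
    inv_diag = [0.0] * n                      # Jacobi preconditioner: 1 / a_ii
    for i in range(n):
        for k in range(indptr[i], indptr[i + 1]):
            if indices[k] == i:
                inv_diag[i] = rnd(1.0 / data[k])
    x = [0.0] * n
    r = [rnd(v) for v in b]                   # r = b - A x with x = 0
    z = [rnd(m * ri) for m, ri in zip(inv_diag, r)]
    p = z[:]
    rz = dot(r, z, rnd)
    b_norm = dot(r, r, rnd) ** 0.5            # square root is left unrounded
    for it in range(1, max_iter + 1):
        q = csr_matvec(n, indptr, indices, data, p, rnd)
        alpha = rnd(rz / dot(p, q, rnd))
        x = [rnd(xi + rnd(alpha * pi)) for xi, pi in zip(x, p)]
        r = [rnd(ri - rnd(alpha * qi)) for ri, qi in zip(r, q)]
        if dot(r, r, rnd) ** 0.5 <= tol * b_norm:
            return x, it                      # recursive residual is small enough
        z = [rnd(m * ri) for m, ri in zip(inv_diag, r)]
        rz_new = dot(r, z, rnd)
        beta = rnd(rz_new / rz)
        p = [rnd(zi + rnd(beta * pi)) for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter

if __name__ == "__main__":
    n = 50
    indptr, indices, data = build_tridiagonal_csr(n)
    b = [1.0] * n
    x64, it64 = jacobi_pcg(n, indptr, indices, data, b, rnd=f64, tol=1e-12)
    x32, it32 = jacobi_pcg(n, indptr, indices, data, b, rnd=f32, tol=1e-6)
    gap = max(abs(a - c) for a, c in zip(x32, x64)) / max(abs(c) for c in x64)
    print("iterations (double, single):", it64, it32)
    print("relative gap between single and double solutions:", gap)
```

On this small, well-conditioned toy system the gap between the two solutions is expected to be far smaller than the 0.04~2 % reported in the abstract, so the sketch only shows where single-precision rounding enters the iteration and does not reproduce the thesis results.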