National Cheng Kung University Master's and Doctoral Thesis System

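Author:	Chen, Wu-Yung (陳武勇)
Thesis title:	Using graphics processor on B-Spline finite element analysis (使用圖形處理器於B-Spline有限元素分析)
Advisor:	Ho, Shi-Pin (何旭彬)
Degree:	Master
Department:	College of Engineering - Department of Mechanical Engineering
Publication year:	2007
Academic year of graduation:	95 (ROC academic year)
Language:	Chinese
Number of pages:	66
Keywords (Chinese):	graphics processor (圖形處理器), Jacobi preconditioned conjugate gradient method (Jacobi 預選矩陣共軛梯度法), preconditioner (預選矩陣)
Keywords (foreign language):	Precondition conjugate gradient method, Graphics processor, GeForce 8800 GTX, B-Spline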

This thesis investigates the use of a graphics processor for B-Spline finite element analysis and evaluates its performance. Basic linear algebra operations on the graphics processor are analyzed: vector inner product, scaled vector addition (SAXPY), full matrix-vector multiplication, full matrix-matrix multiplication, and sparse matrix-vector multiplication. Each operation is run with different numbers of threads and compared with the functions provided by CUBLAS (Nvidia's CUDA implementation of the Basic Linear Algebra Subprograms); overall, better performance is obtained. For sparse matrix-vector multiplication, the nonzero elements of the sparse matrix are sorted so that the computation suits the way the graphics processor operates best, which improves performance. Finally, a B-Spline finite element problem is solved iteratively with the Jacobi preconditioned conjugate gradient method. The finite element results show that the single-precision floating-point arithmetic of the graphics processor does not fully converge to the result obtained with double-precision arithmetic on the central processing unit; the errors are about 0.04% to 2%.
The graphics processor used in this study is the GeForce 8800 GTX, introduced by Nvidia at the beginning of this year. It contains 128 stream processors operating under a SIMD (Single Instruction, Multiple Data) parallel model, with a peak performance of about 350 GFLOPS and a data transfer bandwidth of 86.5 GB/s. Its new architecture, CUDA (Compute Unified Device Architecture), unifies the stream processors and exposes their computing capability independently, so using the 128 stream processors is subject to fewer restrictions than on traditional graphics processors.
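The abstract names two building blocks without showing them: sparse matrix-vector multiplication on a compressed row storage (CRS) matrix, and the Jacobi preconditioned conjugate gradient iteration. The following is a minimal sketch of both, written in plain Python in double precision on the CPU. It is not the author's CUDA code; the function names (`crs_matvec`, `jacobi_pcg`), the tolerance, and the 3x3 test matrix are illustrative assumptions and do not come from the thesis.

```python
import math


def crs_matvec(values, col_idx, row_ptr, x):
    """Return y = A x for a matrix A stored in CRS form.

    values  : nonzero entries, row by row
    col_idx : column index of each entry in `values`
    row_ptr : row i occupies values[row_ptr[i]:row_ptr[i + 1]]
    """
    n = len(row_ptr) - 1
    y = [0.0] * n
    for i in range(n):
        s = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            s += values[k] * x[col_idx[k]]
        y[i] = s
    return y


def dot(u, v):
    """Vector inner product."""
    return sum(a * b for a, b in zip(u, v))


def jacobi_pcg(values, col_idx, row_ptr, b, tol=1e-10, max_iter=1000):
    """Solve A x = b (A symmetric positive definite, CRS storage).

    The preconditioner is M = diag(A), so applying M^-1 is an
    elementwise multiplication by 1 / a_ii.
    Returns (x, number_of_iterations).
    """
    n = len(b)
    # Extract the inverse diagonal for the Jacobi preconditioner.
    inv_diag = [0.0] * n
    for i in range(n):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            if col_idx[k] == i:
                inv_diag[i] = 1.0 / values[k]

    x = [0.0] * n
    r = list(b)                      # residual for the initial guess x = 0
    b_norm = math.sqrt(dot(b, b)) or 1.0
    if math.sqrt(dot(r, r)) <= tol * b_norm:
        return x, 0
    z = [inv_diag[i] * r[i] for i in range(n)]
    p = list(z)
    rz = dot(r, z)

    for it in range(1, max_iter + 1):
        Ap = crs_matvec(values, col_idx, row_ptr, p)
        alpha = rz / dot(p, Ap)
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * Ap[i] for i in range(n)]
        if math.sqrt(dot(r, r)) <= tol * b_norm:
            return x, it
        z = [inv_diag[i] * r[i] for i in range(n)]
        rz_new = dot(r, z)
        beta = rz_new / rz
        p = [z[i] + beta * p[i] for i in range(n)]
        rz = rz_new
    return x, max_iter


# Illustrative 3x3 system (assumed, not from the thesis):
# A = [[4, -1, 0], [-1, 4, -1], [0, -1, 4]], b = [3, 2, 3], exact x = [1, 1, 1]
values = [4.0, -1.0, -1.0, 4.0, -1.0, -1.0, 4.0]
col_idx = [0, 1, 0, 1, 2, 1, 2]
row_ptr = [0, 2, 5, 7]
b = [3.0, 2.0, 3.0]

solution, iterations = jacobi_pcg(values, col_idx, row_ptr, b)
print(solution, iterations)
```

Each iteration costs one sparse matrix-vector product, two inner products, and a few vector updates, which are the same basic operations the thesis benchmarks individually on the graphics processor. The sketch runs entirely in double precision, so it does not reproduce the single-precision error reported in the abstract.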

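Abstract (Chinese)
Abstract (English)
Acknowledgments
Table of Contents
List of Tables
List of Figures
Nomenclature
Chapter 1	Introduction
1.1	Preface
1.2	Literature Review
1.3	Coprocessors
1.4	Finite Element Analysis
1.5	Thesis Overview
Chapter 2	Basic Theory
2.1	Preconditioned Conjugate Gradient Method
2.2	CRS Data Storage Format
Chapter 3	Graphics Processor Hardware Architecture
3.1	Review
3.2	CUDA Architecture
3.3	Inside the GeForce 8800 GTX Hardware
3.3.1	Overview
3.3.2	Cache, Registers, and Local Memory
3.4	Operating Models
3.4.1	Execution Model
3.4.2	Memory Model
3.4.3	Operational Limitations
Chapter 4	Performance Guidelines
4.1	Data Reuse
4.2	Memory Access Methods
4.2.1	global Memory Space
4.2.2	constant Memory Space
4.2.3	texture Memory Space
4.2.4	shared Memory Space
4.3	Appropriate Number of Threads per Block
4.4	Reducing the Use of if and switch
4.5	Avoiding Data Transfer between the Central Processing Unit and the Graphics Processor
Chapter 5	Research Results
5.1	Basic Linear Algebra Operations
5.1.1	Vector Inner Product (DOT)
5.1.2	Scaled Vector Addition (SAXPY)
5.1.3	Full Matrix-Matrix Multiplication
5.1.4	Full Matrix-Vector Multiplication
5.1.5	Sparse Matrix-Vector Multiplication
5.2	Solving a B-Spline Finite Element Problem
Chapter 6	Suggestions and Discussion
References
Appendix A	Tree Algorithm
Author's Biography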
                                    


Made public: 2007-07-04
