| Graduate student: | 賴奕良 Lai, Yi-Liang |
|---|---|
| Thesis title: | 使用OpenCL及HSA架構加速處理器作有限元素法計算分析 Finite Element Analysis with OpenCL and HSA Accelerated Processing Unit |
| Advisor: | 何旭彬 Ho, Shi-Pin |
| Degree: | 碩士 Master |
| Department: | 工學院 - 機械工程學系 Department of Mechanical Engineering |
| Year of publication: | 2014 |
| Academic year of graduation: | 102 |
| Language: | Chinese |
| Number of pages: | 65 |
| Keywords (Chinese): | 圖形處理器、中央處理器、加速處理器、有限元素法、異構系統架構 |
| Keywords (English): | Graphics Processor, Central Processor, Accelerated Processor, Finite Element Method, Heterogeneous System Architecture |
In recent years, central processors (CPUs) and graphics processors (GPUs) have been used heavily in scientific computing, and the architecture and capability of both have improved steadily. CPUs have moved from single-core to multi-core designs, but physical limits in semiconductor microelectronics are slowing further gains. GPUs offer high computational throughput and memory bandwidth, but power consumption and code scalability remain drawbacks.
In 2014 AMD introduced the A10-7850K, the first accelerated processing unit (APU) to adopt the heterogeneous system architecture (HSA). The design aims to offset the shortcomings of CPUs and GPUs and to combine their strengths, so that the two can cooperate in a heterogeneous computing model and deliver better performance. This thesis examines how much this heterogeneous system actually improves performance in scientific computing.
In finite element analysis, solving the resulting system of linear equations accounts for a large share of the total computation time. This thesis solves the system with the conjugate gradient method and a Jacobi preconditioner. The vector inner product, the vector add-multiply operation, and the sparse matrix-vector multiplication that make up each iteration are run and analyzed on the A10-7850K. Dense matrix-matrix multiplication and dense matrix-vector multiplication are also run and analyzed on the APU. Finally, a finite element problem is solved on both the APU and a current high-end CPU, and the results are compared.
The test results show that, when solving the finite element problem, the A10-7850K is 1.5 times faster than the eight-core Intel® Xeon® E5-1620 CPU accelerated with OpenMP.
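The iteration described in the abstract can be sketched as follows. This is a minimal illustration in plain Python, not the thesis's OpenCL implementation. It assumes the sparse matrix is stored in compressed sparse row (CSR) format and reads the "vector add-multiply" operation as the usual `alpha * x + y` update; neither is stated in the abstract. All function names (`dot`, `axpy`, `csr_matvec`, `csr_diagonal`, `jacobi_pcg`) are hypothetical.

```python
# Illustrative sketch only: plain Python, not the thesis's OpenCL kernels.
# CSR storage and all function names are assumptions made for this example.

def dot(x, y):
    """Vector inner product."""
    return sum(a * b for a, b in zip(x, y))


def axpy(alpha, x, y):
    """Vector add-multiply: return alpha * x + y as a new list."""
    return [alpha * a + b for a, b in zip(x, y)]


def csr_matvec(indptr, indices, data, x):
    """Sparse matrix-vector product y = A x with A stored in CSR format."""
    y = []
    for row in range(len(indptr) - 1):
        start, end = indptr[row], indptr[row + 1]
        y.append(sum(data[k] * x[indices[k]] for k in range(start, end)))
    return y


def csr_diagonal(indptr, indices, data):
    """Extract the diagonal of a CSR matrix for the Jacobi preconditioner."""
    diag = [0.0] * (len(indptr) - 1)
    for row in range(len(indptr) - 1):
        for k in range(indptr[row], indptr[row + 1]):
            if indices[k] == row:
                diag[row] = data[k]
    return diag


def jacobi_pcg(indptr, indices, data, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive definite A.

    Returns (x, iterations). The preconditioner is M = diag(A).
    """
    inv_diag = [1.0 / d for d in csr_diagonal(indptr, indices, data)]
    x = [0.0] * len(b)
    r = list(b)                                  # r = b - A x with x = 0
    b_norm = dot(b, b) ** 0.5 or 1.0             # avoid dividing by zero
    if dot(r, r) ** 0.5 <= tol * b_norm:
        return x, 0
    z = [m * ri for m, ri in zip(inv_diag, r)]   # z = M^{-1} r
    p = list(z)
    rz = dot(r, z)
    for iteration in range(1, max_iter + 1):
        q = csr_matvec(indptr, indices, data, p)  # q = A p
        alpha = rz / dot(p, q)
        x = axpy(alpha, p, x)                     # x = x + alpha p
        r = axpy(-alpha, q, r)                    # r = r - alpha q
        if dot(r, r) ** 0.5 <= tol * b_norm:
            return x, iteration
        z = [m * ri for m, ri in zip(inv_diag, r)]
        rz_new = dot(r, z)
        p = axpy(rz_new / rz, p, z)               # p = z + beta p
        rz = rz_new
    return x, max_iter


if __name__ == "__main__":
    # 3x3 tridiagonal matrix [[2,-1,0],[-1,2,-1],[0,-1,2]] in CSR form.
    indptr = [0, 2, 5, 7]
    indices = [0, 1, 0, 1, 2, 1, 2]
    data = [2.0, -1.0, -1.0, 2.0, -1.0, -1.0, 2.0]
    solution, iterations = jacobi_pcg(indptr, indices, data, [1.0, 0.0, 1.0])
    print(solution, iterations)
```

Each pass through the loop performs one sparse matrix-vector product, a few inner products, and a few add-multiply updates, which are the three operations the abstract singles out for analysis on the APU. On a GPU or APU each of these would typically become its own data-parallel kernel, with the inner product requiring a reduction step.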