
Author: 賴奕良
Lai, Yi-Liang
Thesis title: 使用OpenCL及HSA架構加速處理器作有限元素法計算分析
Finite Element Analysis with OpenCL and HSA Accelerated Processing Unit
Advisor: 何旭彬
Ho, Shi-Pin
Degree: Master
Department: College of Engineering - Department of Mechanical Engineering
Year of publication: 2014
Academic year of graduation: 102
Language: Chinese
Number of pages: 65
Keywords (Chinese): 圖形處理器, 中央處理器, 加速處理器, 有限元素法, 異構系統架構
Keywords (English): Graphic Processor, Central Processor, Accelerated Processor, Finite element method, Heterogeneous system architecture
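  • In recent years, central processing units (CPUs) and graphics processing units (GPUs) have been used extensively in scientific computing, and the architecture and capability of both have advanced steadily. CPUs have moved from single-core to multi-core designs, but physical constraints of semiconductor microelectronics are bringing them close to their limits. GPUs offer high computational throughput and bandwidth, but their power consumption and program scalability are drawbacks.

    In 2014 AMD released the A10-7850K, its first accelerated processing unit (APU) to adopt the Heterogeneous System Architecture (HSA). The design aims to offset the shortcomings of CPUs and GPUs and to combine their strengths, so that the two can cooperate in a heterogeneous computing model and deliver better performance. This thesis examines how much this heterogeneous system actually improves performance in scientific computing.

    Solving the system of linear equations that arises from the finite element method accounts for a large share of the total computation. This thesis solves that system with the conjugate gradient method combined with a Jacobi preconditioner. The A10-7850K APU is used to run and analyze the vector inner product, the vector scale-and-add, and the sparse matrix-vector multiplication that occur in the iteration. Dense matrix-matrix multiplication and dense matrix-vector multiplication are also run and analyzed on the APU. Finally, a finite element problem is solved on both the APU and a current high-end CPU, and the results are compared.

    In the tests, the A10-7850K solved the finite element problem 1.5 times faster than an eight-core Intel® Core™ Xeon E5-1620 CPU accelerated with OpenMP.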

    Graphics processors and central processors are heavily used for floating-point work in scientific computation. The architecture and capabilities of both have evolved constantly; central processors, for example, have moved from single-core to multi-core designs. Physical limits of semiconductor microelectronics, however, are slowing this progress. Graphics processors offer high computational throughput and bandwidth, but their power consumption and code scalability still present challenges.
    In 2014 AMD introduced the A10-7850K, its first accelerated processing unit (APU) to use the Heterogeneous System Architecture (HSA). The design aims to offset the shortcomings of both architectures and combine their advantages in a heterogeneous computing model that delivers better performance. This thesis explores the resulting performance gains in scientific computing.
    In finite element computations, most of the computation time is spent solving a set of linear equations. In this thesis, the Jacobi-preconditioned conjugate gradient method is used to solve this set of equations. The iterative process consists of vector inner products, scaled vector additions, and sparse matrix-vector multiplications. These operations are run and analyzed on the accelerated processing unit. The dense matrix-matrix multiplication and the dense matrix-vector multiplication are also run and analyzed. Finally, a finite element problem is solved on both the accelerated processing unit and a central processor.
    The test results show that the A10-7850K solves the finite element problem 1.5 times faster than the Intel® Core™ Xeon E5-1620 using eight cores.
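
    The solver named in the abstract can be illustrated with a minimal pure-Python sketch. It is not the thesis's OpenCL implementation: the CSR storage layout, the function names (`csr_matvec`, `dot`, `axpy`, `jacobi_pcg`), the tolerance, and the small 3x3 test system are all assumptions made for illustration. The sketch only shows where the three operations analyzed in the thesis (inner product, scaled vector addition, sparse matrix-vector multiplication) appear in each iteration.

```python
def csr_matvec(values, col_idx, row_ptr, x):
    """Sparse matrix-vector product y = A x, with A stored in CSR form (assumed layout)."""
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        s = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            s += values[k] * x[col_idx[k]]
        y[i] = s
    return y


def dot(x, y):
    """Vector inner product (the DDOT operation)."""
    return sum(a * b for a, b in zip(x, y))


def axpy(alpha, x, y):
    """Return alpha * x + y (the DAXPY operation)."""
    return [alpha * a + b for a, b in zip(x, y)]


def jacobi_pcg(values, col_idx, row_ptr, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive definite CSR matrix A.

    Uses conjugate gradients with a Jacobi (diagonal) preconditioner.
    Returns the solution and the number of iterations performed.
    """
    n = len(b)
    # Jacobi preconditioner: the inverse of the diagonal of A.
    inv_diag = [0.0] * n
    for i in range(n):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            if col_idx[k] == i:
                inv_diag[i] = 1.0 / values[k]

    x = [0.0] * n
    r = list(b)                                   # r = b - A x with x = 0
    z = [d * ri for d, ri in zip(inv_diag, r)]    # z = M^-1 r
    p = list(z)
    rz = dot(r, z)
    b_norm2 = dot(b, b)
    if b_norm2 == 0.0:
        return x, 0

    for it in range(1, max_iter + 1):
        Ap = csr_matvec(values, col_idx, row_ptr, p)   # sparse matrix-vector product
        alpha = rz / dot(p, Ap)                        # inner product
        x = axpy(alpha, p, x)                          # x = x + alpha p
        r = axpy(-alpha, Ap, r)                        # r = r - alpha A p
        if dot(r, r) <= tol * tol * b_norm2:           # relative residual test
            return x, it
        z = [d * ri for d, ri in zip(inv_diag, r)]
        rz_new = dot(r, z)
        p = axpy(rz_new / rz, p, z)                    # p = z + beta p
        rz = rz_new
    return x, max_iter


# Hypothetical test system: A = [[4,-1,0],[-1,4,-1],[0,-1,4]], b = A [1,1,1]^T.
values = [4.0, -1.0, -1.0, 4.0, -1.0, -1.0, 4.0]
col_idx = [0, 1, 0, 1, 2, 1, 2]
row_ptr = [0, 2, 5, 7]
b = [3.0, 2.0, 3.0]
x, iters = jacobi_pcg(values, col_idx, row_ptr, b)
```

    In this sketch every step of the loop is either an inner product, a scaled vector addition, or a sparse matrix-vector product, which is why those kernels dominate the cost of the solve.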

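    Table of contents

    Abstract (Chinese)
    Abstract
    Acknowledgments
    Table of Contents
    List of Tables
    List of Figures
    Nomenclature
    Chapter 1 Introduction
        1.1 Foreword
        1.2 Literature Review
        1.3 Motivation and Objectives
        1.4 Thesis Organization
    Chapter 2 Related Theory
        2.1 Preconditioned Conjugate Gradient Method
        2.2 Data Storage Format
    Chapter 3 Architecture of the KAVERI A10-7850K Accelerated Processing Unit
        3.1 CPU Architecture
        3.2 GPU Architecture
        3.3 Heterogeneous System Architecture (HSA)
        3.4 Compute Cores
        3.5 Memory Architecture
        3.6 Operating Model
            3.6.1 OpenCL
            3.6.2 Thread Hierarchy
            3.6.3 Memory Hierarchy
            3.6.4 Heterogeneous Computing
            3.6.5 GPU Technical Specifications
    Chapter 4 Performance Optimization Evaluation
        4.1 Memory Optimization
            4.1.1 Global Memory
            4.1.2 Constant Memory
            4.1.3 Image Memory
            4.1.4 Local Memory
        4.2 Code Optimization
            4.2.1 Maximizing Active Wavefronts
            4.2.2 Optimizing Arithmetic Intensity
            4.2.3 Use of Control-Flow Instructions
        4.3 Summary of Heterogeneous Computing Optimization
    Chapter 5 Results
        5.1 Vector Inner Product (DDOT)
        5.2 Vector Scale-and-Add (DAXPY)
        5.3 Dense Matrix-Matrix Multiplication
        5.4 Dense Matrix-Vector Multiplication
        5.5 Sparse Matrix-Vector Multiplication
        5.6 Solving with the B-Spline Finite Element Method
    Chapter 6 Conclusions
    References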


    Full-text availability: on campus from 2016-08-26
    Off campus from 2017-08-26