Author: 賴昀隆 (Lai, Yun-Long)
Thesis title: 使用Xeon Phi進行有限元素法計算之效能評估
Performance Evaluation of Finite Element Computation on Xeon Phi
Advisor: 何旭彬 (Ho, Shi-Pin)
Degree: Master
Department: College of Engineering - Department of Mechanical Engineering
Year of publication: 2014
Academic year of graduation: 102 (ROC calendar)
Language: Chinese
Number of pages: 59
Keywords (Chinese): Intel Xeon Phi, 協同處理器, Jacobi預選矩陣共軛梯度法, 有限元素法, OpenMP
Keywords (English): Intel Xeon Phi, Coprocessor, Finite element method, OpenMP, Jacobi-preconditioned conjugate gradient method
  • The Intel Xeon Phi coprocessor is a parallel computing device that Intel released to strengthen its competitiveness in the scientific computing market. The Xeon Phi 3120A used in this study has 57 cores, each supporting four threads and 512-bit-wide SIMD registers, for a theoretical peak double-precision performance of 1 TFlop/s. It is well suited to computations on large amounts of independent data.

    This thesis examines how to carry out B-Spline finite element analysis on the Xeon Phi coprocessor and evaluates the resulting performance. In a finite element analysis, solving the system of linear equations accounts for most of the computation time; here the system is solved with the Jacobi-preconditioned conjugate gradient method. Because that method is built from basic linear algebra operations, the dot product (DDOT), vector scale-and-add (DAXPY), dense matrix-vector multiplication, dense matrix-matrix multiplication, and sparse matrix-vector multiplication are also analyzed. The finite element solution performance of the Xeon Phi coprocessor is compared with that of a Xeon E5 CPU and a GTX TITAN GPU. In solving the finite element problem, the Xeon Phi coprocessor was 9.54 times faster than the Xeon E5 CPU, while the GTX TITAN GPU was 14.55 times faster than the Xeon E5 CPU.
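
    The record does not include the thesis's source code, so the following is only a minimal sketch of the solver named in the abstract: a Jacobi-preconditioned conjugate gradient iteration over a matrix in CSR storage, written in plain serial Python rather than the OpenMP implementation the thesis evaluates. All function and variable names (`csr_matvec`, `dot`, `axpy`, `jacobi_pcg`, `A_values`, and so on) are hypothetical, and the matrix is assumed to be symmetric positive definite with every diagonal entry stored and nonzero.

```python
import math


def csr_matvec(values, col_idx, row_ptr, x):
    """Sparse matrix-vector product y = A x for A stored in CSR form."""
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        s = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            s += values[k] * x[col_idx[k]]
        y[i] = s
    return y


def dot(a, b):
    """Dot product of two vectors (the DDOT kernel)."""
    return sum(ai * bi for ai, bi in zip(a, b))


def axpy(alpha, x, y):
    """Return alpha * x + y (the DAXPY kernel)."""
    return [alpha * xi + yi for xi, yi in zip(x, y)]


def jacobi_pcg(values, col_idx, row_ptr, b, tol=1e-10, max_iter=1000):
    """Solve A x = b; returns (x, iterations used).

    Assumes A is symmetric positive definite with nonzero stored diagonal.
    """
    n = len(b)
    # Jacobi preconditioner: M^-1 is the inverse of the diagonal of A.
    inv_diag = [0.0] * n
    for i in range(n):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            if col_idx[k] == i:
                inv_diag[i] = 1.0 / values[k]

    x = [0.0] * n
    r = list(b)  # residual b - A x with x = 0
    b_norm = math.sqrt(dot(b, b)) or 1.0
    if math.sqrt(dot(r, r)) <= tol * b_norm:
        return x, 0

    z = [d * ri for d, ri in zip(inv_diag, r)]  # z = M^-1 r
    p = list(z)
    rz = dot(r, z)
    for it in range(1, max_iter + 1):
        Ap = csr_matvec(values, col_idx, row_ptr, p)
        alpha = rz / dot(p, Ap)
        x = axpy(alpha, p, x)    # x <- x + alpha p
        r = axpy(-alpha, Ap, r)  # r <- r - alpha A p
        if math.sqrt(dot(r, r)) <= tol * b_norm:
            return x, it
        z = [d * ri for d, ri in zip(inv_diag, r)]
        rz_new = dot(r, z)
        p = axpy(rz_new / rz, p, z)  # p <- z + beta p
        rz = rz_new
    return x, max_iter


# Illustrative 3x3 system: A = [[4,-1,0],[-1,4,-1],[0,-1,4]] in CSR form.
A_values = [4.0, -1.0, -1.0, 4.0, -1.0, -1.0, 4.0]
A_col_idx = [0, 1, 0, 1, 2, 1, 2]
A_row_ptr = [0, 2, 5, 7]
solution, iterations = jacobi_pcg(A_values, A_col_idx, A_row_ptr, [2.0, 4.0, 10.0])
```

    Each iteration consists of one sparse matrix-vector product, a few dot products, and a few scale-and-add updates, which is why the abstract lists those kernels as objects of analysis in their own right. In a parallel version, the row loop in `csr_matvec` and the element-wise loops would be the natural candidates for threading and vectorization; that mapping is an assumption of this sketch, not a description of the thesis's code.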

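    Table of contents:
    Abstract (Chinese)
    Abstract (English)
    Acknowledgements
    Table of Contents
    List of Tables
    List of Figures
    Nomenclature
    Chapter 1 Introduction
      1-1 Motivation and Objectives
      1-2 Literature Review
      1-3 Thesis Organization
    Chapter 2 Related Theory
      2-1 Preconditioned Conjugate Gradient Method
      2-2 CSR Storage Format
    Chapter 3 Xeon Phi Hardware Architecture
      3-1 Intel Many Integrated Core Architecture
      3-2 Coprocessor Card Design
      3-3 Xeon Phi Chip
      3-4 Xeon Phi Core Architecture
      3-5 Vector Processing Unit Architecture
      3-6 Cache Structure and Translation Lookaside Buffer
    Chapter 4 Performance Guidelines
      4-1 Vectorization
        4-1-1 Data Layout
        4-1-2 Data Locality
        4-1-3 Data Dependence
      4-2 Scaling the Code
        4-2-1 Introduction to OpenMP
    Chapter 5 Results
      5-1 Vector Scale-and-Add (DAXPY)
      5-2 Dot Product (DDOT)
      5-3 Dense Matrix-Vector Multiplication
      5-4 Dense Matrix-Matrix Multiplication
      5-5 Sparse Matrix-Vector Multiplication
      5-6 Solving the B-Spline Finite Element Problem
    Chapter 6 Conclusions and Recommendations
    References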


    Full-text availability: on campus from 2016-08-22; off campus from 2019-08-22