| Graduate student: | 顏士敦 Yen, Shih-Tun | 
|---|---|
| Thesis title: | 利用快速可編程邏輯閘陣列之互連網路以提高訊息傳遞之效能 Exploiting High Speed FPGA Interconnect to Improve Performance of Message Passing | 
| Advisor: | 張大緯 Chang, Da-Wei | 
| Degree: | Master | 
| Department: | College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering | 
| Year of publication: | 2011 | 
| Academic year of graduation: | 99 (ROC calendar) | 
| Language: | English | 
| Number of pages: | 36 | 
| Keywords (Chinese): | Multi-ARM 、訊息傳遞 、訊息傳遞介面 、MPICH2 、可編程邏輯閘陣列 、互聯網路 | 
| Keywords (English): | Multi-ARM, message passing, MPI, MPICH2, FPGA interconnect | 
This thesis builds on a Multi-ARM modular platform that connects six ARM-based embedded Linux systems through a fast FPGA interconnect, which MPICH2 could not previously use. We port the MPICH2 library to the platform, write an FPGA driver, and modify the existing Ethernet module of MPICH2 so that connections that stay within the platform are redirected from the Ethernet interface to this driver and travel over the FPGA interconnect. Our main contribution is to enable MPI applications to run on the Multi-ARM platform and exploit its fast FPGA interconnect to improve execution performance. In addition, we extend the FPGA driver to collect per-connection transfer statistics with almost no overhead; these statistics can be used for offline profiling of applications. Measured with the NPmpi benchmark, the maximum throughput of the FPGA interconnect is 181 Mbps, almost three times that of the existing Fast Ethernet interface.
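The redirection and per-connection accounting described in the abstract can be illustrated with a small model. This is only a sketch under stated assumptions, not the thesis's driver or the MPICH2 module: the names `Router`, `ConnStats`, and `PLATFORM_NODES`, and the idea of identifying boards by integer node ids, are hypothetical and chosen for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical: the six boards of one Multi-ARM platform, identified by node id.
PLATFORM_NODES = frozenset(range(6))


@dataclass
class ConnStats:
    """Counters kept for one (source, destination, transport) connection."""
    messages: int = 0
    bytes_sent: int = 0


class Router:
    """Toy model: pick a transport per message and record per-connection totals."""

    def __init__(self, local_node, platform_nodes=PLATFORM_NODES):
        self.local_node = local_node
        self.platform_nodes = platform_nodes
        self.stats = defaultdict(ConnStats)

    def choose_transport(self, dest_node):
        # Traffic that stays inside the platform takes the FPGA path;
        # everything else stays on Ethernet.
        both_inside = (self.local_node in self.platform_nodes
                       and dest_node in self.platform_nodes)
        return "fpga" if both_inside else "ethernet"

    def send(self, dest_node, payload):
        # Only the bookkeeping is modelled here; no data is actually transmitted.
        transport = self.choose_transport(dest_node)
        entry = self.stats[(self.local_node, dest_node, transport)]
        entry.messages += 1
        entry.bytes_sent += len(payload)
        return transport
```

For example, `Router(0).send(3, b"abcd")` would select the FPGA path because both node ids are in `PLATFORM_NODES`, while a destination outside that set would fall back to Ethernet. Keeping the counters as simple increments on the send path is one way such statistics could stay cheap to collect and be dumped later for offline profiling.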
On-campus access: available from 2016-08-30