
Graduate Student: Yen, Shih-Tun (顏士敦)
Thesis Title: Exploiting High Speed FPGA Interconnect to Improve Performance of Message Passing (利用快速可編程邏輯閘陣列之互連網路以提高訊息傳遞之效能)
Advisor: Chang, Da-Wei (張大緯)
Degree: Master
Department: Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2011
Graduation Academic Year: 99 (ROC calendar)
Language: English
Number of Pages: 36
Chinese Keywords: Multi-ARM, message passing, Message Passing Interface, MPICH2, field-programmable gate array (FPGA), interconnect
English Keywords: Multi-ARM, message passing, MPI, MPICH2, FPGA interconnect
    We have a Multi-ARM modular platform in which six ARM-based embedded Linux systems are connected by a fast FPGA interconnect. At present, however, this FPGA interconnect cannot be used by MPICH2. In this thesis we port the MPICH2 library to the Multi-ARM platform, write a driver for the FPGA interconnect on this platform, and modify the Ethernet module of the MPICH2 library so that it supports the FPGA interconnect. Network connections whose two endpoints both lie within the platform are redirected from the original Ethernet interface to our FPGA driver, so that these connections communicate over the FPGA switching network instead. The main contribution of this thesis is to let programs that use the Message Passing Interface run on the Multi-ARM platform and exploit its fast FPGA switching network for better performance. In addition, we extend the FPGA driver to collect detailed transfer statistics for every connection, which can be used for offline profiling of applications. Finally, we measure the bandwidth of the FPGA interconnect with the NPmpi benchmark; the maximum measured throughput reaches 181 Mbps, nearly three times as fast as the original Ethernet.
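    The per-connection statistics mentioned above could be kept with bookkeeping along the lines of the sketch below. This is not the thesis's driver code: the table layout, the function names (stats_on_send, stats_on_recv, stats_dump), and the fixed connection count are assumptions made only to illustrate how byte and message counters accumulated on every transfer can later be read back for offline profiling.

/* Minimal sketch (assumed names, not the actual FPGA driver code):
 * per-connection counters that a driver could update on every transfer
 * and dump afterwards for offline application profiling. */
#include <stdint.h>
#include <stdio.h>

#define MAX_CONNS 16                  /* six boards -> a small, fixed table */

struct conn_stats {
    int      in_use;
    int      peer_id;                 /* board number of the remote endpoint */
    uint64_t tx_bytes, rx_bytes;
    uint64_t tx_msgs,  rx_msgs;
};

static struct conn_stats table[MAX_CONNS];

/* Called from the (hypothetical) send path of the driver. */
void stats_on_send(int conn, size_t len)
{
    table[conn].tx_bytes += len;
    table[conn].tx_msgs  += 1;
}

/* Called from the (hypothetical) receive path of the driver. */
void stats_on_recv(int conn, size_t len)
{
    table[conn].rx_bytes += len;
    table[conn].rx_msgs  += 1;
}

/* Dump every live connection, e.g. when a profiling tool reads the totals. */
void stats_dump(FILE *out)
{
    for (int i = 0; i < MAX_CONNS; i++) {
        if (!table[i].in_use)
            continue;
        fprintf(out, "conn %d -> board %d: tx %llu B / %llu msgs, rx %llu B / %llu msgs\n",
                i, table[i].peer_id,
                (unsigned long long)table[i].tx_bytes,
                (unsigned long long)table[i].tx_msgs,
                (unsigned long long)table[i].rx_bytes,
                (unsigned long long)table[i].rx_msgs);
    }
}

int main(void)
{
    /* Tiny usage example: one connection to board 3. */
    table[0].in_use = 1;
    table[0].peer_id = 3;
    stats_on_send(0, 4096);
    stats_on_recv(0, 512);
    stats_dump(stdout);
    return 0;
}

    In the actual system the equivalent bookkeeping would presumably live inside the kernel-space FPGA driver and be exported through some read-back interface; the user-space form above is only for illustration.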

    In this thesis, based on a Multi-ARM modular platform that consists of six ARM-based embedded Linux systems and a fast FPGA interconnect, we port the MPICH2 library to the Multi-ARM platform so that MPICH2 can exploit the fast communication offered by the FPGA. To exploit the FPGA interconnect, we write an FPGA driver and modify the existing Ethernet module of MPICH2 to redirect intra-platform communications to this driver, so that they travel through the FPGA interconnect. Our main contribution is to enable MPI applications to run on the Multi-ARM platform and to exploit its fast FPGA interconnect to improve execution performance. In addition, we extend the FPGA driver to collect per-connection statistics with almost no overhead. According to the NPmpi benchmark, the maximum throughput of the FPGA interconnect is 181 Mbps, almost three times that of the existing Fast Ethernet interface.
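    The redirection described above could look roughly like the following user-space sketch. The private subnet, the device path /dev/fpga_ic, and the helper names are hypothetical, and in the thesis the corresponding change sits inside MPICH2's existing Ethernet module rather than in a stand-alone function; the sketch only shows the basic idea of opening the FPGA driver for intra-platform peers and falling back to a TCP socket for everything else.

/* Minimal sketch (assumed names and addresses, not the thesis code):
 * pick the transport for an MPI connection.  Intra-platform peers are
 * reached through a hypothetical FPGA character device; all other peers
 * use an ordinary TCP socket, as the unmodified Ethernet module would. */
#include <arpa/inet.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#define PLATFORM_SUBNET   "192.168.10.0"   /* assumed: the six boards share one subnet */
#define PLATFORM_NETMASK  "255.255.255.0"
#define FPGA_DEVICE       "/dev/fpga_ic"   /* hypothetical driver node */

/* Return 1 if the peer address lies inside the Multi-ARM platform's subnet. */
static int is_intra_platform(const char *peer_ip)
{
    struct in_addr peer, net, mask;
    if (inet_aton(peer_ip, &peer) == 0 ||
        inet_aton(PLATFORM_SUBNET, &net) == 0 ||
        inet_aton(PLATFORM_NETMASK, &mask) == 0)
        return 0;
    return (peer.s_addr & mask.s_addr) == (net.s_addr & mask.s_addr);
}

/* Return a file descriptor for the chosen transport, or -1 on failure. */
int open_connection(const char *peer_ip, unsigned short port)
{
    if (is_intra_platform(peer_ip)) {
        /* Redirected path: talk to the FPGA interconnect driver. */
        int fd = open(FPGA_DEVICE, O_RDWR);
        if (fd >= 0)
            return fd;
        perror("open " FPGA_DEVICE ", falling back to TCP");
    }

    /* Default path: plain TCP over the Ethernet interface. */
    int sock = socket(AF_INET, SOCK_STREAM, 0);
    if (sock < 0)
        return -1;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    inet_aton(peer_ip, &addr.sin_addr);

    if (connect(sock, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        close(sock);
        return -1;
    }
    return sock;
}

int main(void)
{
    /* Usage example: classify two peer addresses. */
    printf("192.168.10.3 intra-platform? %d\n", is_intra_platform("192.168.10.3"));
    printf("140.116.1.1  intra-platform? %d\n", is_intra_platform("140.116.1.1"));
    return 0;
}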

    Contents
    List of Figures
    List of Tables
    Chapter 1. Introduction
    Chapter 2. Related Work and Background Information
        2-1. Related Work
        2-2. Introduction to MPICH2
    Chapter 3. Design and Implementation
        3-1. Multi-ARM Platform Overview
        3-2. Modifying MPICH2 to Redirect Network Traffic to FPGA Interconnect
        3-3. Design of FPGA Interconnect Driver
        3-4. Simulating FPGA Interconnect with Kernel Socket
    Chapter 4. Evaluation
        4-1. Introduction to Benchmark Suites
        4-2. Execution Environment
        4-3. Experimental Results
    Chapter 5. Conclusion and Future Work
    References

    [1] Message Passing Interface Forum. Official MPI-2.2 Standard. Available: http://www.mpi-forum.org/docs/docs.html
    [2] W. Gropp, E. Lusk, N. Doss, and A. Skjellum, "A High-Performance, Portable Implementation of the MPI Message Passing Interface Standard," Parallel Computing, vol. 22, No. 6, pp. 789-828, September 1996.
    [3] E. Gabriel, G. E. Fagg, G. Bosilca, T. Angskun, J. J. Dongarra, J. M. Squyres, V. Sahay, P. Kambadur, B. Barrett, A. Lumsdaine, R. H. Castain, D. J. Daniel, R. L. Graham and T. S. Woodall, "Open MPI: Goals, Concept, and Design of a Next Generation MPI Implementation," in Proceedings of the 11th European PVM/MPI Users' Group Meeting, Budapest, Hungary, September 2004, pp. 97-104.
    [4] W. Huang, G. Santhanaraman, H.-W. Jin, Q. Gao and D. K. Panda, "Design and Implementation of High Performance MVAPICH2: MPI2 over InfiniBand," presented at the International Symposium on Cluster Computing and the Grid (CCGrid), Singapore, May 2006.
    [5] Intel Corporation. 2011, Intel® MPI Library. Available: http://software.intel.com/en-us/articles/intel-mpi-library/
    [6] J.-R. Liu, "A Many-Processor Prototyping SW/HW Framework and Component Based Dataflow Programming," Institute of Computer Science and Information Engineering, National Cheng Kung University, Tainan, Taiwan, July 2011.
    [7] J. Liu, J. Wu, S. P. Kini, P. Wyckoff and D. K. Panda, "High Performance RDMA-Based MPI Implementation over InfiniBand," in Proceedings of the 17th annual International Conference on Supercomputing (ICS 2003), San Francisco, CA, USA, 2003, pp. 295-304.
    [8] J. Liu, B. Chandrasekaran, J. Wu, W. Jiang, S. Kini, W. Yu, D. Buntinas, P. Wyckoff and D. K. Panda, "Performance Comparison of MPI Implementations over InfiniBand, Myrinet and Quadrics," presented at the ACM/IEEE Conference on Supercomputing, 2003.
    [9] S. Coll, E. Frachtenberg, F. Petrini, A. Hoisie and L. Gurvits, "Using Multirail Networks in High-Performance Clusters," in Proceedings of the 2001 IEEE International Conference on Cluster Computing, 2001, pp. 15-24.
    [10] L. Dagum and R. Menon, "OpenMP: an Industry Standard API for Shared-Memory Programming," IEEE Computational Science and Engineering, vol. 5, No. 1, pp. 46-55, Jan-Mar 1998.
    [11] S. Moreaud, B. Goglin and R. Namyst, "Adaptive MPI Multirail Tuning for Non-uniform Input/Output Access," in 17th European MPI Users' Group Meeting (EuroMPI 2010), Stuttgart, Germany, 2010, pp. 239-248.
    [12] H. Chen, W. Chen, J. Huang, B. Robert and H. Kuhn, "MPIPP: an Automatic Profile-Guided Parallel Process Placement Toolset for SMP Clusters and Multiclusters," presented at the 20th Annual International Conference on Supercomputing, Cairns, Queensland, Australia, 2006.
    [13] G. Mercier and J. Clet-Ortega, "Towards an Efficient Process Placement Policy for MPI Applications in Multicore Environments," presented at the 16th European PVM/MPI Users' Group Meeting on Recent Advances in Parallel Virtual Machine and Message Passing Interface, Espoo, Finland, 2009.
    [14] E. Jeannot and G. Mercier, "Near-Optimal Placement of MPI Processes on Hierarchical NUMA Architectures," presented at the 16th International Euro-Par Conference on Parallel Processing, Ischia, Italy, 2010.
    [15] T. Ma, G. Bosilca, A. Bouteiller and J. J. Dongarra, "Locality and Topology Aware Intra-Node Communication Among Multicore CPUs," presented at the 17th European MPI Users' Group Meeting Conference on Recent Advances in the Message Passing Interface, Stuttgart, Germany, 2010.
    [16] C. Chang, J. Wawrzynek and R. W. Brodersen, "BEE2: A High-End Reconfigurable Computing System," IEEE Design Test of Computers, vol. 22, No. 2, pp. 114-125, March-April 2005.
    [17] A. Krasnov, A. Schultz, J. Wawrzynek, G. Gibeling and P. Y. Droz, "RAMP Blue: A Message-Passing Manycore System in FPGAs," in Proceedings of the International Conference on Field Programmable Logic and Applications (FPL 2007), Amsterdam, The Netherlands, 2007, pp. 54-61.
    [18] M. Saldana and P. Chow, "TMD-MPI: An MPI Implementation for Multiple Processors Across Multiple FPGAs," in Proceedings of the International Conference on Field Programmable Logic and Applications (FPL 2006), Madrid, Spain, 2006, pp. 1-6.
    [19] Argonne National Laboratory. MultiProcessing Environment (MPE). Available: http://www.mcs.anl.gov/research/projects/mpi/www/www4/MPE.html
    [20] Argonne National Laboratory. mpiP: Lightweight, Scalable MPI Profiling. Available: http://mpip.sourceforge.net/
    [21] D. Buntinas, G. Mercier and W. Gropp, "Design and Evaluation of Nemesis, a Scalable, Low-Latency, Message-Passing Communication Subsystem," in Sixth IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2006), 2006.
    [22] D. Buntinas, G. Mercier and W. Gropp, "Implementation and Evaluation of Shared-Memory Communication and Synchronization Operations in MPICH2 Using the Nemesis Communication Subsystem," Parallel Computing, vol. 33, No. 9, pp. 634-644, 2007.
    [23] Q. O. Snell, A. R. Mikler and J. L. Gustafson, "NetPIPE: A Network Protocol Independent Performance Evaluator," in IASTED International Conference on Intelligent Information Management and Systems, 1996.
    [24] D. Turner, A. Oline, X. Chen and T. Benjegerdes, "Integrating New Capabilities into NetPIPE," in Lecture Notes in Computer Science on Parallel Virtual Machine / Message Passing Interface, 2003, pp. 37-44.
    [25] D. Bailey, J. Barton, H. Simon, R. L. Carter, L. Dagum, R. A. Fatoohi, P. O. Frederickson, T. A. Lasinski, R. S. Schreiber, V. Venkatakrishnan and S. K. Weeratunga, "The NAS Parallel Benchmarks," International Journal of High Performance Computing Applications, vol. 5, No. 3, pp. 63-73, September 1991.

    Full text available on campus from 2016-08-30; not available off campus.
    The electronic thesis has not been authorized for public release; please consult the library catalog for the print copy.