| Author: | 張志標 Chang, Jyh-Biau |
|---|---|
| Thesis title: | 對稱式多處理機群集上透通性分散式共用記憶體系統之研究 A Transparent Distributed Shared Memory for Clustered Symmetric Multiprocessors |
| Advisors: | 謝錫堃 Shieh, Ce-Kuen; 梁廷宇 Liang, Tyng-Yeu |
| Degree: | Doctor |
| Department: | College of Electrical Engineering and Computer Science - Department of Electrical Engineering |
| Year of publication: | 2005 |
| Academic year of graduation: | 93 (ROC academic year) |
| Language: | English |
| Pages: | 107 |
| Keywords (Chinese): | 群集計算、系統重組、分散式共用記憶體、對稱式多處理機、透通性 |
| Keywords (English): | symmetric multiprocessor, cluster computing, reconfiguration, transparency, distributed shared memory |
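A transparent distributed shared memory (DSM) system must be fully transparent in three respects: data distribution, workload distribution, and reconfiguration of the execution environment. Transparent data distribution lets programmers access and allocate data in distributed shared memory through the same interface they would use on a shared-memory system. Transparent workload distribution requires that the parallelism of a parallel program be optimized from both the user-level and the kernel-level points of view, without being constrained by how the distributed environment is configured. Transparent reconfiguration, finally, requires attention not only to the performance of the DSM applications but also to the performance of every job running on the cluster as a whole.
To meet these three requirements, this dissertation proposes a transparent DSM system named Teamster and implements it on a cluster of symmetric multiprocessors (SMPs). Teamster provides a global memory image to make data distribution transparent. With it, every processor in the cluster sees exactly the same memory space, so programmers can access and allocate shared data in the cluster just as they would on a single SMP computer. Teamster also uses a hybrid thread architecture to make workload distribution transparent. This thread architecture optimizes parallelism at both the user level and the kernel level, and it also makes run-time reconfiguration more efficient. We further develop a mechanism called progressive multilayer reconfiguration to address the reconfiguration problem that DSM systems face when running on non-dedicated clusters. As the name suggests, reconfiguration in this mechanism takes place at three layers: the processor layer, the application layer, and the node layer. As the load on a node shifts, reconfiguration at the different layers is applied progressively to adjust the execution of DSM applications, with the aim of maximizing the overall throughput of the cluster.
Our experimental results show that the global memory image and the hybrid thread architecture not only satisfy users' transparency requirements but also deliver satisfactory application performance. Progressive multilayer reconfiguration lets Teamster put the idle processor resources of a non-dedicated cluster to work for DSM applications while also reducing how much local jobs are slowed down by those applications. With the transparency Teamster provides, programmers can use all the computing resources of an SMP cluster as if they were using a single SMP computer, with satisfactory performance and throughput.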
A transparent distributed shared memory (DSM) system must achieve complete transparency in data distribution, workload distribution, and reconfiguration. Transparency of data distribution allows programmers to access and allocate shared data through the same interface used in shared-memory systems. Transparency of workload distribution optimizes parallelism at both the user level and the kernel level. Transparency of reconfiguration promotes the throughput of all jobs in the system rather than only the performance of DSM programs, and it resolves the reconfiguration problem of software DSM systems in non-dedicated clusters.
In this dissertation, a transparent DSM system called Teamster is proposed and implemented for clustered symmetric multiprocessors (SMPs). Teamster provides a Global Memory Image (GMI) to make data distribution transparent. With the GMI, every processor sees an identical address space, so programmers can access and allocate shared data in the same way as on a single SMP computer. Teamster uses a hybrid thread architecture to make workload distribution transparent. This thread architecture optimizes parallelism at both the user level and the kernel level, and it also improves the efficiency of run-time reconfiguration. We also propose a novel approach for DSM systems called Progressive Multilayer Reconfiguration (PMR). As the name suggests, this approach divides reconfiguration into three layers: processor, application, and node. As the workload changes state, reconfiguration at the three layers is performed progressively during the execution of DSM applications.
The preliminary results show that the GMI and the hybrid thread architecture of Teamster not only provide transparency to users but also deliver satisfactory performance for DSM applications. Meanwhile, PMR enables Teamster both to use the abundant CPU cycles available in non-dedicated clusters for DSM applications and to minimize the slowdown of local jobs caused by disturbance from DSM applications. PMR is therefore shown to promote the job throughput of the whole cluster effectively. With the transparency provided by Teamster, programmers can exploit all the computing power of the clustered SMP nodes transparently, as they would on a single SMP computer. Compared with the results of previous research, Teamster realizes the transparency of cluster computing and obtains satisfactory system performance.
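The progressive, layered reconfiguration described in the abstract can be pictured with a toy policy. The sketch below is only an illustration under stated assumptions: the abstract names the three layers (processor, application, node) and says they are applied progressively as the workload changes, but it gives no thresholds, no load metric, and no description of what each layer does. The names `Layer`, `THRESHOLDS`, `choose_layer`, and `escalation_path`, the use of local CPU load as the trigger, and the escalation order are all hypothetical and are not taken from Teamster.

```python
from enum import IntEnum


class Layer(IntEnum):
    """Reconfiguration layers named in the abstract, in an assumed escalation order."""
    NONE = 0
    PROCESSOR = 1
    APPLICATION = 2
    NODE = 3


# Hypothetical thresholds on the fraction of a node's CPU used by local
# (non-DSM) jobs. The abstract gives no numbers; these are placeholders.
THRESHOLDS = (
    (0.75, Layer.NODE),
    (0.50, Layer.APPLICATION),
    (0.25, Layer.PROCESSOR),
)


def choose_layer(local_load: float) -> Layer:
    """Return the deepest layer whose (assumed) threshold the local load has reached."""
    if not 0.0 <= local_load <= 1.0:
        raise ValueError("local_load must be in [0, 1]")
    for threshold, layer in THRESHOLDS:
        if local_load >= threshold:
            return layer
    return Layer.NONE


def escalation_path(current: Layer, target: Layer) -> list:
    """List the layers to apply, one at a time, when moving from current up to target.

    Returns an empty list when target is not deeper than current.
    """
    return [Layer(i) for i in range(current + 1, target + 1)]
```

In this sketch a node whose local load rises from idle to heavy would pass through the processor and application layers before the node layer is reached, rather than jumping straight to the most disruptive step. That is one possible reading of "progressive"; the dissertation itself should be consulted for the actual state transitions.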