
Student: 劉晏佐 (Liu, Yen-Tso)
Title: Study on Workload Distribution for DSM Clusters with Heterogeneous Resource Capabilities (具異質資源能力特性的分散式共用記憶體系統上之負載分配研究)
Advisors: 謝錫堃 (Shieh, Ce-Kuen); 梁廷宇 (Liang, Tyng-Yeu)
Degree: Doctor (Ph.D.)
Department: Department of Electrical Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2007
Graduation Academic Year: 95 (2006-2007)
Language: English
Number of Pages: 87
Keywords (Chinese): memory, heterogeneous resource capability, workload distribution, resource-oriented, DSM
Keywords (English): DSM, workload distribution, resource-oriented, memory resource, prediction
  • Proper workload distribution is a critical consideration for improving the performance of user programs on distributed shared memory (DSM) systems. Conventional DSM systems usually distribute the workload evenly across all nodes. However, when the nodes have unequal system resources, the nodes with fewer resources become the bottleneck of the overall performance of a parallel program. To address this problem, this study develops a dynamic workload distribution scheme called the Resource-Oriented Workload Distribution Scheme (ROWDS), which ensures that applications achieve good performance when executed on DSM clusters composed of nodes with heterogeneous resource capabilities. ROWDS splits the prediction process into two independent steps to reduce the cost of estimating the optimal workload distribution. The first step determines the optimal workload ratio by considering the processor power and physical memory available on each node, while the second step decides the actual placement of the workload according to the data-sharing characteristics among workloads and the network bandwidth between nodes, so as to minimize communication cost. Experimental results show that, because ROWDS considers all three resource factors simultaneously, it delivers better application performance than workload distribution schemes that consider only one or two of these factors.

    Workload distribution is an important issue for the performance of user programs executed on software distributed shared memory (DSM) systems. Most previous DSM systems simply divide the computation by distributing the workload equally to each node. However, if the system resources of the nodes are not identical, the node with fewer system resources becomes the bottleneck of application performance. To address this problem, this study develops a dynamic workload distribution scheme called ROWDS (Resource-Oriented Workload Distribution Scheme) that enables DSM applications to obtain good performance on a cluster of computers with heterogeneous resource capabilities. The scheme adopts a formula for predicting the execution time of a DSM application under a given workload distribution pattern. To minimize the complexity of searching for the optimal workload distribution pattern, ROWDS divides the task of workload distribution into two distinct steps. The first step decides the number of threads assigned to each node according to the CPU and physical memory capabilities of the computers. The second step determines which threads are mapped onto the same node according to data sharing among threads and the network capability among computers. The experimental results confirm that ROWDS, which takes these three resource capabilities, i.e., CPU, memory, and network, into consideration simultaneously, delivers better application performance than workload distribution schemes that consider only one or two of these resource factors.
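    The thesis formalizes these two steps with execution-time prediction formulas and the MCMP algorithm (Chapter 3), which are not reproduced here. As a rough, purely illustrative sketch of the two-step split, the Python fragment below first assigns thread counts in proportion to CPU power, caps them by available physical memory, and then greedily co-locates the threads that share the most data. The function names, the proportional-plus-cap heuristic, and the toy sharing matrix are assumptions made for this example, not the thesis's actual formulas.

```python
# Illustrative sketch only: a simplified two-step workload split in the spirit
# of ROWDS. The thesis uses its own cost model and the MCMP algorithm; every
# name and heuristic here is an assumption made for this example.

def thread_counts(cpu_power, mem_pages, total_threads, pages_per_thread):
    """Step 1 (sketch): decide how many threads each node should run.

    Threads are distributed in proportion to CPU power, then capped by the
    number of threads whose working sets fit in each node's physical memory;
    the overflow is pushed to nodes that still have free memory.
    """
    total_cpu = sum(cpu_power)
    shares = [total_threads * c / total_cpu for c in cpu_power]
    counts = [int(s) for s in shares]
    remainder = total_threads - sum(counts)
    # hand out the rounding remainder to the largest fractional shares
    order = sorted(range(len(shares)),
                   key=lambda i: shares[i] - counts[i], reverse=True)
    for i in order[:remainder]:
        counts[i] += 1
    caps = [m // pages_per_thread for m in mem_pages]   # memory limit per node
    overflow = 0
    for i in range(len(counts)):
        if counts[i] > caps[i]:                         # node cannot host that many
            overflow += counts[i] - caps[i]
            counts[i] = caps[i]
    for i in range(len(counts)):                        # re-place the overflow
        take = min(caps[i] - counts[i], overflow)
        counts[i] += take
        overflow -= take
    # (if total memory cannot host all threads, the leftover is simply dropped
    #  in this toy sketch)
    return counts

def map_threads(counts, sharing):
    """Step 2 (sketch): choose which threads share a node.

    Greedily fills each node with the thread that shares the most data with
    the threads already placed there, so heavily communicating threads land
    on the same node and remote traffic shrinks.
    """
    unplaced = set(range(len(sharing)))
    placement = []
    for quota in counts:
        group = []
        while len(group) < quota and unplaced:
            best = max(unplaced,
                       key=lambda t: sum(sharing[t][g] for g in group))
            group.append(best)
            unplaced.remove(best)
        placement.append(group)
    return placement

if __name__ == "__main__":
    # Two slow, memory-rich nodes and one fast, memory-poor node;
    # 8 threads, each touching 4 pages.
    counts = thread_counts(cpu_power=[1.0, 1.0, 2.0], mem_pages=[64, 64, 8],
                           total_threads=8, pages_per_thread=4)
    # toy sharing matrix: threads 0-3 share heavily, threads 4-7 share heavily
    sharing = [[0 if i == j else (5 if i // 4 == j // 4 else 1)
                for j in range(8)] for i in range(8)]
    print(counts, map_threads(counts, sharing))   # e.g. [4, 2, 2] [[0,1,2,3],[4,5],[6,7]]
```

    Splitting the decision this way keeps the search cheap, as the abstract notes: the first step fixes only the per-node thread counts, and the second step only chooses which threads fill those counts, instead of searching all distribution patterns at once.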

    Chapter 1. Introduction 1
      1.1 Motivation 2
      1.2 Contributions 4
      1.3 Thesis Organization 6
    Chapter 2. Related Works 7
    Chapter 3. Proposed Workload Distribution Scheme 13
      3.1 Scheme Overview 13
      3.2 Formulae Analysis 16
      3.3 Multi-combination and Multi-phases (MCMP) Algorithm 26
        3.3.1 The First Subphase of MCMP 29
        3.3.2 The Second Subphase of MCMP 30
        3.3.3 The Third Subphase of MCMP 35
        3.3.4 The Fourth Subphase of MCMP 37
    Chapter 4. Implementation 39
      4.1 System Overview of Teamster 39
      4.2 Information Collection Policy 41
      4.3 Adaptive Trigger Condition 43
      4.4 Communication Minimization 45
        4.4.1 Page Pre-fetching Mechanism 45
        4.4.2 Copyset Adjustment Mechanism 47
    Chapter 5. Performance Evaluation 50
      5.1 Emulation Analyses 50
      5.2 Experimental Results 52
        5.2.1 Experimental Applications 53
        5.2.2 Effectiveness of ROWDS 61
        5.2.3 Advanced Experiments and Analyses 70
        5.2.4 Effectiveness of Communication Minimization Mechanisms 74
    Chapter 6. Conclusions and Future Work 78
    Bibliography 80
    Vita 87


    Full-text availability: on campus, immediately; off campus, from 2007-07-17.