Graduate student: Chen, Chi-Hao (陳志豪)
Thesis title: Improved Hadoop Job Scheduling with Locality in Heterogeneous Environments (兼顧資料區域性與提昇Hadoop異質環境效能之工作排程)
Advisor: Hsieh, Sun-Yuan (謝孫源)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering
Year of publication: 2014
Academic year of graduation: 102 (ROC calendar)
Language: English
Number of pages: 56
Keywords: MapReduce, Hadoop, Heterogeneous Workloads, Heterogeneous Environments
  • Cloud computing, a term that has emerged in distributed computing in recent years, has become increasingly popular in large data centers. MapReduce is an important framework in cloud computing, and Hadoop is a widely used platform that implements it. When jobs run in a large data center, different types of jobs require different resources. Hadoop's default job scheduler is First-Come-First-Served (FCFS), which can leave resource utilization unbalanced. This paper proposes a job scheduler, called Job Allocation Scheduler (JAS), that is designed to balance resource utilization. Given a variety of job workloads, JAS categorizes jobs and places their tasks into the corresponding queue, such as a CPU-bound queue or an I/O-bound queue. Unfortunately, JAS can reduce data locality and thereby generate a large amount of extra network traffic, so we modified JAS to address this problem; the modified scheduler is called Job Allocation with Locality Scheduler (JASL). The proposed scheduler improves node utilization and the performance of Hadoop in heterogeneous environments. Finally, we add two parameters to detect incorrect slot settings, yielding the Dynamic Job Allocation Scheduler with Locality (DJASL). DJASL performs better than JAS and achieves data locality similar to that of JASL.
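    The two ideas in the abstract, sorting jobs into CPU-bound and I/O-bound queues and preferring tasks whose input data is local to the requesting node, can be illustrated with a minimal sketch. It is not the thesis's algorithm: the classification metric (megabytes of I/O per CPU-second), the threshold value, and all names (`Job`, `Task`, `classify`, `enqueue`, `pick_task`) are assumptions made only for illustration.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Task:
    job_id: str
    # Hypothetical field: nodes holding a replica of this task's input block.
    input_nodes: frozenset = frozenset()


@dataclass
class Job:
    job_id: str
    cpu_seconds: float    # hypothetical profiling figure
    io_megabytes: float   # hypothetical profiling figure
    tasks: list = field(default_factory=list)


# Assumed threshold (not from the thesis): MB of I/O per CPU-second
# above which a job is treated as I/O-bound.
IO_BOUND_THRESHOLD = 10.0


def classify(job: Job) -> str:
    """Return "io" or "cpu" based on the assumed I/O-per-CPU-second ratio."""
    if job.cpu_seconds <= 0:
        return "io"
    ratio = job.io_megabytes / job.cpu_seconds
    return "io" if ratio > IO_BOUND_THRESHOLD else "cpu"


def enqueue(jobs, queues):
    """Append every task of each job to the queue matching the job's class."""
    for job in jobs:
        queues[classify(job)].extend(job.tasks)


def pick_task(queue: deque, node: str):
    """Prefer a task whose input is local to `node`; otherwise take the oldest task."""
    for i, task in enumerate(queue):
        if node in task.input_nodes:
            del queue[i]  # safe: we return immediately after mutating
            return task
    return queue.popleft() if queue else None


if __name__ == "__main__":
    queues = {"cpu": deque(), "io": deque()}
    jobs = [
        Job("sort", 10, 500, [Task("sort", frozenset({"n1"})),
                              Task("sort", frozenset({"n2"}))]),
        Job("pi", 100, 5, [Task("pi")]),
    ]
    enqueue(jobs, queues)
    print(len(queues["cpu"]), len(queues["io"]))
    print(pick_task(queues["io"], "n2"))
```

    Keeping one queue per workload class lets a scheduler hand CPU slots and I/O slots different kinds of work, while the local-first scan in `pick_task` shows why ignoring locality would force a node to fetch its input block over the network.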

    1 Introduction
    2 Background
      2.1 Default Scheduler of Hadoop
      2.2 Job Workloads
      2.3 The Problem of Hadoop
      2.4 DMR Scheduler
      2.5 Related Work
    3 Method
      3.1 Job Allocation Scheduler Algorithm
        3.1.1 Job Classification
        3.1.2 CPU Slots Setting
        3.1.3 I/O Slots Setting
        3.1.4 CPU Tasks Assignment
        3.1.5 I/O Tasks Assignment
        3.1.6 Job Allocation Scheduler (JAS)
      3.2 Job Allocation Scheduler with Locality Algorithm
        3.2.1 The Problem of JAS
        3.2.2 Job Allocation Scheduler with Locality (JASL)
      3.3 Dynamic Job Allocation Scheduler with Locality Algorithm
    4 Experimental Results
      4.1 Environment
      4.2 The Results of Experiments
        4.2.1 Performance and Data Locality of JAS and JASL
        4.2.2 The Performance and Data Locality of Dynamic Job Allocation Scheduler
    5 Conclusion
    Appendices
      Appendix A
        A.1 Using VirtualBox to build cluster
      Appendix B
        B.1 Hadoop Setup
    Bibliography


    Full text availability: on campus, public from 2019-08-27; off campus, public from 2019-08-27