
Graduate student: 陳紀廷 (Chen, Chi-Ting)
Thesis title: Dynamic Grouping integrated Neighboring Search Job Allocation Scheduler for Hadoop MapReduce in Heterogeneous Computing Environments (Chinese title: Hadoop異質環境下動態集群整合鄰近搜尋之工作排程)
Advisor: 謝孫源 (Hsieh, Sun-Yuan)
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Computer Science and Information Engineering
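Year of publication: 2015
Academic year of graduation: 103 (Republic of China calendar, i.e., the 2014/15 academic year)
Language: English
Number of pages: 56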
Keywords: Hadoop, Heterogeneous computing environments, Heterogeneous workloads, MapReduce, Scheduling
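    With the rapid development of the Internet, the growth of cloud environments, and the explosive increase in network data volume, cloud computing has become one of the hottest terms in distributed systems in recent years. MapReduce is a very important architecture in cloud computing, and Apache Hadoop is one of the best-known cloud computing platforms that implement MapReduce. When tasks run in a large data center, different tasks often require different resources, but Hadoop's default scheduler uses a First-Come-First-Served (FCFS) policy, which may lead to unbalanced resource utilization. As researchers studied and improved job scheduling, the DMR algorithm was extended into JAS, JASL, and then DJASL, shifting the focus from homogeneous environments and jobs to heterogeneous environments and heterogeneous jobs, and classifying jobs so that each is mapped to a corresponding queue. JAS and DJASL both improve resource utilization to a measurable degree, and DJASL additionally considers data locality, which effectively reduces extra network traffic. However, although DJASL effectively improves data locality, it does not noticeably improve job execution performance. Therefore, in this thesis we propose a new job scheduling method called DGNS, which uses the concepts of grouping and neighboring search to consider the MapReduce (computation) layer and the HDFS (data storage) layer together. In addition to effectively balancing resource utilization, DGNS achieves comparably high data locality and better performance.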
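Data locality, which DJASL and DGNS both target, means running a map task on, or near, a node that stores its input block in HDFS, so the block does not have to cross the network. The sketch below is only a generic illustration of the usual Hadoop locality levels (node-local, then rack-local, then off-rack) and of a simple locality-rate metric; it is not the DJASL or DGNS algorithm, and the `MapTask` type, the `pick_task` and `locality_rate` functions, and the host and rack names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MapTask:
    task_id: str
    replica_hosts: frozenset  # hosts that store a replica of this task's input block


def pick_task(free_host, pending, rack_of):
    """Pick a pending map task for free_host, preferring data locality.

    Returns (task, level) with level in {'node-local', 'rack-local', 'off-rack'},
    or (None, None) if nothing is pending. The caller removes the chosen task.
    """
    # 1. A task whose input block is stored on this very host.
    for task in pending:
        if free_host in task.replica_hosts:
            return task, "node-local"
    # 2. A task whose input block is on another host in the same rack.
    my_rack = rack_of[free_host]
    for task in pending:
        if any(rack_of[h] == my_rack for h in task.replica_hosts):
            return task, "rack-local"
    # 3. Otherwise take the oldest pending task; its input must cross racks.
    if pending:
        return pending[0], "off-rack"
    return None, None


def locality_rate(levels):
    """Fraction of launched tasks that ran node-local (0.0 if none ran)."""
    return levels.count("node-local") / len(levels) if levels else 0.0
```

For example, with hosts `n1` and `n2` in rack `r1` and `n3` in rack `r2`, a free slot on `n2` would take a task whose block lives only on `n1` as a rack-local task when no block on `n2` itself is pending.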

    With the rapid development of the Internet, the growth of cloud environments, and the explosive increase in network data, cloud computing, a relatively new term in distributed computing systems, has become a hot topic in recent years. MapReduce is a very important cloud computing architecture, and Apache Hadoop is one of the best-known cloud computing platforms implementing MapReduce. The resources required by jobs executed in a large data center vary according to the type of job. Generally, there are two kinds of jobs, CPU-bound jobs and I/O-bound jobs, which demand different resources but run simultaneously in the same cluster. The default job scheduler of Hadoop is first-come-first-served (FCFS) and thus may cause unbalanced resource utilization. Given various job workloads, the job allocation scheduler (JAS) categorizes jobs and then assigns tasks to a CPU-bound queue or an I/O-bound queue. However, JAS exhibited a locality problem, which was addressed by developing a modified JAS called the job allocation scheduler with locality (JASL) and then the dynamic job allocation scheduler with locality (DJASL), which exhibited better performance and reduced extra network traffic. The drawback of DJASL is that, although it effectively enhances data locality, it fails to significantly improve job execution performance. Therefore, in this thesis we propose a job scheduler with a dynamic grouping integrated neighboring search strategy, called DGNS, which is designed to balance resource utilization while also improving performance and data locality in heterogeneous computing environments. The DGNS algorithm exhibits better performance and data locality than Hadoop's default FCFS scheduler, DMR, JAS, and DJASL.
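The classification idea introduced by JAS, splitting jobs into a CPU-bound queue and an I/O-bound queue instead of one FCFS queue, can be sketched as follows. This is an illustrative sketch under stated assumptions, not the JAS, DJASL, or DGNS implementation: the per-task I/O rate metric, the `IO_THRESHOLD` value, and the rule of giving a free slot to the job type a node is currently running fewer of are all hypothetical choices made for the example.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    io_rate: float  # hypothetical metric: disk I/O bytes/s measured on a sample task


# Hypothetical cut-off between CPU-bound and I/O-bound jobs (illustrative value only).
IO_THRESHOLD = 50e6  # bytes per second


def classify(job: Job, threshold: float = IO_THRESHOLD) -> str:
    """Label a job 'io' if its sampled I/O rate exceeds the threshold, else 'cpu'."""
    return "io" if job.io_rate > threshold else "cpu"


def build_queues(jobs):
    """Split jobs into a CPU-bound and an I/O-bound queue, keeping arrival order in each."""
    queues = {"cpu": deque(), "io": deque()}
    for job in jobs:
        queues[classify(job)].append(job)
    return queues


def next_job(queues, running_cpu: int, running_io: int):
    """Choose a job for a free slot on a node.

    Prefer the job type the node is currently running fewer tasks of, so that
    CPU and disk are both kept busy; fall back to the other queue if the
    preferred one is empty. Returns None when both queues are empty.
    """
    prefer, other = ("cpu", "io") if running_cpu <= running_io else ("io", "cpu")
    for kind in (prefer, other):
        if queues[kind]:
            return queues[kind].popleft()
    return None
```

For example, with hypothetical jobs `sort` (120 MB/s), `pi` (1 MB/s), `grep` (80 MB/s) and `wordcount` (10 MB/s), `build_queues` places `pi` and `wordcount` in the CPU-bound queue and `sort` and `grep` in the I/O-bound queue, and a node already running two CPU-bound tasks and no I/O-bound task receives `sort` next. Under plain FCFS the same node would simply receive whichever job arrived first.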

    1 Introduction
    2 Background
        2.1 Job Workloads
        2.2 Default Scheduler of Hadoop (FCFS)
        2.3 The Problem of Hadoop
        2.4 Dynamic Map Reduce Scheduler (DMR)
        2.5 Job Allocation Scheduler (JAS)
        2.6 Job Allocation Scheduler with Locality (JASL)
        2.7 Related Work
    3 The Proposed Algorithms
        3.1 Job Classification
        3.2 Ratio Table
            3.2.1 Capability of TaskTracker and Slot Setting
            3.2.2 Capability of DataNode
        3.3 Grouping and Allocation
            3.3.1 Grouping
            3.3.2 Data Block Allocating
        3.4 Neighboring Search
    4 Performance Evaluation
        4.1 Experiment Environment
        4.2 Results
            4.2.1 Individual Performance of Each Workload
            4.2.2 DGNS in Heterogeneous Computing Environments
            4.2.3 Performance and Data Locality of DGNS
    5 Conclusion
    Appendices
        Appendix A
            A.1 Using VirtualBox to Build a Cluster
        Appendix B
            B.1 Hadoop Setup
    Bibliography

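    Access: on campus, available from 2020-09-02; off campus, not available.
    The electronic thesis has not yet been authorized for public release; for the print copy, please consult the library catalog.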