Graduate student: 顏子翔
Yen, Tzu-Hsiang
Thesis title: JAS:Hadoop於異質環境之工作排程
JAS:A Job Scheduling in Hadoop on Heterogeneous Environments
Advisor: 謝孫源
Hsieh, Sun-Yuan
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering
Year of publication: 2013
Academic year of graduation: 101 (ROC calendar)
Language: English
Number of pages: 60
Keywords (Chinese): MapReduce, Hadoop, 異質環境, 異質工作量
Keywords (English): MapReduce, Hadoop, Heterogeneous Workloads, Heterogeneous Environments
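  • Cloud computing is an emerging term in distributed computing systems and has become increasingly popular in large data centers. MapReduce is an important concept in cloud computing, and Hadoop is currently one of the best-known distributed computing systems that implement it. Hadoop's default job scheduling policy is FCFS (first come, first served), but jobs differ in their workloads, so Hadoop cannot make good use of system resources. In this thesis, we design a job scheduling algorithm, which we call the Job Allocation Scheduler. The algorithm focuses on making full use of resources: given different kinds of jobs, it can process them in parallel. The proposed Job Allocation Scheduler improves the resource utilization of nodes and the performance of the Hadoop system.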

    Cloud computing is a relatively new term in distributed computing, and it is becoming increasingly popular in large data centers. MapReduce is an important concept in cloud computing, and Hadoop is one of the best-known systems that implement it. When jobs run in a large data center, different jobs demand different resources. Hadoop's default job scheduler is First Come First Serve (FCFS), which makes no attempt to balance resource utilization. In this paper, we propose a job scheduler, called Job Allocation Scheduler (JAS), which aims to maximize resource usage. Given jobs with varied workloads, it classifies them by workload type and runs them in parallel in Hadoop. The proposed scheduler can improve both the resource utilization of nodes and the performance of Hadoop in heterogeneous environments.
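
The contrast the abstract draws between FCFS and workload-aware scheduling can be sketched as follows. This is a minimal illustration only, not the JAS algorithm from the thesis: the `Job` fields, the `classify` rule, the job names, and the slot counts are all hypothetical, chosen just to show how classifying jobs by dominant resource lets CPU-bound and I/O-bound jobs run side by side.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    cpu_seconds: float  # hypothetical profile: time spent computing
    io_seconds: float   # hypothetical profile: time spent on disk/network I/O


def classify(job: Job) -> str:
    """Label a job by its dominant resource (illustrative rule only)."""
    return "cpu" if job.cpu_seconds >= job.io_seconds else "io"


def schedule(jobs, cpu_slots: int, io_slots: int):
    """Fill CPU slots with CPU-bound jobs and I/O slots with I/O-bound jobs.

    Jobs keep their arrival order within each class; jobs that do not fit
    into a free slot of their class are reported as waiting.
    """
    queues = {"cpu": deque(), "io": deque()}
    for job in jobs:
        queues[classify(job)].append(job)

    running = {"cpu": [], "io": []}
    for kind, slots in (("cpu", cpu_slots), ("io", io_slots)):
        while queues[kind] and len(running[kind]) < slots:
            running[kind].append(queues[kind].popleft().name)

    started = set(running["cpu"]) | set(running["io"])
    waiting = [job.name for job in jobs if job.name not in started]
    return running, waiting


# Hypothetical mixed workload, listed in arrival order.
jobs = [
    Job("sort", cpu_seconds=20, io_seconds=90),
    Job("pi-estimate", cpu_seconds=80, io_seconds=5),
    Job("grep", cpu_seconds=10, io_seconds=60),
    Job("matrix-mult", cpu_seconds=120, io_seconds=15),
    Job("wordcount", cpu_seconds=40, io_seconds=70),
]
running, waiting = schedule(jobs, cpu_slots=2, io_slots=2)
print(running, waiting)
```

Under plain FCFS with the same four slots, the first four arrivals would start regardless of type. Here the split is made per resource class, so each class of slot is only ever occupied by jobs that stress that resource.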

    1 Introduction
    2 Related Works
        2.1 Cloud Computing
        2.2 Hadoop Architecture
        2.3 Job Workload
        2.4 Hadoop drawback
    3 Method
        3.1 The Proposed Job Allocation Scheduler Algorithm
            3.1.1 Job Classification
            3.1.2 Setting CPU Slots
            3.1.3 I/O Slots Setting
            3.1.4 CPU Tasks Assignment
            3.1.5 I/O Tasks Assignment
        3.2 An enhanced Job Allocation Scheduler Algorithm
    4 Experimental Results
        4.1 Environment
        4.2 Performance of Job Allocation Scheduler
        4.3 The Performance of Dynamic Job Allocation Scheduler
    5 Conclusion
    Appendices
    Appendix A
        A.1 Using VirtualBox to build cluster
    Appendix B
        B.1 Hadoop Setup
    Bibliography

    Download: on campus, public from 2016-08-31
    Off campus, public from 2016-08-31