Author: 謝光昱
Hsieh, Kuang-Yu
Thesis title: 使用動態資料配置策略改善於異質環境下之Hadoop效能
A Dynamic Data Placement Policy for Hadoop in Heterogeneous Environments
Advisor: 謝孫源
Hsieh, Sun-Yuan
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering
Year of publication: 2013
Academic year of graduation: 101 (ROC calendar)
Language: English
Number of pages: 51
Keywords: MapReduce, Hadoop, Heterogeneous, Data Placement
  • Cloud computing is a form of parallel and distributed computing that has become increasingly popular in recent years. MapReduce is a widely used model in cloud computing and an important programming model for large-scale data-parallel applications. Hadoop is an open-source implementation of the MapReduce model and is commonly used for data-intensive applications such as data mining and web indexing. The current Hadoop implementation assumes that every node in a cluster has the same computing capacity and that tasks are data-local, meaning the data each node needs is already stored on that node and does not have to be transferred. However, private clusters and virtualized data centers often satisfy neither homogeneity nor data locality, which can add overhead and reduce MapReduce performance. In this thesis, we propose a data placement algorithm to resolve the unbalanced node workload problem. The proposed method dynamically adjusts and balances the data stored on each node according to that node's computing capacity in a heterogeneous Hadoop cluster. This reduces the time spent on data transfer and thereby improves Hadoop performance. Experimental results show that the dynamic data placement policy decreases execution time and improves Hadoop performance in a heterogeneous cluster.
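The idea of placing data in proportion to each node's computing capacity can be illustrated with a minimal sketch. This is not the thesis's algorithm: the record does not describe how the RatioTable or the two phases work, so the function name `allocate_blocks`, the capacity values, and the largest-remainder rounding are all assumptions made for illustration.

```python
from typing import Dict


def allocate_blocks(capacity: Dict[str, float], total_blocks: int) -> Dict[str, int]:
    """Split total_blocks across nodes in proportion to their capacity.

    Hypothetical helper: `capacity` maps a node name to a relative
    computing-capacity score (an assumed input, not taken from the thesis).
    """
    if total_blocks < 0:
        raise ValueError("total_blocks must be non-negative")
    total_capacity = sum(capacity.values())
    if total_capacity <= 0:
        raise ValueError("total capacity must be positive")

    # Ideal (fractional) share of blocks for each node.
    exact = {node: total_blocks * c / total_capacity for node, c in capacity.items()}
    # Start from the rounded-down share.
    alloc = {node: int(share) for node, share in exact.items()}

    # Hand out the blocks lost to rounding, largest fractional part first.
    leftover = total_blocks - sum(alloc.values())
    by_remainder = sorted(capacity, key=lambda n: exact[n] - alloc[n], reverse=True)
    for node in by_remainder[:leftover]:
        alloc[node] += 1
    return alloc
```

For example, `allocate_blocks({"fast": 3.0, "medium": 2.0, "slow": 1.0}, 12)` gives the fastest node half of the blocks, so a node that processes data three times faster also holds three times as much local data and needs fewer remote reads.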

    1 Introduction
    2 Related Works
    2.1 Cloud Computing
    2.1.1 Cloud Computing Essential Characteristics
    2.1.2 Service Models
    2.1.3 Deployment Models
    2.2 Hadoop
    2.3 MapReduce
    2.4 HDFS
    2.5 Motivation
    3 Dynamic Data Placement Policy
    3.1 RatioTable
    3.2 Phase1
    3.3 Phase2
    4 Experimental Results
    4.1 Environment
    4.2 Result
    5 Conclusion
    Appendices
    Appendix A
    A.1 Building the Hadoop cluster by using VirtualBox
    Appendix B
    B.1 Hadoop Setup
    Bibliography

    Full-text availability: on campus from 2016-08-26; off campus from 2016-08-26