
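Graduate student: Lyu, Syue-Ru (呂學儒)
Thesis title: A Simple Cluster-Scaling Policy for MapReduce Clouds (一個用於MapReduce雲計算之簡易叢集規模調整策略)
Advisor: Shieh, Ce-Kuen (謝錫堃)
Co-advisor: Huang, Tzu-Chi (黃祖基)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Institute of Computer & Communication Engineering
Year of publication: 2012
Academic year of graduation: 100 (ROC calendar, i.e. 2011-2012)
Language: English
Number of pages: 36
Keywords (Chinese): MapReduce, 策略, 叢集規模調整
Keywords (English): MapReduce, policy, cluster-scaling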
  • With the rise of cloud computing, many cloud services have been developed. Google proposed MapReduce, a programming model for processing large amounts of data. After Yahoo! released Hadoop, an open-source implementation of MapReduce, many companies and enterprises adopted this model and built their own clusters to process their large volumes of data.
    The computing resources within a cluster are often not fully used, so many studies of cluster scaling have been presented. Some of these studies reduce the size of the cluster to save power, while others add computing nodes to obtain better performance. However, none of them achieves power saving and performance at the same time.
    To obtain both performance and power saving, we propose a simple policy. We analyze the characteristics of MapReduce and use them to develop our cluster-scaling policy. The policy can effectively identify how many computing nodes can be removed from a cluster without affecting the execution time of a job. We test the policy in a variety of cases and show that it works well under different configurations while meeting both the performance and the power-saving goals.
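
    The abstract does not state how the policy decides how many nodes can be removed. As a rough illustration only, and not the thesis's actual method, the sketch below assumes an idealized homogeneous cluster in which map tasks of equal length run in "waves" (one task per slot per wave), so the map phase takes the same time as long as the number of waves stays the same. The function names and the parameters num_map_tasks, nodes, and slots_per_node are hypothetical, and the sketch ignores the reduce phase, data replication, block placement, and stragglers.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling of a / b for positive integers."""
    return -(-a // b)


def map_waves(num_map_tasks: int, nodes: int, slots_per_node: int) -> int:
    """Number of map waves, assuming each slot runs one task per wave."""
    return ceil_div(num_map_tasks, nodes * slots_per_node)


def removable_nodes(num_map_tasks: int, nodes: int, slots_per_node: int) -> int:
    """Most nodes that can be removed without adding a map wave.

    Assumption (not from the thesis): identical nodes and equal-length tasks,
    so execution time depends only on the number of waves.
    """
    waves = map_waves(num_map_tasks, nodes, slots_per_node)
    # Fewest nodes whose slots still fit all tasks into the same number of waves.
    min_nodes = ceil_div(num_map_tasks, waves * slots_per_node)
    return nodes - min_nodes


if __name__ == "__main__":
    # 100 map tasks on 16 nodes with 2 slots each.
    print(map_waves(100, 16, 2), removable_nodes(100, 16, 2))
```

    Under these assumptions, a job with 100 map tasks on 16 two-slot nodes needs 4 waves, and it still needs 4 waves on 13 nodes, so 3 nodes are idle capacity; removing a fourth node would add a fifth wave.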

    Outline
    Chapter 1 Introduction
    Chapter 2 Related Work
        2.1 Subset
        2.2 All-In
    Chapter 3 Proposed Policy
        3.1 System Overview
        3.2 System Analysis
    Chapter 4 Emulation Verification
        4.1 Emulator
        4.2 Experiment Setup
        4.3 Benchmarks
        4.4 Different waves and various mappers
        4.5 Different computational size
        4.6 Evaluation on Hadoop
    Chapter 5 Discussion
        5.1 Block size
        5.2 Execution time of reduce phase
        5.3 Data replications
        5.4 Straggler
    Chapter 6 Conclusion and Future Work
    References


    Full-text availability: on campus, public from 2017-08-31
    Off campus, public from 2017-08-31