National Cheng Kung University Electronic Theses and Dissertations System


Author:	邱垂銘 (Chiu, Chui-Ming)
Thesis title:	A Task Scheduling Policy for Heterogeneous MapReduce Cluster
Advisor:	謝錫堃 (Shieh, Ce-Kuen)
Co-advisor:	黃祖基 (Huang, Tzu-Chi)
Degree:	Master
Department:	College of Electrical Engineering and Computer Science - Institute of Computer & Communication Engineering
Year of publication:	2012
Academic year of graduation:	100 (ROC calendar)
Language:	English
Number of pages:	32
Keywords (translated from Chinese):	Distributed Processing, Heterogeneous Environment, Big Data Computing, MapReduce
Keywords (English):	Distributed Computation, Heterogeneous Environment, Big Data Processing, MapReduce

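In recent years, the spread of the Internet and the popularity of social networking services have brought us more diverse applications and services, and have also caused information to grow explosively. To stand out from competitors among so many applications and services, analyzing data in greater depth and producing optimized results for different needs has become the most important issue today. However, when the data to be analyzed grows to the TB or even PB scale, traditional computing models can no longer handle data of this magnitude; we call such data Big Data. The MapReduce computing model, proposed by Google employees, is the first choice for processing data at this scale.
Big data computing has been developing for nearly eight years, and quite a few organizations have adopted MapReduce in production. However, because MapReduce was designed primarily for homogeneous platforms, when the original cluster is no longer fast enough and new equipment must be purchased, MapReduce often cannot let the heterogeneous equipment work to its full capability, which results in poor performance and wasted capacity.
In view of this, to make maximum use of all the different computing resources and to maximize throughput, we propose a task scheduling policy for heterogeneous MapReduce clusters. By analyzing MapReduce and exploiting the characteristics of the computing model, the policy can effectively improve the poor performance of MapReduce programs on heterogeneous platforms.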

In recent years, the Internet has been widely adopted and social network services have become popular, bringing a variety of applications and services into our daily lives. For application developers, analyzing this explosion of data is the only way to make their applications unique.
However, when data grows to the terabyte or petabyte scale, it is very hard for legacy programming models to handle such a large amount of data. Data at this scale is called Big Data. MapReduce, which was proposed by Google engineers in 2004 [1], is a programming model designed to process such Big Data.
Big Data processing has been studied for nearly eight years, and many organizations have adopted MapReduce as their data processing platform. However, the original version of MapReduce was designed for dedicated, homogeneous environments. When different kinds of computing nodes are combined to obtain higher performance, MapReduce performs poorly and cannot make the best use of every node's computing capacity.
To take full advantage of different types of computing resources, we propose a task scheduling policy for heterogeneous MapReduce clusters. The policy improves the programming model by analyzing its processing flow and exploiting its unique characteristics. With our scheduling policy, the unexpected performance degradation no longer occurs.

Abstract (Chinese)
Abstract
Acknowledgements
Table of Contents
Tables
Figures
Chapter 1: Introduction
Chapter 2: Background and Related Works
2.1. Background
2.1.1. MapReduce
2.1.2. Hadoop
2.2. Related Works
2.2.1. Enhanced Straggler Detection
2.2.2. Enhanced Task Dispatcher
Chapter 3: System Design
3.1. System Overview
3.2. Design Consideration
3.2.1. Assumptions and Shared Parameters
3.2.2. Co-Operating Phase
3.2.3. Phase Switcher
3.2.4. No-Delay Phase
3.3. System Architecture
Chapter 4: Simulation Result
4.1. Improvements by Using Our Scheduling Policy
Chapter 5: Conclusion and Future Works
References

                                    

[1]. J. Dean and S. Ghemawat, “MapReduce: Simplified Data Processing on Large Clusters,” presented at OSDI’04: Sixth Symposium on Operating System Design and Implementation, San Francisco, CA, Dec. 2004.
[2]. J. Dean and S. Ghemawat, “MapReduce: Simplified Data Processing on Large Clusters,” Communications of the ACM, vol. 51, no. 1, pp. 107–113, 2008.
[3]. M. Y. Pingle, V. Kohli, S. Kamat, and N. Poladia, “Big Data Processing using Apache Hadoop in Cloud System.”
[4]. “PoweredBy - Hadoop Wiki.” [Online]. Available: http://wiki.apache.org/hadoop/PoweredBy. [Accessed: 03-Aug-2012].
[5]. M. Zaharia, A. Konwinski, A. D. Joseph, R. Katz, and I. Stoica, “Improving MapReduce Performance in Heterogeneous Environments,” in OSDI’08: Proceedings of the 8th USENIX Conference on Operating Systems Design and Implementation, Berkeley, CA, USA, 2008, pp. 29–42.
[6]. Q. Chen, D. Zhang, M. Guo, Q. Deng, and S. Guo, “SAMR: A Self-adaptive MapReduce Scheduling Algorithm in Heterogeneous Environment,” presented at CIT’10: IEEE 10th International Conference on Computer and Information Technology, 2010, pp. 2736–2743.
[7]. H.-H. You, C.-C. Yang, and J.-L. Huang, “A Load-aware Scheduler for MapReduce Framework in Heterogeneous Cloud Environments,” in Proceedings of the 2011 ACM Symposium on Applied Computing, New York, NY, USA, 2011, pp. 127–132.
[8]. “Welcome to Apache™ Hadoop™!” [Online]. Available: http://hadoop.apache.org/. [Accessed: 03-Aug-2012].
[9]. Wikipedia contributors, “Apache Hadoop,” Wikipedia, the free encyclopedia. Wikimedia Foundation, Inc., 03-Aug-2012.
[10]. L. Lei, T. Wo, and C. Hu, “CREST: Towards Fast Speculation of Straggler Tasks in MapReduce,” in 2011 IEEE 8th International Conference on e-Business Engineering (ICEBE), 2011, pp. 311–316.
[11]. J. Dean and S. Ghemawat, “MapReduce: A Flexible Data Processing Tool,” Communications of the ACM, vol. 53, no. 1, pp. 72–77, Jan. 2010.
[12]. Wikipedia contributors, “MapReduce,” Wikipedia, the free encyclopedia. Wikimedia Foundation, Inc., 01-Aug-2012.

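Off-campus access: not available. Neither the electronic thesis nor the print thesis has been authorized for public release.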
