National Cheng Kung University Theses and Dissertations System

Author:	徐瑞興 Hsu, Jui-Hsing
Thesis title:	一個可將MapReduce程式透通地執行在多個Hadoop平台之方法 A Transparent Approach to Run MapReduce Programs on Collaborative Hadoops
Advisor:	謝錫堃 Shieh, Ce-Kuen
Co-advisor:	張志標 Chang, Jyh-Biau
Degree:	Master
Department:	College of Electrical Engineering and Computer Science - Institute of Computer & Communication Engineering
Publication year:	2012
Academic year of graduation:	100 (ROC calendar)
Language:	English
Pages:	41
Chinese keywords:	透通 (transparent), 分散式運算 (distributed computing)
English keywords:	MapReduce, Hadoop, transparent, distributed computing

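MapReduce is a framework for distributed processing of large-scale computation. With the rise of distributed computing over large data sets, many organizations have built their own data and computing centers to process and analyze data. Hadoop is an open-source implementation of MapReduce, and many organizations now use it to build their own computing and data centers and to develop applications such as web indexing and data mining.
In some situations, combining the Hadoop resources of several organizations brings benefits. For example, pooling the organizations' computing resources can shorten overall execution time. Existing approaches to combining Hadoop clusters across organizations, however, can cause problems. For example, users must redesign a MapReduce program specifically for the multi-Hadoop environment, or they must reconfigure the whole system environment whenever computing demand changes. Both inconvenience users and undermine the simplicity of MapReduce.
We propose an approach that lets an existing program run in a multi-Hadoop environment without any additional program being supplied. In our system, users run the original MapReduce program on the upper-level Hadoop, and the system automatically combines the lower-level Hadoop clusters, including job dispatching and data transfer, without requiring modification of the MapReduce program. Experimental results show a 23% performance improvement in the WordCount 5G case.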

MapReduce is a programming model for data-intensive applications that preserves the simplicity of parallel programming. With the rapid growth of data-intensive applications in distributed computing, many organizations have built clusters with computing resources to store or analyze data. Hadoop is an open-source implementation of MapReduce that has been widely used for applications such as web indexing and data mining.
In some cases, it is favorable to aggregate the resources of several Hadoop clusters. For example, integrating computing nodes from outside the local cluster provides more computing resources and can reduce job execution time. However, existing solutions for aggregating Hadoop clusters have several problems. For example, users need to redesign the program of each application for collaborative use, or they need to reconfigure the environments whenever computation demand changes. Both cause inconvenience for users and thus break the simplicity of MapReduce.
We propose a transparent approach that lets collaborative Hadoop clusters work together without redesigning the program of each application. In our system, users execute jobs through the cloud portal as they would on a single Hadoop cluster, and the system runtime automatically handles the remaining work, including job dispatching, data transfer, program modification, and program running. The experimental results show that our system provides a 23% performance gain in the WordCount 5G case.
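
The idea of running one unmodified job across several clusters can be illustrated with a small in-memory sketch. This is not the thesis's implementation: the function names (dispatch, run_on_cluster, run_collaboratively) and the round-robin split are hypothetical, and each "cluster" is simulated by a plain Python function rather than a Hadoop installation. The sketch only shows why a word count can be computed per cluster and then merged by applying the same reduce step once more.

```python
from collections import Counter
from typing import Iterable, Iterator


def map_words(line: str) -> Iterator[tuple[str, int]]:
    """User-written map step: emit (word, 1) for every word in a line."""
    for word in line.split():
        yield word, 1


def reduce_counts(pairs: Iterable[tuple[str, int]]) -> dict[str, int]:
    """User-written reduce step: sum the counts for each word."""
    totals: Counter = Counter()
    for word, count in pairs:
        totals[word] += count
    return dict(totals)


def run_on_cluster(lines: list[str]) -> dict[str, int]:
    """Stand-in for one lower-level cluster running the unmodified job."""
    pairs = (pair for line in lines for pair in map_words(line))
    return reduce_counts(pairs)


def dispatch(lines: list[str], num_clusters: int) -> list[list[str]]:
    """Hypothetical dispatcher: split the input round-robin across clusters."""
    return [lines[i::num_clusters] for i in range(num_clusters)]


def run_collaboratively(lines: list[str], num_clusters: int) -> dict[str, int]:
    """Run the job on every cluster, then merge the partial results.

    The merge reuses reduce_counts, which is valid here because summing
    counts is associative and commutative.
    """
    partials = [run_on_cluster(part) for part in dispatch(lines, num_clusters)]
    merged_pairs = (item for partial in partials for item in partial.items())
    return reduce_counts(merged_pairs)


if __name__ == "__main__":
    data = ["a b a", "b c", "a"]
    print(run_collaboratively(data, num_clusters=2))
```

The user-facing pieces (map_words and reduce_counts) are the same whether the job runs on one cluster or several, which is the sense of "transparent" used in the abstract. Reductions that are not associative and commutative would need a different merge step.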

Chapter 1: Introduction	1
Chapter 2: Background & Related Work	4
1 MapReduce Programming Model	4
2 Related Works	7
2.1 CloudBLAST	7
2.2 Hierarchical MapReduce Framework	9
Chapter 3: Design	11
1 System Overview	11
2 Hierarchical Framework	14
3 Proxy Reducer	15
4 Proxy Mapper	16
5 Stage In	17
6 Stage Out	17
Chapter 4: Implementation	18
1 MapReduce Implementation – Hadoop	18
2 System Components	19
3 Proxy Selector	20
4 Global Job Dispatcher	21
5 Global Aggregator	22
6 Long Line Effect	24
Chapter 5: Performance Evaluation	25
1 Experimental Setup	25
2 Performance Comparison	28
3 Performance Breakdown and Compression Effect	30
4 Two methods of Stage Out	34
5 Long Line Effect	35
Chapter 6: Discussion	36
1 Effect of Combiner	36
2 Single Sign-On	37
Chapter 7: Conclusion & Future Work	38
Reference	39
                                    


On campus: publicly available from 2015-02-15
Off campus: publicly available from 2015-02-15
