
Graduate student: 徐瑞興
Hsu, Jui-Hsing
Thesis title: 一個可將MapReduce程式透通地執行在多個Hadoop平台之方法
A Transparent Approach to Run MapReduce Programs on Collaborative Hadoops
Advisor: 謝錫堃
Shieh, Ce-Kuen
Co-advisor: 張志標
Chang, Jyh-Biau
Degree: Master
Department: College of Electrical Engineering and Computer Science - Institute of Computer & Communication Engineering
Year of publication: 2012
Academic year of graduation: 100
Language: English
Number of pages: 41
Chinese keywords: 透通分散式運算 (transparent distributed computing)
English keywords: MapReduce, Hadoop, transparent, distributed computing
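  • MapReduce is a processing framework for large-scale distributed computation. With the rise of distributed computing on large volumes of data, many organizations have built their own data and computing centers to process and analyze data. Hadoop is an open-source implementation of MapReduce, and many organizations now use it to build their own computing and data centers and to develop applications such as web indexing and data mining.
    In some cases, federating the Hadoop resources of several organizations brings many benefits. For example, by combining the organizations' computing resources we can shorten the overall execution time. However, existing approaches to federating Hadoop clusters can cause many problems. For example, users must redesign a MapReduce program specifically for a multi-Hadoop environment, or they must reconfigure the whole system environment whenever the computing demand changes. Both inconvenience users and undermine the simplicity of MapReduce.
    We propose an approach that lets an existing program run on multiple Hadoop environments without supplying any additional program. In our system, users run the original MapReduce program on an upper-level Hadoop, and the system automatically federates the lower-level Hadoop clusters, including job dispatching and data transfer, without modifying the MapReduce program. Experimental results show a 23% performance improvement in the WordCount 5G case.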

    MapReduce is a programming model for data-intensive applications that keeps parallel programming simple. With the rapid growth of data-intensive applications in distributed computing, many organizations have built clusters with computing resources to store or analyze data. Hadoop is an open-source implementation of MapReduce that has been widely used for applications such as web indexing and data mining.
    In some cases, it is favorable to aggregate the resources of several Hadoop clusters. For example, we could reduce job execution time by adding computing nodes from outside the local cluster. However, existing solutions for aggregating Hadoop clusters have several problems. For example, users need to redesign each application's program for collaborative use, or they need to reconfigure the environment whenever the computation demand changes. Both cause inconvenience for users and thus break the simplicity of MapReduce.
    We propose a transparent approach that lets collaborative Hadoop clusters work together without redesigning the program for each application. In our system, users execute jobs in the cloud portal as they would on a single Hadoop cluster, and our system runtime automatically handles the remaining work, including job dispatching, data transfer, program modification, and program execution. The experimental results show that our system provides a 23% performance gain in the WordCount 5G case.
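
The collaborative flow described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: the function names (`local_wordcount`, `dispatch`, `aggregate`, `run_collaborative`), the round-robin split, and the cluster names are all assumptions made for illustration, and each "cluster" is simulated in-process rather than being a real Hadoop deployment. The sketch only shows the idea that an unmodified WordCount can run on each cluster's share of the input while a global step merges the partial results.

```python
from collections import Counter
from typing import Dict, List


def local_wordcount(lines: List[str]) -> Counter:
    """Stand-in for the user's unmodified WordCount job on one cluster."""
    counts: Counter = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


def dispatch(lines: List[str], clusters: List[str]) -> Dict[str, List[str]]:
    """Hypothetical global dispatcher: round-robin input lines over clusters."""
    shares: Dict[str, List[str]] = {name: [] for name in clusters}
    for i, line in enumerate(lines):
        shares[clusters[i % len(clusters)]].append(line)
    return shares


def aggregate(partials: List[Counter]) -> Counter:
    """Hypothetical global aggregator: sum the per-cluster partial counts."""
    total: Counter = Counter()
    for partial in partials:
        total.update(partial)
    return total


def run_collaborative(lines: List[str], clusters: List[str]) -> Counter:
    """Dispatch input, run the same job on every cluster, merge the results."""
    shares = dispatch(lines, clusters)
    partials = [local_wordcount(share) for share in shares.values()]
    return aggregate(partials)
```

Because word counts are additive, the merged result matches what a single cluster would produce on the whole input; jobs whose reduce step is not associative would need a different aggregation step.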

    Chapter 1: Introduction
    Chapter 2: Background & Related Work
      2.1 MapReduce Programming Model
      2.2 Related Works
        2.2.1 CloudBLAST
        2.2.2 Hierarchical MapReduce Framework
    Chapter 3: Design
      3.1 System Overview
      3.2 Hierarchical Framework
      3.3 Proxy Reducer
      3.4 Proxy Mapper
      3.5 Stage In
      3.6 Stage Out
    Chapter 4: Implementation
      4.1 MapReduce Implementation – Hadoop
      4.2 System Components
      4.3 Proxy Selector
      4.4 Global Job Dispatcher
      4.5 Global Aggregator
      4.6 Long Line Effect
    Chapter 5: Performance Evaluation
      5.1 Experimental Setup
      5.2 Performance Comparison
      5.3 Performance Breakdown and Compression Effect
      5.4 Two Methods of Stage Out
      5.5 Long Line Effect
    Chapter 6: Discussion
      6.1 Effect of Combiner
      6.2 Single Sign-On
    Chapter 7: Conclusion & Future Work
    Reference


    Full text, on campus: open access from 2015-02-15
    Full text, off campus: open access from 2015-02-15