
Graduate student: 黃昱棠
Huang, Yu-Tang
Thesis title: 多階段資料密集運算上跨框架快取系統之研究
A Memcached-Based Inter-Framework Caching System for Multi-Layer Data-Intensive Computing
Advisor: 謝錫堃
Shieh, Ce-Kuen
Co-advisor: 張志標
Chang, Jyh-Biau
Degree: Master
Department: College of Electrical Engineering and Computer Science - Institute of Computer & Communication Engineering
Year of publication: 2014
Academic year of graduation: 102
Language: English
Number of pages: 43
Keywords (Chinese): MapReduce, Hadoop, YARN, 海量資料 (big data), 快取機制 (cache mechanism)
Keywords (English): MapReduce, Hadoop, YARN, big data, cache mechanism
  • In the age of information explosion, conventional computing platforms can no longer handle the sheer volume of data. MapReduce is a parallel, distributed processing framework proposed by Google for data-intensive computing. Hadoop implements the MapReduce framework together with a distributed file system, HDFS, so that clusters can be built and programs written to process large amounts of data.

    Today, many research organizations and enterprises build their own Hadoop clusters as their main platform for processing large-scale data. Because requirements differ, other frameworks have also been proposed: for example, Storm processes continuous streaming data and Spark serves interactive queries. Data-intensive computing, however, generates frequent disk accesses, which become a performance bottleneck. Fast data access and transfer, both within a single framework and between different frameworks, has therefore become an important issue.

    In this thesis, we propose a system called "Inter-Framework Caching" that improves the Hadoop 2.0 architecture. It provides a distributed cache storage mechanism for computation that spans frameworks, so that data shared between frameworks requires fewer disk I/O accesses, which in turn improves overall system performance.

    CHAPTER 1. Introduction
    CHAPTER 2. Backgrounds and Related Works
      2.1 Backgrounds
        2.1.1 MapReduce
        2.1.2 Storm
        2.1.3 YARN
      2.2 Related Works
        2.2.1 Intra-Framework Temporary mechanism
        2.2.2 Single Inter-Framework Temporary mechanism
    CHAPTER 3. System Design
      3.1 System Overview
      3.2 Design Issues
        3.2.1 Cache System Structure Issue
        3.2.2 Storage System Issue
        3.2.3 System Scalability Issue
        3.2.4 Data Access Issue
    CHAPTER 4. Implementation
      4.1 Distributed In-Memory Cache System
        4.1.1 Cache System
        4.1.2 Key Manager
      4.2 Data Access Mechanism
        4.2.1 Implicit Approach
        4.2.2 Explicit Approach
    CHAPTER 5. Performance Evaluation
      5.1 Experimental Environment & Setup Environment
        5.1.1 Experiment Environment
        5.1.2 Applications
        5.1.3 Data Sets
      5.2 Performance
        5.2.1 Combination Mechanism Comparison
        5.2.2 Combination Mechanism Tuning
        5.2.3 Comparison of Inter-Framework Caching and HDFS on RAMDisk
    CHAPTER 6. Conclusion and Future Work
    Reference


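    On campus: public from 2019-08-29
    Off campus: not public
    The electronic thesis has not yet been authorized for public release; for the print copy, please consult the library catalog.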