National Cheng Kung University Electronic Theses and Dissertations System

Author:	黃昱棠 Huang, Yu-Tang
Thesis title:	多階段資料密集運算上跨框架快取系統之研究 A Memcached-Based Inter-Framework Caching System for Multi-Layer Data-Intensive Computing
Advisor:	謝錫堃 Shieh, Ce-Kuen
Co-advisor:	張志標 Chang, Jyh-Biau
Degree:	Master
Department:	College of Electrical Engineering and Computer Science - Institute of Computer & Communication Engineering
Year of publication:	2014
Academic year of graduation:	102
Language:	English
Number of pages:	43
Keywords (Chinese):	MapReduce, Hadoop, YARN, 海量資料 (big data), 快取機制 (cache mechanism)
Keywords (English):	MapReduce, Hadoop, YARN, big data, cache mechanism

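Abstract (translated from the Chinese):
In an age of information explosion, conventional computing platforms can no longer cope with the sheer volume of data. MapReduce is a parallel, distributed processing framework proposed by Google for large-scale computation. Hadoop implements the MapReduce framework and provides the distributed file system HDFS, so that computing clusters can be built and programs written to process large amounts of data.
Many research institutions and enterprises now build their own Hadoop clusters as their main platform for processing massive data. At the same time, because the requirements for processing massive data differ, a variety of platforms have been proposed: for example, Storm for processing continuous streaming data and Spark for interactive queries. Data-intensive computation, however, generates frequent disk accesses, which become a performance bottleneck, so accessing and transferring data quickly within one platform or between different platforms has become an important issue.
In this thesis, we propose a system called "Inter-Framework Caching" that improves the Hadoop 2.0 architecture. The goal is to provide a distributed cache storage mechanism for cross-platform computation, so that data shared between platforms requires fewer I/O accesses, which in turn improves overall system performance.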

Abstract (English):
In the age of information explosion, conventional computing platforms cannot cope with the huge amount of data. MapReduce is a parallel, distributed framework proposed by Google for data-intensive computing. Hadoop implements the MapReduce framework together with the Hadoop Distributed File System (HDFS), so that clusters can be built to process large amounts of data.
Nowadays, many research organizations and enterprises build their own Hadoop platforms to process large-scale data. Various frameworks have also been proposed to meet different requirements: for example, Storm is used to process streaming data, and Spark is used for interactive queries. Fast data access and transfer within one framework or between different frameworks has therefore become an important topic.
In this thesis, we propose a system called "Inter-Framework Caching" that improves the Hadoop 2.0 framework. Its purpose is to provide an inter-framework distributed cache storage system that speeds up data access and transfer, reducing the frequency of disk accesses and improving performance.

CHAPTER 1.	Introduction
CHAPTER 2.	Backgrounds and Related Works
1	Backgrounds
1.1	MapReduce
1.2	Storm
1.3	YARN
2	Related Works
2.1	Intra-Framework Temporary mechanism
2.2	Single Inter-Framework Temporary mechanism
CHAPTER 3.	System Design
1	System Overview
2	Design Issues
2.1	Cache System Structure Issue
2.2	Storage System Issue
2.3	System Scalability Issue
2.4	Data Access Issue
CHAPTER 4.	Implementation
1	Distributed In-Memory Cache System
1.1	Cache System
1.2	Key Manager
2	Data Access Mechanism
2.1	Implicit Approach
2.2	Explicit Approach
CHAPTER 5.	Performance Evaluation
1	Experimental Environment & Setup Environment
1.1	Experiment Environment
1.2	Applications
1.3	Data Sets
2	Performance
2.1	Combination Mechanism Comparison
2.2	Combination Mechanism Tuning
2.3	Comparison of Inter-Framework Caching and HDFS on RAMDisk
CHAPTER 6.	Conclusion and Future Work
Reference


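Off-campus access: not available. Neither the electronic thesis nor the print copy has yet been authorized for public release.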
