| Graduate student: | 孫苙達 Sun, Li-Da |
|---|---|
| Thesis title: | 一個以拓樸為單位之Storm延展機制 A Topology-based Scaling Mechanism for Storm |
| Advisor: | 謝錫堃 Shieh, Ce-Kuen |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Institute of Computer & Communication Engineering |
| Year of publication: | 2015 |
| Academic year of graduation: | 103 |
| Language: | English |
| Number of pages: | 41 |
| Keywords (Chinese): | 資料串流處理, Storm, 擴展性 |
| Keywords (English): | Stream Data Processing, Storm, Scalability |
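As more and more well-known companies such as Twitter, Yahoo, and Alibaba turn their attention to real-time big data applications, building a platform that can handle big data in real time has become an important problem. Storm is the most representative such platform: it is an open-source real-time computation system that is distributed and fault-tolerant. A computation running on Storm is called a topology, which is composed of edges and nodes (components). Storm's scaling mechanism, rebalance, lets an existing program scale, but it has drawbacks: it cannot be automated, it restricts the resources that can be used for scaling, and it requires the original topology to be suspended while scaling. To address these shortcomings, we propose a new scaling mechanism. It starts by monitoring the topology; when the rate at which the topology processes the data stream degrades, we split the data stream and hand part of it to other, new topologies, reducing the load on the original topology. By monitoring the topology's processing performance, we keep the data flow under control. We hope that the resulting scaling mechanism improves on the drawbacks of rebalance and makes up for the current shortcomings of Storm.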
As more and more well-known companies such as Twitter, Yahoo, and Alibaba focus on real-time big data applications, building a platform for processing real-time data has become an important issue. Among real-time processing systems, Storm is the most well-known and representative: it is an open-source, distributed real-time computation system. In Storm, a computation is implemented as a topology, a graph in which nodes are operators and edges represent the data flows between operators. In big data processing and analysis systems, scalability is essential for handling large-scale data. Storm provides the rebalance mechanism for scalability; it adjusts the parallelism of a running topology. However, the rebalance command has several drawbacks: it is restricted in the resources it can use, it suspends execution of the topology, and it must be executed manually. In this thesis, we propose a topology-based scaling mechanism for Storm. When a topology is overloaded, the mechanism scales by adjusting the number of cloned topologies. Scaling at the topology level removes the resource restriction and the execution suspension, and the procedure is launched automatically. We hope that our topology-based scaling mechanism can make up for the shortcomings of Storm's scalability.
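The scaling idea described in the abstract (monitor the load, adjust the number of cloned topologies, and split the data stream among them) can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the names `ScalingPolicy`, `desired_clones`, and `split_stream`, the utilization threshold, and the round-robin split are all hypothetical choices made for illustration, and the sketch does not call any Storm API.

```python
import math
from dataclasses import dataclass


@dataclass
class ScalingPolicy:
    """Hypothetical policy parameters; none of these come from the thesis."""
    capacity_per_topology: float      # tuples/s that one topology copy can sustain
    scale_out_threshold: float = 0.8  # target maximum utilization per copy
    min_clones: int = 1
    max_clones: int = 8


def desired_clones(input_rate: float, policy: ScalingPolicy) -> int:
    """Return the smallest number of topology copies that keeps each copy
    below the utilization threshold, clamped to [min_clones, max_clones]."""
    usable = policy.capacity_per_topology * policy.scale_out_threshold
    needed = math.ceil(input_rate / usable) if input_rate > 0 else policy.min_clones
    return max(policy.min_clones, min(policy.max_clones, needed))


def split_stream(tuples, clones: int):
    """Partition an incoming batch round-robin across the topology copies.
    Round-robin is an assumption; a key-based split would also be possible."""
    parts = [[] for _ in range(clones)]
    for i, item in enumerate(tuples):
        parts[i % clones].append(item)
    return parts


if __name__ == "__main__":
    policy = ScalingPolicy(capacity_per_topology=500.0)
    n = desired_clones(1000.0, policy)
    print(n, split_stream(list(range(7)), n))
```

In this sketch the original topology keeps running while copies are added or removed, which mirrors the abstract's claim that topology-level scaling avoids suspending execution; how the copies are actually submitted and fed is left out.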