| Graduate student: | 曾學正 Tzeng, Shiue-Jheng |
|---|---|
| Thesis title: | 一個適用於以拓樸為單位之Storm延展機制的資源調度策略 A Queue Length-Based Resource Management Policy for Topology-Based Scaling Mechanism on Storm |
| Advisor: | 謝錫堃 Shieh, Ce-Kuen |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Institute of Computer & Communication Engineering |
| Year of publication: | 2015 |
| Academic year of graduation: | 103 |
| Language: | English |
| Number of pages: | 40 |
| Keywords (Chinese): | Distributed computing, resource management, queue length |
| Keywords (English): | Storm, Resource Management, Kafka, Real-time |
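Today, more and more applications rely on real-time systems. Apache Storm is a well-known distributed real-time processing system that is widely used by many large companies. Storm's scalability has some drawbacks, and a previous paper proposed a new scaling mechanism to avoid them. However, that mechanism can currently only add computing resources, and the criterion it uses to decide when to add resources is not fully representative. In addition, it does not take into account that the amount of resources to add should vary with the load.
Therefore, building on this scaling mechanism, we propose a resource management policy for a topology-based scaling mechanism on Storm. The policy decides how many computing resources to add or release according to the current system load. Our simulation results show that it does adjust the amount of resources according to the data arrival rate and the system's processing rate. In addition, depending on the configured parameter values, the policy can quickly bring the system to a stable state.
In the future, we will implement this policy on the scaling mechanism and measure how it performs on a real cluster. We will also modify the policy so that it can be used on heterogeneous clusters.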
In the last few years, more and more systems and applications have come to use large volumes of continuous data streams. Apache Storm is a well-known distributed real-time computation system for processing large, unbounded streams of data with high throughput and low latency. Storm's scalability has some drawbacks, and a previous paper proposed a new scaling mechanism to avoid them. However, this mechanism is incomplete: it has only a simple policy for adding resources. The amount of resources added should vary with the severity of the high-load condition, and once the high load subsides, redundant resources should be released.
Based on this mechanism, we propose a queue length-based resource management policy for a topology-based scaling mechanism on Storm and make some modifications to the mechanism. In our simulation results, the resource management policy is effective and its response time is very short.
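The abstract does not state the policy's formula, so the following is only a minimal sketch of how a queue length-based scaling decision could look, not the thesis's actual method. It assumes a homogeneous cluster in which every worker processes tuples at the same rate; the function names, the `drain_seconds` target, and the worker bounds are all hypothetical.

```python
import math


def target_workers(queue_length, arrival_rate, per_worker_rate,
                   drain_seconds=30.0, min_workers=1, max_workers=64):
    """Estimate how many workers a topology needs.

    Hypothetical model (not taken from the thesis):
      queue_length    - tuples currently waiting in the input queue
      arrival_rate    - tuples arriving per second
      per_worker_rate - tuples one worker can process per second
      drain_seconds   - how quickly the existing backlog should be cleared
    """
    if per_worker_rate <= 0 or drain_seconds <= 0:
        raise ValueError("per_worker_rate and drain_seconds must be positive")
    # Throughput needed to keep up with arrivals and clear the backlog in time.
    required_rate = arrival_rate + queue_length / drain_seconds
    needed = math.ceil(required_rate / per_worker_rate)
    # Clamp to the assumed cluster limits.
    return max(min_workers, min(max_workers, needed))


def scaling_delta(current_workers, queue_length, arrival_rate, per_worker_rate):
    """Positive result: add that many workers. Negative: release that many."""
    target = target_workers(queue_length, arrival_rate, per_worker_rate)
    return target - current_workers


if __name__ == "__main__":
    # A backlog of 3000 tuples under steady input calls for scaling out.
    print(scaling_delta(2, queue_length=3000, arrival_rate=100, per_worker_rate=50))
    # An empty queue with the same input lets the topology scale in.
    print(scaling_delta(4, queue_length=0, arrival_rate=100, per_worker_rate=50))
```

In this sketch a smaller `drain_seconds` makes the policy react more aggressively to a backlog, which is one way a configuration value could influence how quickly the system settles.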
In the future, we will implement this policy on the mechanism and carry out performance evaluations. We will also improve the policy so that it can accommodate heterogeneous clusters.
On campus: publicly available from 2020-08-26