| Graduate student: | 陶俊穎 Tao, Chun-Ying |
|---|---|
| Thesis title: | Containerizing HBase Cluster On Kubernetes (於Kubernetes上設計容器化HBase集群) |
| Advisor: | 謝錫堃 Shieh, Ce-Kuen |
| Co-advisor: | 張志標 Chang, Jyh-Biau |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Department of Electrical Engineering |
| Year of publication: | 2022 |
| Academic year of graduation: | 110 |
| Language: | English |
| Number of pages: | 31 |
| Keywords (translated from Chinese): | Distributed Database, HBase, Containerization, Docker, Kubernetes, Performance Evaluation |
| Keywords (English): | HBase, HDFS, Docker, Container, Kubernetes, Performance Evaluation, YCSB |
With the rise of big data, the demand for large-volume database storage has become an important issue. Hadoop, a big data analysis platform, provides HBase as its NoSQL database solution. HBase is a column-oriented NoSQL database that uses HDFS as its storage layer and integrates well with applications in the Hadoop ecosystem. However, configuring and deploying an HBase cluster is complicated and time-consuming. With the development of container technology in recent years, we can take advantage of its portability to containerize HBase and make it easier to deploy. Some prior work already uses containerized HBase, but most of it stops at the application stage. Different deployment approaches can have different effects under different application workloads. In this research, we propose two deployment approaches, the Container-dedicated approach and the Container-shared approach, and implement them with Docker and Kubernetes. Both approaches are then tested with HBase PE and YCSB. In the HBase PE experiments, we found that the Container-dedicated approach suits write-heavy applications, while the Container-shared approach suits read-heavy applications. In the YCSB experiments, the Container-dedicated approach suits workloads with a large proportion of writes as well as mixed-operation workloads. The Container-shared approach, on the other hand, performs better on workloads with a large proportion of reads.
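The qualitative finding in the abstract can be read as a simple selection rule. The following is a minimal sketch of that rule, not the thesis's method: the function name `recommend_approach` and the cutoff `read_heavy_threshold` (default 0.8) are hypothetical and chosen only for illustration, since the abstract does not state a numeric boundary between "mixed" and "read-heavy" workloads.

```python
# Illustrative only: encodes the abstract's qualitative conclusion as a rule.
# The 0.8 cutoff is an assumption, not a value reported in the thesis.

DEDICATED = "container-dedicated"
SHARED = "container-shared"


def recommend_approach(read_proportion: float,
                       read_heavy_threshold: float = 0.8) -> str:
    """Suggest a deployment approach from the share of read operations.

    read_proportion: fraction of operations that are reads, in [0, 1].
    read_heavy_threshold: hypothetical cutoff above which a workload
        is treated as read-heavy.
    """
    if not 0.0 <= read_proportion <= 1.0:
        raise ValueError("read_proportion must be between 0 and 1")
    if read_proportion >= read_heavy_threshold:
        # Read-heavy workloads: the abstract reports better results
        # for the Container-shared approach.
        return SHARED
    # Write-heavy and mixed workloads: the abstract favors the
    # Container-dedicated approach.
    return DEDICATED


if __name__ == "__main__":
    for share in (0.0, 0.5, 0.95):
        print(f"read share {share:.2f} -> {recommend_approach(share)}")
```

In practice the cutoff would need to be calibrated against measurements such as the HBase PE and YCSB results the thesis reports, rather than fixed in advance.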