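Graduate student: 陶俊穎 (Tao, Chun-Ying)
Thesis title: 於Kubernetes上設計容器化HBase集群 (Containerizing HBase Cluster On Kubernetes)
Advisor: 謝錫堃 (Shieh, Ce-Kuen)
Co-advisor: 張志標 (Chang, Jyh-Biau)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Electrical Engineering
Year of publication: 2022
Academic year of graduation: 110 (ROC calendar)
Language: English
Number of pages: 31
Keywords (Chinese): distributed database, HBase, containerization technology, Docker, Kubernetes, performance evaluation
Keywords (English): HBase, HDFS, Docker, Container, Kubernetes, Performance Evaluation, YCSB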
    As the big data trend grows, demand for large-volume database storage has become an important issue. Hadoop, a big data analysis platform, provides HBase as its database solution. HBase is a column-oriented NoSQL database that uses HDFS as its storage layer and integrates well with applications in the Hadoop ecosystem. However, configuring and deploying an HBase cluster is complicated and time-consuming. With the development of container technology in recent years, we can use its portability to containerize HBase and make it easier to deploy. Some studies already use containerized HBase, but most remain at the stage of simply applying it. Different deployment approaches can have different effects under different application workloads. In this research, we propose two deployment approaches, the Container-dedicated approach and the Container-shared approach, and implement both with Docker containers and Kubernetes. We then test both approaches with HBase PE and YCSB. In the HBase PE experiment, we found that the Container-dedicated approach suits write-heavy applications and the Container-shared approach suits read-heavy applications. In the YCSB experiment, the Container-dedicated approach suits workloads with a large proportion of writes and workloads with mixed operations, whereas the Container-shared approach performs better on workloads with a large proportion of reads.
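    The qualitative YCSB findings reported in the abstract can be restated as a small lookup. The following is only an illustrative sketch, not the thesis's method: the `WorkloadMix` type, the `suggested_approach` function, and especially the `read_heavy_threshold` cutoff are hypothetical, since the abstract does not state where "large read proportion" begins.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkloadMix:
    """Share of reads and writes in a YCSB-style workload (hypothetical helper type)."""
    read_proportion: float
    write_proportion: float  # updates and inserts combined


def suggested_approach(mix: WorkloadMix, read_heavy_threshold: float = 0.9) -> str:
    """Return the deployment approach the abstract reports as better suited.

    read_heavy_threshold is an assumed cutoff chosen for illustration only;
    the abstract gives no numeric boundary.
    """
    total = mix.read_proportion + mix.write_proportion
    if total <= 0:
        raise ValueError("workload must contain at least one operation type")
    read_share = mix.read_proportion / total
    if read_share >= read_heavy_threshold:
        # Abstract: Container-shared performs better with large read proportions.
        return "Container-shared"
    # Abstract: Container-dedicated suits write-heavy and mixed workloads.
    return "Container-dedicated"


if __name__ == "__main__":
    print(suggested_approach(WorkloadMix(0.95, 0.05)))
    print(suggested_approach(WorkloadMix(0.50, 0.50)))
```

    Under the assumed cutoff, a 95% read workload maps to the Container-shared approach and an even read/write mix maps to the Container-dedicated approach; the actual crossover point would have to be taken from the thesis's measurements.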

    Abstract (Chinese)
    Abstract
    Tables
    Figures
    Charts
    Chapter 1
    Chapter 2: Background & Related Works
      2.1 Background
        2.1.1 HBase
        2.1.2 Docker Container
        2.1.3 Kubernetes
      2.2 Related Works
    Chapter 3: Methodology
      3.1 Container-dedicated approach
      3.2 Container-shared approach
      3.3 Approaches Comparison
    Chapter 4: Implementation
      4.1 Overview
      4.2 Docker container image design
      4.3 Container orchestration with Kubernetes
      4.4 Pod distribution
    Chapter 5: Experiments
      5.1 Experimental Environment
      5.2 Experiment 1 – HBase PE
      5.3 Experiment 2 – YCSB
      5.4 Discussion
    Chapter 6: Conclusion and Future work


    Full text availability: on campus, public from 2025-08-31; off campus, public from 2025-08-31