
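Graduate student: Zheng, Huang-Mu (鄭煌穆)
Thesis title: The One-Size-Fits-All Metadata Management in Apache Ozone: Its Performance Issues (Apache Ozone一體適用之資料管理其效能議題探討)
Advisor: Hsiao, Hung-Chang (蕭宏章)
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Computer Science and Information Engineering
Year of publication: 2021
Academic year of graduation: 109
Language: Chinese
Number of pages: 55
Keywords (translated from Chinese): distributed object storage system, embedded database, performance analysis, data warehousing
Keywords (English): Apache Ozone, RocksDB, Performance Analysis, TPC-DS, YCSB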
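  • In recent years, the popularity of artificial intelligence and the rise of cloud computing and big data have changed both the scale at which data is stored and the way it is used. Apache Ozone, the next-generation distributed object storage system from the Apache Hadoop community, was created in response to this trend and attempts to resolve the bottlenecks encountered by Apache HDFS (Hadoop Distributed File System).
    Every operation in Apache Ozone involves a write to or a read from RocksDB, so for big-data applications the performance of RocksDB largely determines the performance of Apache Ozone. RocksDB is an embeddable persistent key-value store that Facebook derived from Google's LevelDB, and it is now a widely used storage engine in many databases.
    This study adopts the widely used TPC-DS benchmark and Spark as the application scenario. Using the monitoring tools of Apache Ozone and RocksDB, it observes and characterizes the operations that TPC-DS issues against Apache Ozone. It then uses the Yahoo Cloud Serving Benchmark (YCSB) to simulate scenarios that Apache Ozone may face under future big-data applications and to extrapolate its performance. The study finds that not every component of Apache Ozone is well suited to RocksDB: some components are dominated by read operations, for which the conventional MySQL InnoDB is the better fit, whereas for write-dominated operations RocksDB clearly outperforms InnoDB.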

    Apache Ozone is the next-generation distributed storage substrate of Apache Hadoop, following the Hadoop Distributed File System (HDFS). Unlike HDFS, Ozone provides the notion of objects, which are its fundamental data entities. Applications refer to objects as key-value pairs. Ozone also offers buckets, akin to directories in typical file systems, which manage a number of objects and can be organized hierarchically. In contrast to HDFS, Ozone distributes its metadata management, thus eliminating the performance bottleneck caused by the metadata store.
    Ozone relies on multiple distributed stores for metadata management. The core of these stores is RocksDB, which is essentially a log-structured merge-tree (LSM-tree). Consequently, apart from Ozone's internal protocols, RocksDB is the decisive factor in the performance of data accesses in Ozone.
    This study investigates whether Ozone's heavy dependence on RocksDB can meet the demands of the applications Ozone intends to serve, focusing on data warehousing. We conclude that the one-size-fits-all solution based on RocksDB in Ozone leaves room to evolve. In particular, some components in Ozone benefit from the LSM structure provided by RocksDB, whereas others would be better served by the typical balanced search tree data structure adopted by database engines such as MySQL InnoDB.
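
    The trade-off described in the abstract can be illustrated with a toy model. The sketch below is not RocksDB's or InnoDB's implementation: the classes `ToyLSM` and `ToySortedIndex` and the parameter `memtable_limit` are hypothetical names introduced only for illustration, and the model leaves out write-ahead logging, compaction, Bloom filters, and on-disk pages. It only shows why a store built from multiple sorted runs may need several lookups per read, while a single sorted index needs one.

```python
import bisect


class ToyLSM:
    """Toy LSM-style store: one mutable memtable plus immutable sorted runs."""

    def __init__(self, memtable_limit=4):
        self.memtable_limit = memtable_limit  # hypothetical flush threshold
        self.memtable = {}                    # newest writes, held in memory
        self.runs = []                        # flushed runs, newest first

    def put(self, key, value):
        # A write only touches the memtable; existing runs are never modified.
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def _flush(self):
        # Freeze the memtable into a sorted, immutable run of (key, value) pairs.
        self.runs.insert(0, sorted(self.memtable.items()))
        self.memtable = {}

    def get(self, key):
        """Return (value, number_of_structures_probed)."""
        probes = 1
        if key in self.memtable:
            return self.memtable[key], probes
        for run in self.runs:  # newest run first
            probes += 1
            i = bisect.bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                return run[i][1], probes
        return None, probes


class ToySortedIndex:
    """Single sorted structure standing in for a balanced search tree."""

    def __init__(self):
        self.keys = []
        self.values = []

    def put(self, key, value):
        # Every write must find its sorted position and update in place.
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            self.values[i] = value
        else:
            self.keys.insert(i, key)
            self.values.insert(i, value)

    def get(self, key):
        """Return (value, number_of_structures_probed); always one structure."""
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            return self.values[i], 1
        return None, 1


# Load the same twelve keys into both toy stores.
lsm = ToyLSM(memtable_limit=4)
idx = ToySortedIndex()
for n in range(12):
    key = f"key{n:02d}"
    lsm.put(key, n)
    idx.put(key, n)

oldest_lsm = lsm.get("key00")  # must walk past the memtable and newer runs
oldest_idx = idx.get("key00")  # one lookup in the single sorted structure
```

    In this toy setup the oldest key sits in the last of three flushed runs, so the LSM-style read probes the memtable and all three runs, while the sorted index answers from one structure. Real engines narrow this gap with compaction and filters, so the sketch indicates only the direction of the trade-off, not its size.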

    Abstract (Chinese)
    Extended Abstract
    Acknowledgements
    Table of Contents
    List of Figures
    List of Tables
    CHAPTER 1 Introduction
      1.1 Research Background
      1.2 Related Work
      1.3 Research Motivation and Objectives
      1.4 Main Research Results
      1.5 Thesis Structure
    CHAPTER 2 Our Methodology
      2.1 Research Method
      2.2 Research Steps
    CHAPTER 3 Apache Ozone, Facebook RocksDB, TPC-DS and Yahoo! YCSB
      3.1 Apache Ozone
        3.1.1 Ozone Manager
        3.1.2 Storage Container Manager
        3.1.3 Datanode
        3.1.4 Apache Ozone Write Path
        3.1.5 Apache Ozone Read Path
      3.2 Facebook RocksDB
        3.2.1 RocksDB Write Path
        3.2.2 RocksDB Read Path
      3.3 TPC-DS
      3.4 Yahoo Cloud Serving Benchmark (YCSB)
    CHAPTER 4 Preliminary Experiments
      4.1 Preliminary Experiment Environment
      4.2 Experiment Environment Settings
        4.2.1 Apache Ozone Code Modifications
        4.2.2 Apache Ozone Deployment Settings
        4.2.3 Apache Hadoop Deployment Settings
        4.2.4 Apache Spark Deployment Settings
        4.2.5 TPC-DS Deployment Settings
      4.3 Preliminary Experiment 1: TPC-DS Data Generation
        4.3.1 Observations from Ozone Manager and RocksDB Statistics
        4.3.2 Ozone Manager Audit Log Analysis
        4.3.3 Observations from Container and RocksDB Statistics
        4.3.4 Datanode Log Analysis
      4.4 Preliminary Experiment 2: TPC-DS Data Query
        4.4.1 Observations from Ozone Manager and RocksDB Statistics
        4.4.2 Ozone Manager Audit Log Analysis
        4.4.3 Observations from Container and RocksDB Statistics
        4.4.4 Datanode Log Analysis
      4.5 Storage Container Manager Scenario
    CHAPTER 5 YCSB Experiments
      5.1 Experiment Environment
      5.2 YCSB Experiment Targets
      5.3 YCSB Workload Settings
        5.3.1 Common Settings
        5.3.2 TPC-DS Data Generation: Ozone Manager Workload
        5.3.3 TPC-DS Data Generation: Container Workload
        5.3.4 TPC-DS Data Query: Ozone Manager Workload
        5.3.5 TPC-DS Data Query: Container Workload
      5.4 Experiment Procedure
      5.5 Results: Ozone Manager in the TPC-DS Data Generation Phase
      5.6 Results: Container in the TPC-DS Data Generation Phase
      5.7 Results: Ozone Manager in the TPC-DS Data Query Phase
      5.8 Results: Container in the TPC-DS Data Query Phase
    CHAPTER 6 Conclusion and Future Work
    References


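    On campus: available from 2026-08-20
    Off campus: available from 2026-08-20
    The electronic thesis has not yet been authorized for public release; for the print copy, please consult the library catalog.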