| Graduate student: | 鄭煌穆 Zheng, Huang-Mu |
|---|---|
| Thesis title: | Apache Ozone一體適用之資料管理其效能議題探討 The One-Size-Fits-All Metadata Management in Apache Ozone: Its Performance Issues |
| Advisor: | 蕭宏章 Hsiao, Hung-Chang |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering |
| Year of publication: | 2021 |
| Academic year of graduation: | 109 |
| Language: | Chinese |
| Number of pages: | 55 |
| Keywords (Chinese): | distributed object storage system, embedded database, performance analysis, data warehousing |
| Keywords (English): | Apache Ozone, RocksDB, Performance Analysis, TPC-DS, YCSB |
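In recent years, the popularity of artificial intelligence and the rise of cloud computing and big data have changed both the scale at which data is stored and the way it is used. Apache Ozone, the new-generation distributed object storage system from the Apache Hadoop community, was created in response to this trend and attempts to resolve the bottlenecks encountered by Apache HDFS (Hadoop Distributed File System).
Every operation in Apache Ozone involves a write to or a read from RocksDB, so when Ozone serves big-data applications, the performance of RocksDB largely determines the performance of Apache Ozone. RocksDB is an embeddable persistent key-value store that Facebook derived from Google's LevelDB, and it is one of the storage engines widely used across databases today.
This study takes the TPC-DS benchmark and Spark, both widely used in industry, as the application scenario. Using monitoring tools for Apache Ozone and RocksDB, we observe and characterize how TPC-DS exercises Apache Ozone. We then use the Yahoo Cloud Serving Benchmark to simulate scenarios that Apache Ozone may face under future big-data applications and extrapolate its performance from them. We find that not every component in Apache Ozone is well suited to RocksDB: some components are dominated by read operations, for which the traditional MySQL InnoDB is the better fit, whereas for write-heavy operations RocksDB clearly outperforms InnoDB.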
Apache Ozone is the next-generation distributed storage substrate that succeeds the Apache Hadoop Distributed File System (HDFS). Unlike HDFS, Ozone provides the notion of objects, which are its fundamental data entity. Applications in Ozone refer to objects as key-value pairs. Ozone also offers buckets, akin to directories in typical file systems, which manage a number of objects and can be organized hierarchically. In contrast to HDFS, Ozone distributes its metadata management, thus eliminating the performance bottleneck due to the metadata store.
Ozone relies on multiple distributed stores for metadata management. The core of these stores is built on RocksDB instances, which are essentially log-structured merge-trees (LSM-trees). Consequently, alongside Ozone's internal protocols, RocksDB largely determines the performance of data accesses in Ozone.
In this study, we investigate whether Ozone's design choice of depending heavily on RocksDB can meet the demands of the applications that Ozone intends to serve. Specifically, we mainly consider data warehousing applications. Our study concludes that the one-size-fits-all solution based on RocksDB in Ozone has room to evolve. In particular, some components in Ozone benefit from the LSM-tree that RocksDB provides, while other components would be better served by the typical balanced search tree data structure adopted by database engines such as MySQL InnoDB.
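The read/write trade-off behind this conclusion can be illustrated with a toy model. The sketch below is not taken from the thesis, Ozone, or RocksDB: `ToyLSM` and `ToySortedIndex` are hypothetical classes, and the model omits compaction, Bloom filters, and caching, which real LSM engines use to reduce read cost. It only shows that an LSM-style store makes each write a cheap buffer update, while a lookup may have to search several sorted runs, whereas a single ordered index answers a lookup with one search.

```python
import bisect


class ToyLSM:
    """Toy LSM-style store (hypothetical, for illustration only).

    Writes go to an in-memory buffer; when the buffer is full it is
    flushed as an immutable sorted run. Runs are never merged here.
    """

    def __init__(self, memtable_limit=4):
        self.memtable_limit = memtable_limit
        self.memtable = {}
        self.runs = []  # oldest run first; each run is a sorted list of (key, value)

    def put(self, key, value):
        # A write only touches the in-memory buffer.
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.runs.append(sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        """Return (value, probes); probes counts the structures searched."""
        probes = 1
        if key in self.memtable:
            return self.memtable[key], probes
        # Search from the newest run to the oldest so recent writes win.
        for run in reversed(self.runs):
            probes += 1
            keys = [k for k, _ in run]
            i = bisect.bisect_left(keys, key)
            if i < len(run) and run[i][0] == key:
                return run[i][1], probes
        return None, probes


class ToySortedIndex:
    """Toy stand-in for a single ordered index (hypothetical).

    A real B-tree differs in layout, but it shares the property that
    one ordered structure is searched per lookup.
    """

    def __init__(self):
        self.keys = []
        self.values = []

    def put(self, key, value):
        # A write must find its position in the ordered structure.
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            self.values[i] = value
        else:
            self.keys.insert(i, key)
            self.values.insert(i, value)

    def get(self, key):
        """Return (value, probes); exactly one structure is searched."""
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            return self.values[i], 1
        return None, 1
```

For example, after inserting keys 0 through 11 into a `ToyLSM` with `memtable_limit=4`, the store holds three sorted runs, and looking up key 0 searches the empty buffer plus all three runs (four probes), while the same lookup in `ToySortedIndex` searches one structure.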
On campus: publicly available from 2026-08-20