| Author: | 宋可易 Song, Ke-Yi |
|---|---|
| Thesis title: | Apache Ozone之異構儲存探討 Exploiting Heterogeneous Storage in Apache Ozone |
| Advisor: | 蕭宏章 Hsiao, Hung-Chang |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering |
| Year of publication: | 2022 |
| Academic year of graduation: | 110 |
| Language: | Chinese |
| Number of pages: | 48 |
| Keywords: | Apache Ozone, RocksDB, Column Family, TPC-DS, YCSB, Performance Analysis |
Since the start of the 21st century, the rapid development of Internet technology has steadily increased people's ability to acquire, store, and analyse data, and the volume of data worldwide has grown explosively. Facebook, the world's most popular social media platform, now has more than 2.2 billion active users, who spend several hours a day publishing posts, commenting on posts, and clicking advertisements, so the amount of data generated each day is difficult to measure. With data at this scale, how to store it has become an issue that deserves attention.
The Apache Hadoop Distributed File System (HDFS) is designed to store huge amounts of data and to provide high-throughput access to it. As the Internet era has progressed, however, more and more small files need to be stored, and HDFS runs into a performance bottleneck when faced with a very large number of small files. Apache Ozone was created to solve this problem: instead of keeping all metadata in memory in one place as the HDFS NameNode does, it has each component manage its own metadata in RocksDB.
In this study, TPC-DS is run on top of Apache Ozone to simulate a data warehousing application. During the data processing phase, the operations that Apache Ozone performs on the Column Families of the Ozone Manager RocksDB are observed in order to collect finer-grained statistics. These statistics are analysed to recommend which type of storage device the Ozone Manager should be placed on, and the recommendation is validated with the Yahoo! Cloud Serving Benchmark (YCSB). The study finds that, in the TPC-DS application scenario, the Ozone Manager is suited to SSD storage, while Containers are suited to HDD storage.
On-campus access: public from 2027-12-02