
Author: 宋可易 (Song, Ke-Yi)
Thesis title: Exploiting Heterogeneous Storage in Apache Ozone (Apache Ozone之異構儲存探討)
Advisor: 蕭宏章 (Hsiao, Hung-Chang)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering
Year of publication: 2022
Academic year of graduation: 110 (ROC calendar)
Language: Chinese
Number of pages: 48
Keywords: Apache Ozone, RocksDB, Column Family, TPC-DS, YCSB, Performance Analysis
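  • Since the start of the 21st century, rapid advances in Internet technology have steadily strengthened people's ability to acquire, store, and analyse data, and the global volume of data has grown explosively. Facebook, the world's most popular social media platform, now has more than 2.2 billion active users, who spend several hours each day publishing posts, commenting on posts, clicking on advertisements, and so on, so the amount of data generated each day is too large to count. Given such volumes, how to store massive data has become an issue that deserves attention.
    The Apache Hadoop Distributed File System (HDFS) is designed to store massive data and provides high-throughput data access. With the arrival of the Internet era, however, more and more small files need to be stored, and HDFS runs into a performance bottleneck when faced with huge numbers of small files. Apache Ozone was created to solve this problem: instead of keeping all metadata together in memory as the Namenode does, it has each component manage its metadata in its own RocksDB.
    This study runs TPC-DS on top of Apache Ozone to simulate a data warehouse application scenario. During the data processing phase, it observes how Apache Ozone operates on the Column Families in the Ozone Manager RocksDB, collects finer-grained statistics, and analyses them to recommend which type of storage device the Ozone Manager should be placed on. The recommendation is then validated with the Yahoo! Cloud Serving Benchmark. The study finds that, in the TPC-DS application scenario, the Ozone Manager is suited to SSD storage, while Containers are suited to HDD storage.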

    With the rapid development of Internet technology in the 21st century, people's ability to access, store and analyse data has increased, and the world is experiencing an explosion of data. The amount of data generated each day is too large to count. With such a huge amount of data, how to store it has become an issue of concern.
    The Apache Hadoop Distributed File System (HDFS) is designed to store huge amounts of data and to provide high-throughput access to them. As more and more small files need to be stored, however, HDFS runs into a performance bottleneck when handling very large numbers of small files. Apache Ozone was created to solve this problem by storing metadata not all together in memory, as the Namenode does, but in a RocksDB instance managed by each component.
    This study runs TPC-DS on top of Apache Ozone to simulate a data warehousing application. During the data processing phase, more granular statistics are collected by observing how Apache Ozone operates on the Column Families in the Ozone Manager RocksDB. These statistics are analysed to recommend a storage device for the Ozone Manager, and the recommendation is validated using the Yahoo! Cloud Serving Benchmark. The study finds that, in the TPC-DS application scenario, the Ozone Manager is suited to storage on SSD and Containers are suited to storage on HDD.
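The abstract does not describe how the per-Column-Family statistics were gathered, so the following is only a minimal sketch of the general idea: tally operations per Column Family and look at the read share of each. The Column Family names bucketTable and keyTable appear in the thesis outline. The trace format, the operation names, and the sample records are hypothetical and are not taken from the thesis.

```python
from collections import Counter

# Hypothetical trace records: (column_family, operation).
# The names bucketTable and keyTable come from the thesis outline;
# the record format and the sample values are made up for illustration.
trace = [
    ("keyTable", "put"),
    ("keyTable", "put"),
    ("keyTable", "get"),
    ("bucketTable", "get"),
    ("bucketTable", "get"),
    ("bucketTable", "get"),
]


def tally(records):
    """Count occurrences of each (column family, operation) pair."""
    return Counter(records)


def read_fraction(counts, column_family):
    """Return the fraction of operations on a column family that are reads ("get")."""
    reads = counts[(column_family, "get")]
    total = sum(n for (cf, _), n in counts.items() if cf == column_family)
    return reads / total if total else 0.0


counts = tally(trace)
for cf in sorted({cf for cf, _ in trace}):
    print(f"{cf}: {read_fraction(counts, cf):.2f} of operations are reads")
```

A breakdown of this kind shows which Column Families are read-heavy and which are write-heavy. How such figures translate into an SSD or HDD recommendation is the subject of the thesis's own analysis and is not reproduced here.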

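    Abstract
    Extended Abstract
    Acknowledgements
    Table of Contents
    List of Figures
    List of Tables
    Chapter 1. Introduction
      1.1 Research Background
      1.2 Related Work
      1.3 Research Motivation and Objectives
      1.4 Main Research Results
      1.5 Thesis Structure
    Chapter 2. Our Proposed Methodology
      2.1 Research Method
      2.2 Research Steps
    Chapter 3. Overview of the Systems Studied
      3.1 Apache Ozone
        3.1.1 Ozone Manager
        3.1.2 Storage Container Manager
        3.1.3 Datanode
        3.1.4 Apache Ozone Write Path
        3.1.5 Apache Ozone Read Path
      3.2 RocksDB
        3.2.1 Column Families in RocksDB
        3.2.2 RocksDB Write Path
        3.2.3 RocksDB Read Path
      3.3 TPC-DS
      3.4 YCSB
    Chapter 4. Preliminary Experiments
      4.1 Preliminary Experiment Environment Configuration
      4.2 Preliminary Experiment Software Environment Settings
        4.2.1 Apache Ozone Deployment Settings
        4.2.2 Apache Ozone Code Modifications
        4.2.3 Apache Hadoop Deployment Settings
        4.2.4 Apache Spark Deployment Settings
        4.2.5 TPC-DS Deployment Settings
      4.3 Preliminary Experiment Results and Analysis
        4.3.1 Preliminary Experiment Results
        4.3.2 Preliminary Experiment Analysis
    Chapter 5. YCSB Experiments
      5.1 YCSB Experiment Targets
      5.2 YCSB Experiment Environment Configuration
      5.3 YCSB Workload Settings
        5.3.1 Common Settings
        5.3.2 TPC-DS Data Generation: bucketTable Workload
        5.3.3 TPC-DS Data Generation: keyTable Workload
      5.4 YCSB Experiment Procedure
      5.5 YCSB Experiment Results: bucketTable in the TPC-DS Data Generation Phase
      5.6 YCSB Experiment Results: keyTable in the TPC-DS Data Generation Phase
    Chapter 6. Conclusion and Future Work
    References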


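    Full-text availability: on campus, public from 2027-12-02
    Off campus, public from 2027-12-02
    The electronic thesis has not yet been authorized for public release; for the print copy, consult the library catalog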