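Graduate student: 孫崇恩
Sun, Chung-En
Thesis title: 基於Hadoop之輕量級資料轉傳系統
Hadoop Data Service Lite in a Server Box
Advisor: 蕭宏章
Hsiao, Hung-Chang
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering
Year of publication: 2018
Academic year of graduation: 106 (ROC calendar, i.e. 2017-2018)
Language: Chinese
Number of pages: 26
Chinese keywords: Hadoop, HDFS, HBase, 分散式儲存 (distributed storage), 資料一致性 (data consistency)
English keywords: Hadoop, HDFS, HBase, Distributed Storage System, Data Consistency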
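  • Hadoop is a large-scale distributed storage and computing framework that is widely used and discussed by many large companies and research institutions around the world. Its storage system, HDFS (Hadoop Distributed File System), is designed to handle massive data at the terabyte scale and above. Beyond the operations of the native Hadoop file system, many components and features have been developed that follow its framework or design philosophy, either to improve the management performance of HDFS or to make HDFS easier for users to operate. HDFS is currently written mainly in Java, an object-oriented programming language. To use it, one must spend time understanding the architecture of the system's components and the data flow among them, and the implementation language is itself a barrier to learning. Developers with more advanced Hadoop requirements therefore need considerable skill and development experience before they can add features to the system smoothly. As HDFS has been extended and revised across versions, however, it has also come to provide many frameworks and APIs (Application Programming Interfaces) that let developers skip part of the low-level design work, which makes development faster and the system easier to pick up.
    In this thesis, building on the HDFS distributed storage system and on other common tools for storing data, such as FTP (File Transfer Protocol), HTTP (HyperText Transfer Protocol), and Linux, we design a data transfer system that works across different file servers. It hides the complex low-level operations of a big-data platform behind simple, widely used APIs and so lowers the barrier to entry to HDFS for users. We also examine the limits and the read/write issues of deploying Hadoop together with this transfer system on a single server.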

    Collecting and analyzing data has long been a trend in technology development. Data at this scale cannot be handled by a traditional stand-alone database; accommodating it quickly and efficiently requires multi-node or even decentralized systems. Besides scalable storage space, such systems are also designed and optimized for large numbers of read and write requests, and the most highly regarded of them is the Hadoop project. Hadoop meets users' demands for handling huge amounts of data through its master-slave architecture, multi-node backup mechanism, read-write lock control, and the ability to be extended with related analysis and computation projects. In this paper, building on the HDFS distributed storage system and on other tools that can be used to store data, such as FTP (File Transfer Protocol), HTTP (HyperText Transfer Protocol), and Linux, we design a data transfer system that works across different file servers. It hides the complex low-level operations of the big-data platform behind a simple, widely used API in order to lower the barrier to HDFS entry for users. We also explore the limits and the read/write issues of Hadoop and this transfer system when both are built on a single server.
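The idea of one simple API in front of several storage back ends can be sketched as follows. This is only an illustration under stated assumptions, not HDSLite's actual interface: the function name `plan_transfer`, the `SUPPORTED_SCHEMES` set, the URI syntax, and the example host names are all hypothetical. The abstract names only HDFS, FTP, HTTP, and Linux as storage targets; Linux is modeled here as local `file` paths.

```python
from urllib.parse import urlparse

# Hypothetical scheme names. The abstract lists HDFS, FTP, HTTP, and Linux
# (modeled here as local "file" paths) but does not define a URI syntax.
SUPPORTED_SCHEMES = {"hdfs", "ftp", "http", "file"}


def plan_transfer(source: str, destination: str) -> dict:
    """Validate a transfer request and report which back ends it involves.

    Illustrative helper only; it is not part of HDSLite and moves no data.
    """
    plan = {}
    for role, uri in (("from", source), ("to", destination)):
        parsed = urlparse(uri)
        if parsed.scheme not in SUPPORTED_SCHEMES:
            raise ValueError(f"unsupported scheme in {role!r} URI: {uri}")
        plan[role] = {
            "scheme": parsed.scheme,
            "host": parsed.netloc,
            "path": parsed.path,
        }
    return plan


if __name__ == "__main__":
    # Example hosts are placeholders, not servers from the thesis.
    print(plan_transfer(
        "ftp://ftp.example.org/data/log.csv",
        "hdfs://namenode.example.org/archive/log.csv",
    ))
```

A caller would hand the resulting plan to per-scheme reader and writer components, so the user only ever supplies a source and a destination regardless of which file servers are involved.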

    Table of Contents
    Abstract
    Extended Abstract
    Acknowledgements
    Table of Contents
    List of Tables
    List of Figures
    Chapter 1. Introduction
    Chapter 2. Background
        2.1 HDFS
        2.2 HBase
        2.3 Phoenix
        2.4 Zookeeper
    Chapter 3. Motivation and Issues
        3.1 Motivation
        3.2 Issues
            3.2.1 Network
            3.2.2 Heavy software components
            3.2.3 Load balancing
            3.2.4 Failure
            3.2.5 Transparency
            3.2.6 Maintenance
            3.2.7 Authentication
    Chapter 4. HDSLite Technical Architecture
        4.1 User's Perspective
            4.1.1 HTTP APIs
            4.1.2 File Transfer
            4.1.3 Other Services
            4.1.4 Starting/Stopping HDSLite and Config Settings
        4.2 HDSLite System Architecture
            4.2.1 HTTP Server
            4.2.2 Load Balance
            4.2.3 Lock Manager
            4.2.4 Transfer
    Chapter 5. Experiments
        5.1 Test Environment
        5.2 Stress Test for Writes
        5.3 Stress Test for Reads
    Chapter 6. Conclusion
    References

    Full text availability: on campus, public from 2023-12-31; off campus, public from 2023-12-31