| Graduate student: | 孫崇恩 Sun, Chung-En |
|---|---|
| Thesis title: | 基於Hadoop之輕量級資料轉傳系統 Hadoop Data Service Lite in a Server Box |
| Advisor: | 蕭宏章 Hsiao, Hung-Chang |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering |
| Year of publication: | 2018 |
| Academic year of graduation: | 106 |
| Language: | Chinese |
| Number of pages: | 26 |
| Keywords (Chinese): | Hadoop, HDFS, HBase, 分散式儲存, 資料一致性 |
| Keywords (English): | Hadoop, HDFS, HBase, Distributed Storage System, Data Consistency |
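Hadoop is a large-scale distributed storage and computing framework that is widely used and discussed by many major companies and research institutions around the world. Its storage system, HDFS (Hadoop Distributed File System), is designed to handle massive data sets at the terabyte scale and above. Beyond the operation and use of Hadoop's native file system, many components and features have been developed that follow its framework or design philosophy, either to improve HDFS management performance or to make HDFS easier for users to operate. HDFS is currently written mainly in Java, an object-oriented programming language. Understanding the architecture of the system's components and the way data flows among them takes time, and the implementation language is itself a barrier to learning. Developers with more advanced needs must therefore build up considerable skill and development experience before they can add features to the system. However, as HDFS has gone through successive releases and fixes, it has also gained many frameworks and APIs (Application Programming Interfaces) that let developers skip part of the low-level design work, so that they can develop quickly and get started easily.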
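In this thesis, we build on the HDFS distributed storage system and on other common tools that can be used to store data, such as FTP (File Transfer Protocol), HTTP (HyperText Transfer Protocol), and Linux. Our aim is to hide the complex low-level operations of the big-data platform and to use simple, widely adopted APIs to design a data transfer system that works across different file servers, lowering the barrier to entry for HDFS users. We also examine the limits and the read/write issues that arise when Hadoop and this transfer system are deployed on a single server.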
Collecting and analyzing data has long been a driving trend in technology development. Accommodating huge amounts of data with fast, efficient systems is beyond what a traditional stand-alone database can handle; at this scale, multi-node or even distributed systems are necessary. Beyond scaling storage capacity, such systems are also designed and optimized for large numbers of read and write requests, and the most highly regarded of them is the Hadoop project. Hadoop meets users' demand for handling huge amounts of data through a master-slave architecture, a multi-node backup mechanism, read-write lock control, and the ability to be extended freely with related analysis and computation projects. In this thesis, we build on the HDFS distributed storage system and on other tools that can be used to store data, such as FTP (File Transfer Protocol), HTTP (HyperText Transfer Protocol), and Linux. Our aim is to hide the complex low-level operations of the big-data platform and to use simple, widely adopted APIs to design a data transfer system that works across different file servers, lowering the barrier to entry for HDFS users. We also explore the limits and the read/write issues of running Hadoop and this transfer system on a single server.