Author: Chen, Yu-Lin (陳昱霖)
Thesis title: Efficient Resource Management in DRS
Advisor: Hsiao, Hung-Chang (蕭宏章)
Degree: Master
Department: Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science
Year of publication: 2017
Academic year of graduation: 105 (ROC calendar, i.e., the 2016-2017 academic year)
Language: Chinese
Pages: 25
Keywords: Hadoop YARN, R, Distributed processing
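    DRS (Distributed R Service) is a framework for executing R programs in a distributed manner. It uses distributed computing to overcome the resource limits, such as memory size and compute capacity, that R runs into when processing large volumes of data on a single machine, and it hides the details of distributed execution from R users, lowering their learning curve. DRS is an application service built on Hadoop YARN: it uses the cluster resource management and allocation provided by YARN to build a workflow suited to R execution, and it further provides distributed support features such as dynamic task assignment, locality-aware task scheduling, failure recovery, and R user-defined functions.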
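The abstract names dynamic task assignment and locality-aware scheduling without specifying them. As a rough illustration of the general idea only, the sketch below gives a newly freed container a pending task whose input data is stored on that container's node, and otherwise falls back to the oldest pending task. The `Task` type and `assign_task` function are hypothetical and are not DRS's actual interfaces.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Task:
    """A unit of R work; preferred_nodes lists hosts that store its input block."""
    task_id: str
    preferred_nodes: set = field(default_factory=set)


def assign_task(pending: deque, node: str):
    """Choose the next task for a container that has just become free on `node`.

    Hypothetical policy: take the first pending task whose input is local to
    `node`; if none is, take the oldest pending task so the container is not
    left idle. Returns None when nothing is pending.
    """
    for i, task in enumerate(pending):
        if node in task.preferred_nodes:
            del pending[i]  # safe: we return right after mutating the deque
            return task
    return pending.popleft() if pending else None


# Usage: three pending tasks whose input blocks live on different hosts.
pending = deque([Task("t1", {"nodeA"}), Task("t2", {"nodeB"}), Task("t3", {"nodeC"})])
print(assign_task(pending, "nodeB").task_id)  # t2: data-local match
print(assign_task(pending, "nodeD").task_id)  # t1: no local task, oldest first
```

Dynamic assignment here means tasks are bound to containers only when a container frees up, so faster nodes naturally pick up more work.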
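    In this thesis, we further examine resource management in DRS: under the existing YARN architecture, how to manage limited resources so that multiple tenants can use them at the same time, and how to provide finer-grained resource monitoring and benchmarking for distributed R programs. Together, these supporting features allow the resources in DRS to be used efficiently.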
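The abstract does not say how DRS queues computations when tenants ask for more than the cluster can currently provide. One simple way to picture queuing over a limited pool is strict FIFO admission, sketched below under the assumption that each job requests a fixed number of equally sized containers. `AdmissionQueue` and its methods are hypothetical names, not the thesis's implementation.

```python
from collections import deque


class AdmissionQueue:
    """Strict-FIFO admission of jobs into a fixed pool of containers (hypothetical)."""

    def __init__(self, total_containers: int):
        self.total = total_containers
        self.free = total_containers
        self.waiting = deque()  # (job_id, containers_needed) in arrival order
        self.running = {}       # job_id -> containers currently held

    def submit(self, job_id: str, needed: int) -> list:
        if needed > self.total:
            raise ValueError(f"{job_id} asks for more containers than exist")
        self.waiting.append((job_id, needed))
        return self._admit()

    def finish(self, job_id: str) -> list:
        self.free += self.running.pop(job_id)  # return the job's containers
        return self._admit()

    def _admit(self) -> list:
        # Stop at the first job that does not fit, so a large job at the head
        # is not starved by smaller jobs that arrived after it.
        admitted = []
        while self.waiting and self.waiting[0][1] <= self.free:
            job_id, needed = self.waiting.popleft()
            self.free -= needed
            self.running[job_id] = needed
            admitted.append(job_id)
        return admitted


q = AdmissionQueue(total_containers=8)
print(q.submit("alice", 6))  # ['alice']
print(q.submit("bob", 4))    # [] (only 2 containers free)
print(q.submit("carol", 2))  # [] (waits behind bob under strict FIFO)
print(q.finish("alice"))     # ['bob', 'carol']
```

Strict FIFO trades some utilization (carol could have run early) for fairness to large jobs; a backfilling policy would make the opposite trade.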

    R is a free, open-source scripting language for data manipulation, calculation, statistical computing, and graphical display. As R becomes a common choice for data mining and analysis, R users need a distributed solution to overcome the performance bottlenecks of a single-machine environment. DRS (Distributed R Service) is a framework for distributed computing of R programs on a Hadoop cluster; it provides a simple user interface and hides distributed-execution issues from users. In this thesis, we introduce the architecture and application workflow of DRS, then explain how it uses the cluster resource management of Hadoop YARN. The fundamental idea of YARN is to split the functions of resource management and of job scheduling and monitoring into separate components. We discuss how to share resources between DRS and other YARN applications, and how to tune DRS resource settings for efficient use. Finally, we discuss future work on DRS.
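The abstract does not detail the queue layout used to share a cluster between DRS and other YARN applications. As background, YARN's CapacityScheduler organizes queues hierarchically, and each queue's configured capacity is a percentage of its parent's, so a leaf queue's guaranteed share of the cluster is the product of the percentages along its path. The sketch below computes those shares; the queue names, percentages, and cluster memory size are made-up examples, not the thesis's configuration.

```python
def absolute_capacities(children: dict, parent_abs: float = 100.0, prefix: str = "root") -> dict:
    """Map each queue path to its guaranteed share of the cluster, in percent.

    `children` maps a child queue name to (capacity, grandchildren), where
    capacity is a percentage of the parent queue, following the
    CapacityScheduler convention; sibling capacities must sum to 100.
    """
    result = {}
    if children and abs(sum(cap for cap, _ in children.values()) - 100.0) > 1e-6:
        raise ValueError(f"child capacities of {prefix} must sum to 100")
    for name, (cap, grandchildren) in children.items():
        path = f"{prefix}.{name}"
        result[path] = parent_abs * cap / 100.0
        result.update(absolute_capacities(grandchildren, result[path], path))
    return result


# Example layout (made up): 60% for DRS tenants, 40% for other YARN applications.
queues = {
    "drs": (60, {"tenantA": (50, {}), "tenantB": (50, {})}),
    "default": (40, {}),
}
caps = absolute_capacities(queues)
cluster_memory_mb = 102400  # assumed 100 GiB of schedulable memory
for path, pct in caps.items():
    print(f"{path}: {pct:.0f}% = {cluster_memory_mb * pct / 100:.0f} MB")
```

In the CapacityScheduler a queue may also use idle capacity beyond its guarantee, up to its `maximum-capacity`, and preemption (when enabled) can reclaim that borrowed capacity; this sketch only computes the guaranteed shares.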

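    Front matter: Abstract (Chinese), ABSTRACT, Extended Abstract, Acknowledgements, Table of Contents, List of Tables, List of Figures
    CHAPTER 1 Introduction
    CHAPTER 2 Background
      2.1 YARN: The Hadoop Computing Framework
        2.1.1 ResourceManager (RM)
        2.1.2 NodeManager (NM)
        2.1.3 Programming Model
      2.2 DRS
        2.2.1 Client
        2.2.2 ApplicationMaster (AM)
        2.2.3 DRS Container
    CHAPTER 3 Multi-Tenant Resource Management
      3.1 Problem Statement
        3.1.1 Resource Contention Among YARN Tenants
        3.1.2 Resource Contention Among DRS Tenants
      3.2 Methodology
        3.2.1 YARN Hierarchical Queue Design
        3.2.2 Adding Resource Preemption Support to DRS
        3.2.3 Implementing a Computation Queuing Mechanism for DRS
    CHAPTER 4 Resource Monitoring and Automated Benchmarking
      4.1 Resource Monitoring
        4.1.1 Problem Statement
        4.1.2 Methodology
      4.2 Automated Benchmarking
        4.2.1 Problem Statement
        4.2.2 Methodology
    CHAPTER 5 Experiments
      5.1 Experimental Environment
      5.2 Benefits of the Multi-Tenant Resource Management Approach
      5.3 Benefits of DRS Resource Monitoring and Automated Benchmarking
    CHAPTER 6 Conclusion
    References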

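    Off-campus access: not available
    Neither the electronic nor the print version of this thesis has been authorized for public release.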