| Author: | 陳昱霖 Chen, Yu-Lin |
|---|---|
| Thesis title: | DRS之有效資源管理 Efficient Resource Management in DRS |
| Advisor: | 蕭宏章 Hsiao, Hung-Chang |
| Degree: | 碩士 Master |
| Department: | 電機資訊學院 - 資訊工程學系 Department of Computer Science and Information Engineering |
| Year of publication: | 2017 |
| Academic year of graduation: | 105 |
| Language: | Chinese |
| Number of pages: | 25 |
| Keywords (Chinese): | Hadoop YARN, R, Distributed processing |
| Keywords (English): | Hadoop YARN, R, Distributed processing |
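DRS (Distributed R Service) is a framework for running R programs in a distributed manner. It uses distributed computing to address the resource shortages that R faces when processing large amounts of data on a single machine, such as limits on memory size and computing resources. It also hides the details of distributed execution from R users, which lowers the learning barrier. DRS is an application service built on Hadoop YARN. It uses the cluster resource management and allocation provided by YARN to construct a workflow suited to running R, and it further provides distributed support features such as dynamic task assignment, locality-aware task scheduling, error recovery, and R user-defined functions.
In this paper, we further examine resource management in DRS. We discuss how, under the existing YARN architecture, limited resources can be managed to serve multiple tenants at the same time, and how distributed R programs can be monitored and benchmarked in more detail. With these supporting features, the resources in DRS can be used efficiently.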
R is a free, open-source scripting language for data manipulation, calculation, statistical computing, and graphical display. As R becomes a common choice for data mining and analysis, R users need a distributed solution to overcome the performance bottlenecks of a single-machine environment. DRS (Distributed R Service) is a framework for the distributed execution of R programs on a Hadoop cluster; it provides a simple user interface and hides the details of distribution from the user. In this paper, we introduce the architecture and application workflow of DRS, then explain how it uses the cluster resource management of Hadoop YARN. The fundamental idea of YARN is to split the functions of resource management and of job control and scheduling into separate components. We discuss how to share resources between DRS and other YARN applications, and how to tune the resource settings of DRS for efficiency. Finally, we discuss future work on DRS.