National Cheng Kung University Electronic Theses and Dissertations System


Graduate student:	施韋銨 Shih, Wei-An
Thesis title:	Hadoop分散式R運算服務之智慧及動態資源配置 Intelligent, Adaptive Resource Allocation for Distributed R Computing Service over Hadoop
Advisor:	蕭宏章 Hsiao, Hung-Chang
Degree:	Master
Department:	College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering
Year of publication:	2018
Academic year of graduation:	106 (ROC calendar)
Language:	Chinese
Number of pages:	31
Keywords:	Hadoop YARN, R, distributed

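R is one of the scripting languages most commonly used today for statistics and plotting, but by design it runs on a single thread. Many parallelization packages exist, yet they all parallelize across the cores of one machine, so R users facing large data sets need a server-class environment to solve their problems. Distributed R Service (DRS), as the name suggests, is a service that distributes R programs. It is a distributed computing framework proposed by researchers to address the problems R programs run into when processing large amounts of data on a single machine, where insufficient resources lead to poor performance or failed jobs. An R user only needs to know how to define a DRS job: DRS hides the details of distributed execution, so nothing has to be written beyond the original R program, which lowers the learning barrier. DRS is an application service built on Hadoop YARN. Like Spark and MapReduce (MR), it relies on the cluster resource management and allocation that YARN provides to build a workflow suited to running R, and it further offers distributed support features such as dynamic task assignment, task scheduling, failure recovery, and user-defined R functions.
This thesis explains how, for a job with a fixed amount of resources, to solve the problem of jobs failing because R tasks run out of memory, as well as the poor resource utilization of the original design. With these designs, DRS can be used more flexibly, and users can adjust the corresponding settings to the needs of their own environment.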

R is a programming language for statistical computing and graphics. It is widely used among statisticians and data miners for developing statistical software and for data analysis. The Distributed R Service (DRS) is a service that distributes R programs on the Hadoop computing platform. Unlike R-Hadoop, SparkR, and Distributed R, DRS is user-friendly: it hides the details of distributed execution from users, so they do not need to modify the logic of their R code. Users only set up the configuration and supply the R code to get the benefit of distributed execution.
In this thesis, we show how to schedule the fixed resources of a DRS job and how to handle tasks that fail because of insufficient memory. With our new design, DRS gains more elasticity in its use of resources. DRS users can configure resource settings themselves. In addition, DRS adjusts the resources to an appropriate combination of containers. Finally, we present the performance improvement that DRS achieves.
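The abstract does not spell out how a task that fails for lack of memory is rescheduled within a fixed resource budget, so the following Python sketch is only an illustration under stated assumptions. It assumes one task queue per container size, that every task starts in the smallest container, and that an out-of-memory failure moves the task to the next larger size. The names `Task`, `run_with_escalation`, `container_sizes_mb`, and `budget_mb` are hypothetical and do not come from DRS or YARN, and the out-of-memory failure is simulated by comparing a task's true need with the container size.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    needed_mb: int      # true memory need; used here only to simulate OOM
    attempts: int = 0


def run_with_escalation(tasks, container_sizes_mb, budget_mb):
    """Run tasks under a fixed memory budget, retrying simulated
    out-of-memory failures in the next larger container size."""
    sizes = sorted(container_sizes_mb)
    if sizes[-1] > budget_mb:
        raise ValueError("largest container does not fit in the budget")

    # Assumption: one queue per container size, smallest size first.
    queues = [deque() for _ in sizes]
    queues[0].extend(tasks)

    finished = {}   # task name -> container size (MB) it completed in
    failed = []     # tasks that did not fit even the largest container

    while any(queues):
        free_mb = budget_mb
        wave = []   # (queue level, task) pairs that share the budget
        # Fill the budget, serving the largest container sizes first.
        for level in range(len(sizes) - 1, -1, -1):
            while queues[level] and sizes[level] <= free_mb:
                wave.append((level, queues[level].popleft()))
                free_mb -= sizes[level]

        for level, task in wave:
            task.attempts += 1
            if task.needed_mb <= sizes[level]:
                finished[task.name] = sizes[level]
            elif level + 1 < len(sizes):
                queues[level + 1].append(task)   # simulated OOM: escalate
            else:
                failed.append(task.name)

    return finished, failed
```

For example, with container sizes of 1024, 2048, and 4096 MB and a 4096 MB budget, a task that needs 1500 MB fails once in a 1024 MB container and then completes in a 2048 MB one, while a task that needs 5000 MB is reported as failed after trying all three sizes.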

Abstract (Chinese)
ABSTRACT
EXTENDED ABSTRACT
Acknowledgements
Table of Contents
List of Tables
List of Figures
CHAPTER 1 Introduction
CHAPTER 2 Background
1 Hadoop YARN
1.1 ResourceManager (RM)
1.2 ApplicationMaster (AM)
1.3 NodeManager (NM)
1.4 Container
1.5 YARN Client
1.6 YARN workflow
2 DRS
2.1 DRSClient
2.2 ApplicationMaster (AM)
2.3 DRSContainer
3 HDS
3.1 File transfer
3.2 File query
3.3 RESTful API dedicated to DRS
CHAPTER 3 Dynamic Task Dispatch and Dynamic Resource Scheduling
1 Problem statement
1.1 Wasted task resources
1.2 Insufficient task resources
2 Approach
2.1 Resource Pool
2.2 Multiple Task Queues
2.3 Matchmaker
3 Details of the approach
3.1 ContainerManager
3.2 Task Queues
3.3 New issues
4 Dynamic resource adjustment
4.1 Factors in dynamic resource adjustment
4.2 Conditions and timing of dynamic adjustment
4.3 Worked example of the approach
CHAPTER 4 Experiments
1 Experimental environment
2 Effect of dynamic task dispatch on the failure rate
3 Effect of dynamic resource adjustment on overall execution time
CHAPTER 5 Related Work
CHAPTER 6 Conclusion
References
                                    


On campus: publicly available from 2023-08-07
Off campus: publicly available from 2023-08-07
