| Graduate Student | 謝亦凡 (Xie, Yi-Fan) |
|---|---|
| Thesis Title | 基於Hadoop之GPU叢集的大資料Python平行運算 (Massively Data Parallel Computation with Python over GPU-Enabled Hadoop) |
| Advisor | 蕭宏章 (Hsiao, Hung-Chang) |
| Degree | Master |
| Department | College of Electrical Engineering and Computer Science, Department of Computer Science and Information Engineering |
| Year of Publication | 2017 |
| Graduation Academic Year | 105 |
| Language | Chinese |
| Number of Pages | 30 |
| Chinese Keywords | Hadoop YARN, Python, GPU |
| Foreign Keywords | Hadoop YARN, Python, GPU |
DRS (Distributed R Service) is a framework for executing R programs in a distributed fashion. By distributing the computation it overcomes the performance bottleneck of a single-machine environment, particularly for computations over large amounts of data. In production environments Python is also a scripting language widely used by data analysts, so whether the DRS distributed computing framework can also serve Python programs becomes a question worth addressing. From our earlier experience parallelizing R programs we know that parallel execution improves performance; at the same time, recent breakthroughs in applying GPUs to high-speed computation (for example, building artificial-intelligence models) lead us to a further question: can GPUs bring an additional performance gain to analytic computations running on a DRS-based distributed environment?
In this work, building on a study of YARN resource monitoring, we extend the DRS framework to provide a cluster computing service for parallel Python programs. The service includes modules for GPU resource management and GPU resource monitoring. By porting Indicator, a statistics program used in a production environment, to the platform we developed, we investigate the complexity of rewriting the Indicator program in Python on a cluster equipped with GPU computing resources, examine how such a program can effectively exploit the advantages of GPU computation, and discuss the performance obstacles that may arise.
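The monitoring module's code is not reproduced in this record; purely as a hedged illustration, the sketch below shows one common way such a GPU-monitoring module could poll per-device utilization and memory by parsing the CSV output of NVIDIA's nvidia-smi tool. The function name `query_gpu_status` and the returned dictionary layout are assumptions for illustration, not the thesis's actual interface.

```python
import subprocess

def query_gpu_status():
    """Poll per-GPU utilization and memory by parsing nvidia-smi CSV output.

    Illustrative sketch only; the return format is an assumption.
    """
    out = subprocess.check_output([
        "nvidia-smi",
        "--query-gpu=index,utilization.gpu,memory.used,memory.total",
        "--format=csv,noheader,nounits",
    ]).decode("utf-8")

    gpus = []
    for line in out.strip().splitlines():
        idx, util, used, total = (field.strip() for field in line.split(","))
        gpus.append({
            "gpu_index": int(idx),          # device ordinal
            "utilization_pct": int(util),   # GPU core utilization in percent
            "memory_used_mib": int(used),
            "memory_total_mib": int(total),
        })
    return gpus
```

A resource-management module could periodically run such a probe on each node and report free GPUs back to the Application Master when deciding where to launch containers.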
Python is one of the most popular languages in the world, and more and more data analysts choose it as their tool for data analysis. Many big-data processing frameworks, such as Spark and Hadoop Streaming, let users drive them with Python. Although this is very convenient, it still requires the developer to have basic knowledge of distributed systems.
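As background for the claim above (not part of the thesis itself), a minimal word-count mapper shows how Hadoop Streaming lets a plain Python script take part in a MapReduce job: the framework pipes input records to the script's standard input and reads tab-separated key/value pairs from its standard output.

```python
#!/usr/bin/env python
# Minimal Hadoop Streaming mapper: read lines from stdin and
# emit "word<TAB>1" pairs on stdout for a reducer to aggregate.
import sys

for line in sys.stdin:
    for word in line.split():
        print("%s\t%d" % (word, 1))
```

Such a script is submitted with the hadoop-streaming JAR via the -input, -output, -mapper and -reducer options; the developer still has to reason about how records are partitioned and shuffled, which is the kind of distributed-system detail referred to above.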
DRS (Distributed R Service) is a distributed data processing framework based on Hadoop YARN. It has three main components: the Application Master, the Client, and the Container. In this paper, we summarize the design, current state, and implementation of our application, which provides a distributed Python service and GPU resource management on top of DRS. Our approach offers another distributed solution for Python users and explores what must be prepared when building a GPU cluster. To provide a simple user interface and hide distributed-system issues from the user, we extend the functionality of the Application Master and the Container, so that the AM can allocate and manage GPU resources and the Container can execute Python programs. We then present the CPU algorithm of Indicator and design a GPU version of the Indicator algorithm.
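The Indicator program itself is not reproduced here; purely as an illustrative sketch of the kind of GPU offloading described, the hypothetical PyCUDA fragment below moves an element-wise computation onto the GPU. The kernel name and the scaling operation are placeholders, not the actual Indicator algorithm.

```python
import numpy as np
import pycuda.autoinit            # creates a CUDA context on the default GPU
import pycuda.driver as cuda
from pycuda.compiler import SourceModule

# Placeholder kernel: scale every element of an array by a constant factor.
mod = SourceModule("""
__global__ void scale(float *data, float factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= factor;
}
""")
scale = mod.get_function("scale")

data = np.random.randn(1 << 20).astype(np.float32)
gpu_buf = cuda.mem_alloc(data.nbytes)
cuda.memcpy_htod(gpu_buf, data)                      # host -> device

threads = 256
blocks = (data.size + threads - 1) // threads
scale(gpu_buf, np.float32(2.0), np.int32(data.size),
      block=(threads, 1, 1), grid=(blocks, 1))

cuda.memcpy_dtoh(data, gpu_buf)                      # device -> host
```

A Container extended as described would launch a Python process of this kind after the Application Master has assigned it a GPU.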
The resulting design is evaluated through performance experiments, and the results are in line with our expectations.