| Graduate Student: | Chang, Hung-Yu (張弘諭) |
|---|---|
| Thesis Title: | Adaptive MapReduce Framework for Multi-Application Processing on GPU (適應性多應用程序MapReduce處理框架於圖形處理器之研究與實現) |
| Advisor: | Huang, Yueh-Min (黃悅民) |
| Degree: | Master |
| Department: | Department of Engineering Science, College of Engineering |
| Publication Year: | 2013 |
| Academic Year: | 101 |
| Language: | Chinese |
| Pages: | 74 |
| Chinese Keywords: | MapReduce、GPU、GPGPU、Mars、操作重疊性 |
| English Keywords: | MapReduce, GPU, GPGPU, Mars, Overlapped GPU Operations |
With the rapid advance of electronic and information technology in recent years, the volume of data that enterprises must process grows by the day. Thanks to the development and evolution of the distributed processing framework MapReduce, handling massive data sets is no longer a hard problem: applications across many domains can use MapReduce frameworks, commonly deployed on large CPU clusters, to compute over data in a parallel and distributed fashion and thereby improve processing efficiency. Meanwhile, as Graphics Processing Unit (GPU) hardware has improved, the GPU's large number of computing cores and strong compute capability have made it suitable for ever heavier workloads, and many MapReduce frameworks have been designed and implemented on GPUs following the GPGPU concept to further raise computational performance.

However, current GPU MapReduce frameworks mainly target a single application at a time: they cannot process multiple applications concurrently, so requests from multiple applications can only be served sequentially. They also lack efficient data-partitioning and resource-scheduling mechanisms, so the hardware's capability is not fully exploited under multi-application processing.

This study designs a multi-application parallel processing mechanism based on Mars, an existing GPU MapReduce framework. According to each running application's processing requirements, hardware resource demands, and data volume, the mechanism partitions the large data set produced by multiple applications and, based on the hardware's load capacity, dispatches appropriately sized workload fragments for processing. It also takes the overlap of related hardware-control operations into account, seeking a higher degree of overlapped hardware operation to improve throughput. Applications commonly used with MapReduce frameworks serve as the experimental workloads, and execution time is the performance metric; under this parallel processing mechanism, the average speedup across the multi-application workloads is about 1.3x.
With the improvements in electronic and computer technology, the amount of data to be processed by each enterprise keeps growing. Handling such volumes of data is no longer a major challenge with the help of the MapReduce framework. Applications from many fields can take advantage of MapReduce on large numbers of CPUs for efficient distributed and parallel computing. At the same time, graphics processing unit (GPU) technology is also improving: the many-core GPU provides computing power capable of handling heavier workloads and more data processing. Many MapReduce frameworks have accordingly been designed and implemented on GPU hardware following the general-purpose GPU (GPGPU) concept to achieve better performance.

However, most GPU MapReduce frameworks so far focus on single-application processing. In other words, no mechanism is provided for multi-application execution, and multiple applications can only be processed in sequential order. The GPU hardware resources may therefore not be fully utilized or well distributed, which degrades computing performance.

This study designs and implements a multi-application execution mechanism based on the state-of-the-art GPU MapReduce framework, Mars. It not only provides a problem-partitioning utility that considers the data size and hardware-resource requirements of each application, but also feeds appropriate amounts of work into the GPU with overlapped GPU operations for efficient parallel execution. Finally, several common applications are used to verify the applicability of this mechanism, with execution time as the main evaluation metric. An overall speedup of about 1.3x is achieved for random application combinations with the proposed method.
On-campus access: publicly available from 2018-08-12.