| Graduate student: | 王曦涓 Wang, Hsi-Chuan |
|---|---|
| Thesis title: | 使用Petri Net估算在MapReduce模型下的程式執行時間 Using Petri Net to Estimate Job Execution Time in MapReduce Model |
| Advisor: | 鄭憲宗 Cheng, Sheng-Tzong |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering |
| Year of publication: | 2013 |
| Academic year of graduation: | 101 |
| Language: | English |
| Number of pages: | 47 |
| Keywords (Chinese): | 雲端運算, MapReduce, 派翠網路, 程式執行時間 |
| Keywords (English): | Cloud computing, MapReduce, Petri Net, Job execution time |
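With the arrival of the era of massive data, processing data at large scale has become a key development area in information technology and has driven the rapid growth of cloud computing in recent years. MapReduce can be regarded as a key cloud computing technology: once developers understand the concepts of parallel processing, they can quickly use the MapReduce framework to write parallel programs and exploit the high performance of a cluster to process large amounts of data rapidly. Hadoop is one of the most widely used MapReduce implementations and serves as the test platform in this study. Once a MapReduce program has been developed, however, developers have no way of knowing how it should perform in their test cluster. Hadoop in particular has many performance-related parameters, so developers often spend a great deal of time on performance tuning, or must study the operational details of MapReduce in depth and then run repeated tests to find the parameters best suited to the program.
This study examines the operational details of each MapReduce phase and is the first to propose a MapReduce model designed with stochastic Petri nets, called SPN-MR. It defines a formula for the mean delay time of each timed transition, so that the execution time of a MapReduce program for a given amount of input data can be simulated within a few hundred milliseconds, reducing the time developers spend on performance tuning. The experiments also compare the execution times estimated by the system with measured data: for input data of up to 10 GB, the average error is within 5%, which gives MapReduce developers a useful estimate of program execution time.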
Handling the vast amounts of data now being generated requires large-scale data-processing techniques, a need that has driven the recent growth of cloud computing. MapReduce is a key technique in cloud computing. Once programmers understand the concepts of parallel processing, they can use the MapReduce framework to write parallel programs quickly and exploit the high performance of computer clusters. One of the most widely used MapReduce implementations is Hadoop, which is the test platform used in this thesis. However, after developing a MapReduce job, programmers have no way of knowing how their application should perform in their own test environment. Moreover, because many parameters affect the performance of Hadoop, programmers must spend a substantial amount of time on repeated tests to identify the most suitable parameter values, or study the internals of MapReduce in depth.
In this thesis, the execution details of MapReduce are examined in depth and described using a Stochastic Petri Net (SPN), yielding the SPN-MapReduce (SPN-MR) model. To analyze performance with SPN-MR, a mean delay time formula is defined for each timed transition. SPN-MR is the first SPN-based model of MapReduce; it can estimate the execution time of a MapReduce job with a known input data size within hundreds of milliseconds, reducing the time programmers spend on performance tuning. The estimates of SPN-MR are compared with actual measurements on two benchmarks, and the average error is found to be within 5% for input data of up to 10 GB. SPN-MR can therefore help MapReduce programmers evaluate performance effectively.
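To illustrate how a stochastic Petri net with timed transitions can produce an execution-time estimate, the following is a minimal sketch in Python. It is not the SPN-MR model: the abstract does not give the net structure or the mean delay formulas, so every place name, token count, and mean delay below is a hypothetical value chosen for illustration. The sketch assumes exponentially distributed firing delays and lets a transition's firing rate scale with the number of tasks it can serve at once.

```python
import random

def enabling_degree(marking, inputs):
    """Number of times a transition could fire concurrently in this marking."""
    return min(marking.get(place, 0) // need for place, need in inputs.items())

def simulate_spn(marking, transitions, rng):
    """Run one trajectory until no transition is enabled.

    transitions maps name -> (inputs, outputs, mean_delay_seconds).
    Returns (elapsed_time, final_marking).
    """
    marking = dict(marking)
    clock = 0.0
    while True:
        # Firing rate of each enabled transition: degree / mean delay.
        rates = {}
        for name, (inputs, _outputs, mean_delay) in transitions.items():
            degree = enabling_degree(marking, inputs)
            if degree > 0:
                rates[name] = degree / mean_delay
        if not rates:
            return clock, marking
        # Exponential race: time to the next firing, then pick which one fired.
        clock += rng.expovariate(sum(rates.values()))
        fired = rng.choices(list(rates), weights=list(rates.values()))[0]
        inputs, outputs, _ = transitions[fired]
        for place, need in inputs.items():
            marking[place] -= need
        for place, made in outputs.items():
            marking[place] = marking.get(place, 0) + made

# Hypothetical toy job: 8 map tasks, 4 map slots, one reduce step after all maps.
TOY_MARKING = {"map_pending": 8, "map_slot": 4, "map_done": 0, "job_done": 0}
TOY_NET = {
    # A map task takes a slot, runs for 10 s on average, then frees the slot.
    "map": ({"map_pending": 1, "map_slot": 1}, {"map_done": 1, "map_slot": 1}, 10.0),
    # The reduce step needs all 8 map outputs and takes 20 s on average.
    "reduce": ({"map_done": 8}, {"job_done": 1}, 20.0),
}

def estimate_mean_time(runs=5000, seed=1):
    """Average the simulated job time over many independent trajectories."""
    rng = random.Random(seed)
    total = sum(simulate_spn(TOY_MARKING, TOY_NET, rng)[0] for _ in range(runs))
    return total / runs

if __name__ == "__main__":
    print(f"estimated mean job time: {estimate_mean_time():.1f} s")
```

For this toy net the mean can also be worked out by hand: the eight map firings contribute 5 × 2.5 + 10/3 + 5 + 10 seconds and the reduce step adds 20 seconds, roughly 50.8 seconds in total, which the simulated average should approach. A real model would replace the hypothetical constants with delays derived from the input data size and cluster parameters.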
On-campus access: public from 2018-08-23