
Graduate student: Nguyen, Huu Tinh Giang (阮有淨江)
Thesis title: Design and Implement a MapReduce Framework for Converting Standalone Software Packages to Hadoop-based Distributed Environments (設計與實作一個將單機環境軟體轉換到Hadoop基礎分散式環境的MapReduce框架)
Advisor: Chen, Chao-Chun (陳朝鈞)
Co-advisor: Hung, Min-Hsiung (洪敏雄)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Institute of Manufacturing Information and Systems
Year of publication: 2013
Academic year of graduation: 101 (ROC calendar)
Language: English
Number of pages: 65
Keywords: MapReduce, Hadoop, Cloudizing, multi-user scheduling
  • Hadoop MapReduce is a programming model for designing automatically scalable distributed computing applications. It provides developers with an effective environment for automatic parallelization. However, most existing manufacturing systems are difficult to migrate to a MapReduce private cloud because of platform incompatibility and the considerable complexity of system reconstruction. To increase the efficiency of manufacturing systems with minimal modification of existing systems, this thesis designs a framework called MC-Framework (Multi-users-based Cloudizing-Application Framework). It provides users with a simple interface for fairly executing requested tasks that run traditional standalone software packages in MapReduce-based private cloud environments. Moreover, this thesis focuses on multi-user workloads, for which the default Hadoop scheduling scheme, FIFO, increases delay. Hence, we also propose a new scheduling mechanism, called Job-Sharing Scheduling, to explore and fairly share jobs among the machines in the MapReduce-based private cloud. This study uses an experimental design to verify and analyze the proposed MC-Framework with two case studies: (1) independent-model systems, including the stochastic Petri nets model, and (2) dependent-model systems, including the virtual-metrology module of a manufacturing system. The experimental results indicate that the proposed framework substantially improves time performance compared with the original packages.
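
The abstract's point about FIFO delaying users under multi-user workloads can be illustrated with a toy model. The sketch below is not the thesis's Job-Sharing Scheduling algorithm, which this record does not describe; it only compares strict FIFO ordering with a simple round-robin interleaving of tasks. The function names, the user names, the one-job-per-user simplification, and the assumption that every task takes one time unit are all hypothetical.

```python
from collections import deque


def fifo_order(jobs):
    """Task order under FIFO: every task of a job runs before the next job starts.

    jobs is a list of (user, n_tasks) pairs in submission order.
    """
    return [user for user, n_tasks in jobs for _ in range(n_tasks)]


def shared_order(jobs):
    """Task order under round-robin sharing: one task per pending job per pass.

    This is an illustrative stand-in for a fair scheduler, not the thesis's method.
    """
    pending = deque(jobs)
    order = []
    while pending:
        user, remaining = pending.popleft()
        order.append(user)
        if remaining > 1:
            pending.append((user, remaining - 1))
    return order


def finish_times(order, slots):
    """Completion time per user, assuming unit-length tasks and `slots` tasks per time unit."""
    finish = {}
    for index, user in enumerate(order):
        # Later tasks overwrite earlier ones, so the last task of each user wins.
        finish[user] = index // slots + 1
    return finish


# Hypothetical workload: a long job submitted first, a short job submitted second.
jobs = [("alice", 8), ("bob", 2)]
fifo_result = finish_times(fifo_order(jobs), slots=2)
shared_result = finish_times(shared_order(jobs), slots=2)
print("FIFO:  ", fifo_result)
print("Shared:", shared_result)
```

In this toy workload the short job waits behind the long one under FIFO, while interleaving lets it finish early at the cost of a slightly later finish for the long job.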

    Contents
    List of Figures
    List of Tables
    Chapter 1 Introduction
      1.1 Background
      1.2 Research Problems
      1.3 Research Motivations and Objectives
      1.4 Organization of the Thesis
    Chapter 2 Literature Review
      2.1 Previous Simulation Frameworks
      2.2 Scheduling
    Chapter 3 Technologies Behind
      3.1 The Hadoop Cluster Architecture
      3.2 The Hadoop MapReduce Engine
        3.2.1 The MapReduce Programming Model
        3.2.2 The MapReduce Execution
      3.3 The MapReduce Example - WordCount
      3.4 The Hadoop Default Scheduling - FIFO
    Chapter 4 MC-Framework: Multi-users-based Cloudizing-Application Framework
      4.1 Architecture System Design
        4.1.1 Variable Specification Component in MC-Framework
        4.1.2 The Design Details of MC-Framework
      4.2 Operation Analysis
        4.2.1 Job-Sharing Scheduling: a Mechanism for Impartial Distributed the Requested Tasks
        4.2.2 MC-Map Executor: a Mechanism for Executing the Requested Tasks-based Cloudizing
      4.3 The High Performance Scheduling for Multi-users Scenarios
        4.3.1 The Disadvantage of Hadoop Default Scheduling
        4.3.2 Queue-Sharing: a Mechanism for Multi-users Workload
      4.4 The MC-Framework Diagram Flowcharts
        4.4.1 The Diagram Flowchart of Queue-Sharing
        4.4.2 The Diagram Flowchart of Job-Sharing Scheduler for Task Parallel Jobs
        4.4.3 The MC-Map Executor Stage Workflow
        4.4.4 The Irreducible Reduce Stage in MapReduce Framework
    Chapter 5 MC-Framework Evaluation
      5.1 The Implement Environment
        5.1.1 The Metric
        5.1.2 The Core Hardware and Infrastructure
      5.2 Adapting MC-Framework to Multiple Independence Models System - Stochastic Petri Nets
        5.2.1 The SPNP - Stochastic Petri Nets Model
        5.2.2 The Input Data Specification
        5.2.3 The Design Adaptation Petri Nets Model to MC-Framework
      5.3 Adapting MC-Framework to Multiple Dependence Models System - Model Creation in Manufacturing Execution System
        5.3.1 The Manufacturing Execution System (MES)
        5.3.2 The Input Data Specification
      5.4 Design Adaptation Dependence Models System to MC-Framework
    Chapter 6 System Implementation and Results
      6.1 System Setup
      6.2 The Integrated Testing Results
        6.2.1 The Petri Nets Model in SPNP
        6.2.2 The Model Creation in Virtual-metrology System
      6.3 The Performance Evaluation
      6.4 The Petri Nets model with MC-Framework
        6.4.1 The Creation Model with MC-Framework
    Chapter 7 Conclusions
      7.1 The Research Summary
      7.2 Contributions
      7.3 Future Works
    References


    Full text available on campus: 2016-07-11
    Full text available off campus: 2016-07-11