| Graduate student: | 阮有淨江 Nguyen, Huu Tinh Giang |
|---|---|
| Thesis title: | Design and Implement a MapReduce Framework for Converting Standalone Software Packages to Hadoop-based Distributed Environments (Chinese title: 設計與實作一個將單機環境軟體轉換到Hadoop基礎分散式環境的MapReduce框架) |
| Advisor: | 陳朝鈞 Chen, Chao-Chun |
| Co-advisor: | 洪敏雄 Hung, Min-Hsiung |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science, Institute of Manufacturing Information and Systems |
| Year of publication: | 2013 |
| Academic year of graduation: | 101 (ROC calendar, i.e., 2012–2013) |
| Language: | English |
| Number of pages: | 65 |
| Keywords: | MapReduce, Hadoop, Cloudizing, multi-user scheduling |
Hadoop MapReduce is a programming model for designing automatically scalable distributed computing applications. It gives developers an effective environment for achieving automatic parallelization. However, most existing manufacturing systems are difficult and restrictive to migrate to a MapReduce-based private cloud because of platform incompatibility and the considerable complexity of reconstructing the systems. To increase the efficiency of manufacturing systems with minimal modification of the existing systems, this thesis designs a framework called MC-Framework (Multi-users-based Cloudizing-Application Framework). It provides users with a simple interface for fairly executing requested tasks that rely on traditional standalone software packages in MapReduce-based private cloud environments. Moreover, this thesis focuses on multi-user workloads, under which the default Hadoop scheduling scheme, FIFO, would increase delay. Hence, we also propose a new scheduling mechanism, called Job-Sharing Scheduling, that distributes jobs fairly across the machines of the MapReduce-based private cloud. This study uses an experimental design to verify and analyze the proposed MC-Framework with two case studies: (1) independent model systems, which include the stochastic Petri net model, and (2) dependent model systems, which include the virtual-metrology module of a manufacturing system. The experimental results indicate that the proposed framework greatly improves time performance compared with the original package.
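The abstract does not describe how Job-Sharing Scheduling works, so the toy simulation here only illustrates the general claim that strict FIFO ordering delays multi-user workloads. It is a minimal sketch under stated assumptions, not the thesis's algorithm: time advances in discrete steps, each of a fixed number of slots runs one unit-length task per step, and the round-robin policy `fair_pick`, the job names, and the slot count are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Job:
    user: str   # submitting user (hypothetical; not used by either policy below)
    name: str   # unique job name (hypothetical)
    tasks: int  # number of unit-length tasks the job splits into


# A policy maps (active jobs in submission order, remaining tasks, free slots)
# to the number of tasks each job runs during the current step.
Policy = Callable[[List[Job], Dict[str, int], int], Dict[str, int]]


def fifo_pick(active: List[Job], remaining: Dict[str, int], slots: int) -> Dict[str, int]:
    """FIFO: the oldest job takes as many free slots as it can; leftovers spill to the next job."""
    assigned: Dict[str, int] = {}
    for job in active:
        if slots == 0:
            break
        n = min(slots, remaining[job.name])
        assigned[job.name] = n
        slots -= n
    return assigned


def fair_pick(active: List[Job], remaining: Dict[str, int], slots: int) -> Dict[str, int]:
    """Hypothetical round-robin share: one slot per active job per pass until slots run out."""
    assigned = {job.name: 0 for job in active}
    while slots > 0:
        progressed = False
        for job in active:
            if slots == 0:
                break
            if assigned[job.name] < remaining[job.name]:
                assigned[job.name] += 1
                slots -= 1
                progressed = True
        if not progressed:  # every active job already has all its remaining tasks placed
            break
    return assigned


def simulate(jobs: List[Job], slots: int, pick: Policy) -> Dict[str, int]:
    """Run discrete steps; return the step at which each job's last task completes."""
    if slots <= 0:
        raise ValueError("slots must be positive")
    remaining = {job.name: job.tasks for job in jobs}
    finish: Dict[str, int] = {}
    step = 0
    while remaining:
        step += 1
        active = [job for job in jobs if job.name in remaining]  # submission order
        for name, n in pick(active, remaining, slots).items():
            remaining[name] -= n
            if remaining[name] == 0:
                del remaining[name]
                finish[name] = step
    return finish


# Hypothetical workload: one long job submitted first, then two short jobs from other users.
EXAMPLE_JOBS = [
    Job(user="alice", name="long-job", tasks=12),
    Job(user="bob", name="short-job-1", tasks=2),
    Job(user="carol", name="short-job-2", tasks=2),
]

if __name__ == "__main__":
    for label, policy in (("FIFO", fifo_pick), ("round-robin", fair_pick)):
        done = simulate(EXAMPLE_JOBS, slots=4, pick=policy)
        mean = sum(done.values()) / len(done)
        print(f"{label}: completion steps {done}, mean {mean:.2f}")
```

With four slots, FIFO completes bob's and carol's two-task jobs at step 4, only after alice's 12-task job finishes at step 3; the round-robin policy completes them at step 2 and pushes alice's job back one step to step 4, so both policies still finish all work at step 4 while the mean completion step drops. Because `fair_pick` starts every pass from the oldest job, newer jobs can still wait when there are fewer slots than active jobs; a production scheduler would rotate the starting job or track each user's accumulated share.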