
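Student: Chen, Hao-Jun (陳浩駿)
Thesis title: Experiences on Building a Big Data Platform for Real Applications: The Case of ANSYS Simulations (基於ANSYS模擬應用的大資料平臺搭建)
Advisor: Hsiao, Hung-Chang (蕭宏章)
Degree: Master
Department: Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science
Year of publication: 2016
Academic year of graduation: 104 (ROC calendar, i.e., 2015-2016)
Language: English
Number of pages: 37
Keywords (Chinese): Hadoop, Spark, file system behavior
Keywords (English): Hadoop, Spark, File System I/O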
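    Thanks to its openness, open-source software has attracted developers around the world to develop, maintain, and upgrade the open-source communities they care about. The most prominent examples are Hadoop, HBase, and Spark, whose rise has led industries of every kind to take big data platforms seriously.
    However, for commercial reasons, much software remains closed-source: its source code is not made public, and it is generally sold for business use. Upgrades and maintenance must therefore be bought from the service provider at a high price, which is costly and also prevents developers from optimizing the software. In many cases, both enterprises and developers have little recourse when dealing with closed-source software.
    In Taiwan, the Industrial Technology Research Institute (ITRI) is at the forefront of industry. As a technical service provider, ITRI has purchased many closed-source software packages commonly used in enterprise development, together with their licenses. It leases them out, setting up server clusters and using virtualization technology to provide Taiwanese enterprises with the corresponding software services and the execution environments they need. This not only reduces enterprises' operating costs but also contributes to growth in Taiwan's gross domestic product.
    How to use cluster resources and license resources effectively has therefore become a meaningful issue. ITRI has already used MPI to tune the utilization of the cluster's computing resources to its best. As ITRI's technical support team, we take I/O resource usage as the entry point of our research, with the goal of understanding I/O information and discovering I/O characteristics, in order to provide technical support to ITRI.
    In this thesis, we build a big data platform and write I/O analysis programs that give ITRI both intuitive charts of I/O characteristics and a conventional SQL query-and-analysis interface, helping ITRI understand the I/O behavior of software at run time. We hope that, with this application and the analysis of I/O information, ITRI can find effective ways to improve I/O efficiency and allocate resources.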
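    The SQL query interface for I/O information can be illustrated with a minimal sketch. It is not the thesis's implementation: the log schema (pid, op, path, byte_offset, nbytes) is a hypothetical assumption, and Python's built-in sqlite3 in-memory database is used only as a local stand-in for the thesis's HDFS storage and Spark computation layers. The query groups captured requests by file and operation and reports request counts, total bytes, and mean request size.

```python
import sqlite3

# Hypothetical log schema (an assumption, not the thesis's format):
# one row per captured file-system request.
SCHEMA = """
CREATE TABLE io_log (
    pid         INTEGER,  -- process that issued the request
    op          TEXT,     -- 'READ' or 'WRITE'
    path        TEXT,     -- target file
    byte_offset INTEGER,  -- starting offset of the request
    nbytes      INTEGER   -- bytes transferred
)
"""

# Per-file, per-operation summary, largest byte volume first.
SUMMARY_SQL = """
SELECT path, op,
       COUNT(*)    AS requests,
       SUM(nbytes) AS total_bytes,
       AVG(nbytes) AS avg_request_bytes
FROM io_log
GROUP BY path, op
ORDER BY total_bytes DESC
"""


def summarize(records):
    """Load (pid, op, path, byte_offset, nbytes) tuples and run the summary."""
    conn = sqlite3.connect(":memory:")  # local stand-in for the distributed store
    try:
        conn.execute(SCHEMA)
        conn.executemany("INSERT INTO io_log VALUES (?, ?, ?, ?, ?)", records)
        return conn.execute(SUMMARY_SQL).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    sample = [
        (101, "WRITE", "solver.out", 0, 4096),
        (101, "WRITE", "solver.out", 4096, 4096),
        (102, "READ", "mesh.dat", 0, 65536),
    ]
    for row in summarize(sample):
        print(row)
```

    Keeping the analysis as plain SQL over a flat log table is what lets the same question be asked interactively; on a real cluster the table would live in distributed storage rather than memory.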

    Because of its open, collaborative development model, open-source software has attracted many developers around the world to contribute to the communities they are interested in. Among the best-known open-source projects are Hadoop, HBase, and Spark, which have drawn great attention across industries.
    However, much software remains closed-source: its source code is not shared, and it is typically used commercially. Users who want to upgrade or maintain such software must pay the service provider a high price for those services. Closed-source software also prevents developers from further developing it.
    In Taiwan, the Industrial Technology Research Institute (ITRI) is at the forefront of industry. As a technical service provider, ITRI has bought many software packages and their respective licenses. It rents out these licenses, together with execution environments, to provide computing services to enterprises by setting up server clusters and using virtualization technology. This greatly reduces costs for enterprises, especially small and medium-sized enterprises (SMEs).
    This raises the issue of how to make full use of the resources available on the server machines. ITRI has already optimized its use of computation resources with the Message Passing Interface (MPI). As ITRI's technical support team, we chose to study the resource requirements of I/O operations, aiming to discover I/O patterns and features through analysis.
    The outcome of this thesis is an application comprising a set of tools for I/O analysis and feature visualization. We hope it will help ITRI find the best way to improve I/O performance and make the best use of the cluster's resources.
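    One example of an I/O feature such an analysis might extract is how sequential each file's access pattern is. The sketch below is an illustration, not the thesis's Analyzer: it assumes each log record is a hypothetical (path, byte_offset, nbytes) tuple in issue order, and it counts a request as sequential when it begins exactly where the previous request on the same file ended.

```python
from collections import defaultdict


def sequential_ratio(requests):
    """Map each file path to the fraction of its requests that are sequential.

    requests: iterable of (path, byte_offset, nbytes) tuples in issue order
    (a hypothetical record layout). A request is sequential when it starts
    exactly where the previous request on the same file ended. The first
    request on each file has no predecessor, so it is not scored, and files
    with a single request are left out of the result.
    """
    last_end = {}                  # path -> end offset of the previous request
    scored = defaultdict(int)      # path -> requests that had a predecessor
    sequential = defaultdict(int)  # path -> requests that continued it
    for path, byte_offset, nbytes in requests:
        if path in last_end:
            scored[path] += 1
            if byte_offset == last_end[path]:
                sequential[path] += 1
        last_end[path] = byte_offset + nbytes
    return {path: sequential[path] / scored[path] for path in scored}


trace = [
    ("solver.out", 0, 4096), ("solver.out", 4096, 4096), ("solver.out", 8192, 4096),
    ("mesh.dat", 0, 512), ("mesh.dat", 10000, 512), ("mesh.dat", 512, 512),
]
print(sequential_ratio(trace))  # {'solver.out': 1.0, 'mesh.dat': 0.0}
```

    A ratio near 1.0 suggests streaming access that may benefit from larger buffers or read-ahead, while a ratio near 0.0 points to scattered access. Tracking the end offset per file, rather than globally, keeps interleaved requests to different files from masking each other's patterns.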

    Chinese Abstract
    ABSTRACT
    ACKNOWLEDGEMENTS
    TABLE OF CONTENTS
    LIST OF TABLES
    LIST OF FIGURES
    CHAPTER 1 Introduction
      1.1 Background
      1.2 Issues
      1.3 Our solution
      1.4 Outcomes and Shortcoming
    CHAPTER 2 Architecture of our Application
    CHAPTER 3 File System I/O Extractor
      3.1 Brief Introduction of Minifilter Drivers
      3.2 Minifilter Driver in our Application
        3.2.1 Kernel-mode Process
        3.2.2 User-mode Process
      3.3 Memory Issue of I/O Extractor
      3.4 Introduction of I/O Log
    CHAPTER 4 File System I/O Emulator
      4.1 Introduction of I/O Emulator
      4.2 The Correctness of I/O Emulator
      4.3 Components of I/O Emulator
        4.3.1 Log Cleaner
        4.3.2 Log Flusher
        4.3.3 Files Generator
        4.3.4 I/O Simulator
      4.4 Configurations
    CHAPTER 5 File System I/O Analyzer
      5.1 Storage Layer --- Hadoop-HDFS
        5.1.1 Reasons for choosing Hadoop-HDFS
        5.1.2 Architecture of HDFS
      5.2 Computation Layer --- Spark
        5.2.1 Reasons for running Analyzer distributed
        5.2.2 Introduction of Spark
        5.2.3 Compared to Hadoop --- MapReduce
        5.2.4 Architecture of Spark
      5.3 Introduction of Analyzer
    CHAPTER 6 Outcomes of our Application
    CHAPTER 7 Summary and Future Work
    REFERENCES


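    On campus: publicly available from 2021-07-27
    Off campus: not publicly available
    The electronic thesis is not yet authorized for public release; for the print copy, please check the library catalog.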