Graduate student: 廖哲慶
Liao, Che-Ching
Thesis title: 於簡化階段進行聯結且採用資料轉置模式之線上分析處理系統
On-line Analytical Processing System based on Reduce-Phase Aggregation with Inverted Data Model
Advisor: 謝錫堃
Shieh, Ce-Kuen
Degree: Master
Department: College of Electrical Engineering and Computer Science - Institute of Computer & Communication Engineering
Year of publication: 2014
Academic year of graduation: 102 (ROC calendar)
Language: English
Number of pages: 59
Keywords (Chinese): 即時分析處理, MapReduce, Hadoop, 星狀結構, 多維度資料模組
Keywords (English): OLAP, MapReduce, Hadoop, Star Schema, Multi-dimension Model
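  • On-line analytical processing (OLAP) has been used successfully to analyze data in fields such as marketing, finance, insurance, and industrial manufacturing. OLAP offers a sound set of analytical concepts for examining multi-dimensional data in a database. When large volumes of data are collected from applications, enterprises and organizations use OLAP operations to analyze the data in their databases. By combining the concepts of OLAP with the MapReduce distributed computing framework, OLAP operations can be used to analyze massive data in a scalable distributed computing environment. However, the processing flow of current OLAP analysis systems cannot accommodate all of the algebraic operations involved in OLAP operations, so performance on the core requirement of on-line analytical processing, namely real-time response, still has room to improve. As the volume of dimension-attribute and dependent-attribute data in today's data warehouses keeps growing, the failure to meet the real-time requirement will become more serious. To make OLAP operations on massive data practical, this thesis proposes an effective method for meeting that real-time requirement. We ran experiments on the two basic algebraic expressions of OLAP operations under different data sizes, and the results show better performance than the other systems in every case. We believe there are still ways to further improve our system's processing performance, and we hope to keep refining the methods used in each component so that it becomes a more complete and more practical system.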

    On-line analytical processing (OLAP) provides analysis of multi-dimensional data stored in a database and has achieved great success in many applications such as sales, marketing, manufacturing, insurance, and financial data analysis. OLAP operations are a dominant part of data analysis for enterprises and organizations, especially when they must handle the large amounts of data collected from these applications. With the emergence of the MapReduce paradigm, OLAP operations can be processed on big data that resides in scalable, distributed storage. However, current MapReduce implementations of OLAP operation processing have a major performance drawback caused by an improper processing procedure. This drawback is critical when the dimension or dependent attributes are large, which is common in most data warehouses today. To tackle this issue, this thesis proposes a methodology to accelerate OLAP operation processing on big data. We conducted experiments on both of the basic algebraic operations underlying OLAP operations, using different data sizes, to demonstrate the effectiveness of our system.

    1 Introduction 1
    2 Background 8
      2.1 Data Warehouse 8
      2.2 OLAP 11
        2.2.1 Multidimensional data model 11
        2.2.2 OLAP Operations 13
        2.2.3 Relational Algebras 15
      2.3 Cloud Technology 17
        2.3.1 MapReduce Programming Model 17
        2.3.2 Apache Hadoop 18
        2.3.3 Apache HBase 19
    3 Related Work 22
      3.1 OLAP Data Storage Model 22
      3.2 OLAP operation processing 25
    4 System Design 28
      4.1 Data Model Constructor 30
      4.2 Query Analyzer 34
      4.3 Algebra Execution Algorithm 36
    5 Implementation 38
      5.1 Query Analyzer 38
      5.2 Model Constructor 40
      5.3 Executing Algorithms 41
    6 Performance Evaluation 46
      6.1 Environments 46
      6.2 Evaluation with the algebra 47
      6.3 Discussion of Data Model 55
    7 Conclusion and Future Work 57
    Reference 58


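    Full text: available on campus from 2019-08-25; not available off campus.
    The electronic thesis has not been authorized for public release; for the print copy, please check the library catalog.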