簡易檢索 / 詳目顯示

研究生: 石允齊
Shih, Yun-Chi
論文名稱: 基於GPU之資料庫快速重建技術
Restoring Databases with GPU
指導教授: 蕭宏章
Hsiao, Hung-Chang
學位類別: 碩士
Master
系所名稱: 電機資訊學院 - 資訊工程學系
Department of Computer Science and Information Engineering
論文出版年: 2019
畢業學年度: 107
語文別: 中文
論文頁數: 26
中文關鍵詞: LevelDBCUDAGPUSkiplist
外文關鍵詞: LevelDB, CUDA, GPU, Skiplist
相關次數: 點閱:32下載:1
分享至:
查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報
  • Compute Unified Device Architecture (CUDA) 是由Nvidia 所推出的一種整合計算,透過Nvidia GPU多核的特性去解決資料密集的計算。而LevelDB是Google設計的一個開放原始碼 key-value 儲存的資料庫函式庫。本論文研究基於CUDA及LevelDB下提出一具平行處理的資料結構,設計的目的乃針對大量資料快速建立可查詢之資料庫,以及快速恢復資料庫之使用。我們提出GPU in-memory平行索引 (in-memory parallelized indexing) 建造方法並 透過實現於LevelDB 驗證。實驗結果發現,我們透過GPU來平行構建in-memory indexing可較傳統基於CPU的LevelDB在執行速度上達將近700倍。

    The thesis presents a novel database restoring algorithm and its implementation with a massive number of computing units in GPU. Compared with traditional CPU-based database restoring schema. Our proposed solution outperforms the CPU-based one over 700x in terms of execution time. Specifically, we present a novel algorithm in constructing in-memory indices with GPU. Given a set of data items, we exploit potential parallelism for building such in-memory indices while typical solutions attack the problem inefficiently by injecting data items into an DB one at a time. We demonstrate our idea by augmenting LevelDB, a well-know database serving as a library. Our present implementation is basedon Nvidia GPU CUDA (Compute Unified Device Architecture).

    摘要 i Extended Abstract ii 致謝 v 目錄 vi 圖目錄 viii Chapter 1. 簡介 1 Chapter 2. 背景研究 5 2.1 CUDA 5 2.2 LevelDB 7 2.3 多筆PUT於LevelDB上之議題 9 2.4 Linkedlist往返CPU與GPU之議題 9 Chapter 3. 相關研究 10 Chapter 4. 系統用戶端 11 4.1 Open API 11 4.2 建樹API 12 4.3 GET操作 12 Chapter 5. 演算法架構 13 5.1 Sorting 13 5.2 Replication 14 5.2.1 Global memory 14 5.2.2 Shared memory 15 5.3 Connection 16 5.3.1 Global memory 16 5.3.2 Shared memory 17 5.4 時間複雜度分析 17 Chapter 6. 實驗 18 6.1 測試環境 18 6.2 CUDA Emulation in CPU vs GPU 18 6.3 Building Performance 19 6.3.1 Without cudaMalloc Overheads 20 6.3.2 With cudaMalloc Overheads 21 Chapter 7. 結論 23 Chapter 8. 參考資料 24

    [1] "Graphics processing unit," [Online]. Available: https://en.wikipedia.org/wiki/Graphics_processing_unit.
    [2] C. Bishop, Pattern recognition and machine learning, 2006.
    [3] "Hadoop," Apache, [Online]. Available: https://hadoop.apache.org/.
    [4] K. Shvachko, H. Kuang, S. Radia and R. Chansler, "The Hadoop Distributed File System," 2010 IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST), 2010.
    [5] "HBase," Apache, [Online]. Available: https://hbase.apache.org/.
    [6] "LevelDB," google, [Online]. Available: https://github.com/google/leveldb.
    [7] "Google Chrome," [Online]. Available: https://en.wikipedia.org/wiki/Google_Chrome.
    [8] M. Swan, Blockchain: Blueprint for a new economy, O'REILLY.
    [9] "BitcoinCore," Bitcoin, [Online]. Available: https://bitcoin.org/en/bitcoin-core/.
    [10] "Go Ethereum," [Online]. Available: https://geth.ethereum.org/.
    [11] M. F. SANNER, "Python: a programming language for software integration and development," The Scripps Research Institute.
    [12] “R (programming language),” [線上]. Available: https://en.wikipedia.org/wiki/R_(programming_language).
    [13] C.-P. Tsai, H.-C. Hsiao, Y.-C. Chao, M. Hsu and A. R. Chang, "Bridging the Gap between Big Data System Software Stack and Application," in IEEE International Conference on Big Data, The Case of Semiconductor Wafer Fabrication Foundries, 2018.
    [14] W. Pugh, "Skip Lists: A Probabilistic Alternative to Balanced Trees".
    [15] "Cuda Toolkit," Nvidia, [Online]. Available: https://developer.nvidia.com/cuda-downloads.
    [16] "Cuda Toolkit Documentation," Nvidia, [Online]. Available: https://docs.nvidia.com/cuda/.
    [17] D. R. BW Kernighan, The C programming language, 2006.
    [18] "Stream processing," Wikipedia, [Online]. Available: https://en.wikipedia.org/wiki/Stream_processing.
    [19] "Single Instruction Multiple Thread," [Online]. Available: https://en.wikipedia.org/wiki/Single_instruction,_multiple_threads.
    [20] R. Cattell, "Scalable SQL and NoSQL data stores," Association for Computing Machinery, pp. 12-27, 4 12 2010.
    [21] P. O’Neil, E. Cheng, D. Gawlick and E. O’Neil, "The log-structured merge-tree (LSM-tree)," Acta Informatica, pp. 351-385, 7 1996.
    [22] P. Misra and M. Chaudhuri, "Performance Evaluation of Concurrent Lock-free Data Structures on GPUs," 2012 IEEE 18th International Conference on Parallel and Distributed Systems, 17-19 12 2012.
    [23] N. Moscovici, N. Cohen and E. Petrank, "A GPU-Friendly Skiplist Algorithm," in 2017 26th International Conference on Parallel Architectures and Compilation Techniques (PACT), 2017.
    [24] "Garbage collection," wikipedia, [Online]. Available: https://en.wikipedia.org/wiki/Tracing_garbage_collection.
    [25] A. H. Team, "Apache HBase ™ Reference Guide," [Online]. Available: https://hbase.apache.org/book.html#arch.bulk.load.
    [26] J. Dean and S. Ghemawat, "MapReduce:simplifed data processing on large clusters," Communications of the ACM - 50th anniversary issue: 1958 - 2008, pp. 107-113, 1 1 2008.
    [27] "Kinetica," Kinetica, [Online]. Available: https://www.kinetica.com/.
    [28] "Thrust," [Online]. Available: https://thrust.github.io/.
    [29] "Radix sort," Wekipedia, [Online]. Available: https://en.wikipedia.org/wiki/Radix_sort.
    [30] "Memory Allocation Overhead," virginia university department of computer science, [Online]. Available: https://www.cs.virginia.edu/~mwb7w/cuda_support/memory_management_overhead.html.

    下載圖示 校內:2024-08-06公開
    校外:2024-08-06公開
    QR CODE