
Graduate student: 陳麗天
Chen, Li-Tian
Thesis title: 應用於 GPU 內部之快取記憶體與主記憶體間之資料傳輸壓縮架構設計
A cache-memory link compression architecture for GPU
Advisor: 郭致宏
Kuo, Chih-Hung
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Electrical Engineering
Year of publication: 2015
Academic year of graduation: 104
Language: Chinese
Number of pages: 88
Keywords (Chinese): 記憶體壓縮, 快取壓縮, 資料壓縮
Keywords (English): Memory Compression, Cache Compression, Data Compression
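  • In this thesis, we propose a memory compression architecture for graphics processing units (GPUs). When large amounts of data are transferred, the proposed compression technique saves bus bandwidth and memory space. A compression and decompression unit is placed between the cache and main memory in the system's memory hierarchy, so that data are reduced in size by compression, lowering both the bus load and main-memory usage. The proposed compression algorithm is developed for the floating-point representation. Because GPUs support both floating-point and integer operations, the architecture is extended to compress multiple words at a time, and it further observes and exploits the characteristics of the data to compress as efficiently as possible. In experiments under our test environment, the proposed single-word and quad-word compression architectures provide average compression ratios of 51.72% and 46.81%, respectively. We also estimate that the architecture reduces memory-access energy to 58.77%, effectively lowering memory-access power consumption.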

    In this thesis, we present a memory-compression architecture for the GPU (Graphics Processing Unit). The architecture saves bus bandwidth and memory space when large amounts of data are transferred. Our system improves compression performance by reducing the redundancy in floating-point data. We propose both single-word and quad-word memory compression architectures. Compared with previous work, we obtain improvements of 16.94% in average compression ratio, 4.3% in area, 9.3% in power, and 14% in operating frequency.
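
    As a rough illustration of the kind of redundancy in floating-point data that the abstract refers to, the sketch below splits IEEE 754 single-precision words into their sign, exponent, and mantissa fields and counts how often each exponent occurs in a block. It is not the thesis's algorithm; the function names (`split_float32`, `exponent_histogram`) and the sample block are assumptions made only for this example.

```python
import struct
from collections import Counter


def split_float32(value):
    """Return the (sign, exponent, mantissa) bit fields of value as IEEE 754 binary32."""
    # Reinterpret the 32-bit float encoding as an unsigned integer.
    bits = struct.unpack("<I", struct.pack("<f", value))[0]
    sign = bits >> 31                # 1 bit
    exponent = (bits >> 23) & 0xFF   # 8 bits, biased by 127
    mantissa = bits & 0x7FFFFF       # 23 bits
    return sign, exponent, mantissa


def exponent_histogram(values):
    """Count how many words in a block share each biased exponent."""
    return Counter(split_float32(v)[1] for v in values)


# Hypothetical block of eight words with values of similar magnitude.
block = [1.0, 1.5, 1.25, 1.75, 2.0, 3.0, 0.5, 0.75]
hist = exponent_histogram(block)
print(hist)
```

    In this sample block the eight words use only three distinct exponent values, which suggests why an exponent dictionary, as named in the table of contents, could encode that field with few bits. How the thesis actually builds and searches its dictionaries is not stated in this record.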

    Chinese Abstract
    Extended English Abstract
    Table of Contents
    List of Figures
    List of Tables
    Chapter 1 Introduction
      1-1 Preface
      1-2 Research Motivation
      1-3 Research Contributions
      1-4 Thesis Organization
    Chapter 2 Background and Related Work
      2-1 Introduction to the IEEE 754 Floating-Point Representation
        2-1-1 Analysis of the Characteristics of the IEEE 754 Floating-Point Representation
      2-2 Algorithms Commonly Used for Memory Compression
        2-2-1 Frequent Pattern Compression (FPC)
        2-2-2 Frequent Value Encoding (FVE)
        2-2-3 Diff-LX
      2-3 Overview of Dictionary Compression Methods
        2-3-1 Introduction to the LZ77 Compression Algorithm [13]
        2-3-2 Introduction to the LZSS Compression Algorithm [13]
        2-3-3 Introduction to the LZO Compression Algorithm [13]
        2-3-4 Compressed RAM for Embedded Systems (CRAMES)
      2-4 Overview of Pattern Match Compression
        A High-Performance Microprocessor Cache Compression Algorithm (Cache Packer, CPACK)
      2-5 Discussion of the Placement of the Compression and Decompression Modules
        The Cache and Codec Model for Storing and Manipulating Data [11]
      2-6 Effect of Cache Block Size on System Compression Performance
        Saving Bandwidth with Cache/Memory Link Compression and Larger Cache Blocks [12]
    Chapter 3 Design of a Data-Transfer Compression Module between the Cache and Main Memory inside the GPU
      3-1 Compression and Decompression Unit (CODEC)
        3-1-1 System Architecture
        3-1-2 Pattern Search
        3-1-3 Dictionary Exploration
        3-1-4 Execution Flow
      3-2 Compression Engine
        3-2-1 Exponent/Mantissa Dictionary Register
        3-2-2 Pattern Search Unit
        3-2-3 Dictionary Search Unit
        3-2-4 Prefix Encoder
        3-2-5 Controller
        3-2-6 Concatenating Unit
      3-3 Decompression Engine
        3-3-1 Decoder
        3-3-2 Recover Pattern
        3-3-3 Exponent/Mantissa Dictionary Register
        3-3-4 Controller
        3-3-5 Concatenating Unit
      3-4 Data Analysis
      3-5 Quad-Word Compression and Decompression
        Quad-Word Compression and Decompression Module Architecture with an Added Compression Algorithm for the Exponent Part
    Chapter 4 Experimental Results and Comparative Analysis
      4-1 Introduction to the Simulator
      4-2 Experimental Results of the Floating-Point Compression Algorithm
      4-3 Synthesis Results
    Chapter 5 Conclusion and Future Work
      5-1 Conclusion
      5-2 Future Work
    References

    [1] X. Chen; L. Yang; R.P. Dick; L. Shang; H. Lekatsas, “C-Pack: A High-Performance Microprocessor Cache Compression Algorithm,” in IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol.18, no.8, pp.1196-1208, Aug. 2010
    [2] E.G. Hallnor; S.K. Reinhardt, “A unified compressed memory hierarchy,” in 11th International Symposium on High-Performance Computer Architecture (HPCA-11), pp.201-212, 12-16 Feb. 2005
    [3] S. Roy; R. Kumar; M. Prvulovic, “Improving system performance with compressed memory,” in Proceedings 15th International Parallel and Distributed Processing Symposium., Apr. 2001
    [4] I. C. Tuduce; T. Gross, “Adaptive main memory compression,” in Proceedings of the annual conference on USENIX Annual Technical Conference (ATEC '05). USENIX Association, Berkeley, CA, USA, pp.29-29, 2005.
    [5] R. B. Tremaine; P. A. Franaszek; J. T. Robinson; C. O. Schulz; T. B. Smith; M. E. Wazlowski; P. M. Bland, “IBM Memory Expansion Technology (MXT),” in IBM Journal of Research and Development , vol.45, no.2, pp.271-285, March 2001
    [6] J. L. Nunez; S. Jones, “Gbit/s lossless data compression hardware,” in IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol.11, no.3, pp.499-510, June 2003
    [7] A. Alameldeen; D. A. Wood, “Frequent pattern compression: A significance-based compression scheme for L2 caches,” Dept. Comp. Sci., Univ. Wisconsin-Madison, Tech. Rep. 1500, Apr. 2004.
    [8] J. Yang; R. Gupta; C. Zhang, “Frequent value encoding for low power data buses,” in ACM Transactions on Design Automation of Electronic Systems, vol.9, no.3, July 2004.
    [9] A. Macii et al., “A new algorithm for energy driven data compression in VLIW embedded processors,” in Proc. IEEE Design Automation and Test Europe (DATE), Munich, Germany, pp. 24-29, Mar. 2003.
    [10] H. Lekatsas et al., “CRAMES: Compressed RAM for Embedded Systems,” in Third IEEE/ACM/IFIP International Conference on Hardware/Software Co-design and System Synthesis (CODES+ISSS '05), Jersey City, NJ, USA, pp. 93-98, Sept. 2005.
    [11] V. Bui; M.A. Kim, “The Cache and Codec Model for Storing and Manipulating Data,” in IEEE Micro, vol. 34, no. 4, pp. 28-35, May 2014
    [12] M. Thuresson; P. Stenström, “Accommodation of the Bandwidth of Large Cache Blocks Using Cache/Memory Link Compression,” in 37th International Conference on Parallel Processing (ICPP '08), Portland, pp. 478-486, Sep. 2008
    [13] D. J. Cao; Y. Y. Lin, “即時性無失真壓縮編碼之研究” (A study of real-time lossless compression coding), thesis, Department of Communication Engineering, National Central University, Taoyuan.
    [14] S. Sardashti; D.A. Wood, “Decoupled Compressed Cache: Exploiting Spatial Locality for Energy Optimization,” in IEEE Micro, vol. 34, no. 3, pp. 91-99, May-June 2014

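    On campus: public from 2017-11-24
    Off campus: not public
    The electronic thesis has not been authorized for public release; for the print copy, consult the library catalog.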