
Graduate student: 陳緯峻
Chen, Wei-Jun
Thesis title: 全域淨頁優先置換法與索引感知多重流向預存取方法於整合型記憶體架構
Global Clean Page First Replacement and Index Aware Multi-Stream Prefetcher in Hybrid Memory Architecture
Advisor: 林英超
Lin, Ing-Chao
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering
Year of publication: 2018
Academic year of graduation: 106 (ROC calendar)
Language: English
Number of pages: 48
Keywords (Chinese): 整合型記憶體, 動態隨機存取記憶體, NAND快閃記憶體, 續航力, 存取時間, 置換法, 預存取
Keywords (English): Hybrid memory, DRAM, NAND flash, Endurance, Latency, Replacement Policy, Prefetching
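  • In recent years, because big data applications require large amounts of storage, hybrid memory that offers both non-volatility and high capacity has attracted growing attention. Because it combines non-volatile memory (NVM) with dynamic random access memory (DRAM), hybrid memory is well suited to these data-intensive applications. However, NVM tends to have drawbacks such as higher access latency and shorter lifetime. To address the latency and lifetime problems, we propose a Global Clean Page First (GCPF) replacement policy that reduces the number of writes to NVM and thereby extends lifetime. We also propose an Index Aware Multi-Stream Prefetcher (IAMSP), which considers the index of each prefetch candidate individually so that data is prefetched from NVM more accurately, reducing the average access time. In the experiments, we evaluate the proposed methods using benchmarks with large memory footprints. The results show that, compared with the conventional LRU policy, GCPF improves lifetime by about 56.8% on average. When a prefetching scheme is further applied on top of GCPF, lifetime is not reduced. In addition, IAMSP reduces the number of DRAM misses by about 42.0%, whereas a recent prefetcher that dynamically changes its prefetch depth achieves only a 38.0% reduction. With GCPF and IAMSP combined, the average access time is 28.8% lower than with LRU, and the hardware cost of applying the two methods is very small.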

    As cloud computing and big data applications become more popular, the demand for large-capacity memory and for preserving data in memory increases. Therefore, non-volatile memory (NVM) with high capacity has been actively developed. Hybrid memory, which consists of both NVM and DRAM and provides both high access speed and non-volatility, has become a major trend. However, compared to DRAM, the NVM in a hybrid memory typically suffers from a shorter lifetime and higher latency. To address the lifetime and latency issues of hybrid memory, we propose a global clean page first (GCPF) replacement policy that reduces write operations to NVM. We also propose an index aware multi-stream prefetcher (IAMSP) that considers the index of each prefetch candidate individually in order to prefetch pages from NVM more accurately. Benchmarks with large memory footprints are used to evaluate the proposed schemes. The experimental results show that GCPF extends lifetime by 56.8% on average compared to LRU. When prefetching schemes are applied on top of GCPF, the lifetime degrades only insignificantly. In addition, IAMSP reduces DRAM misses by 42.0% compared to LRU, while a modern prefetcher that changes the prefetch degree dynamically reduces DRAM misses by only 38.0% on average. When both GCPF and IAMSP are applied, the average access latency is reduced by 28.8% compared to LRU, and the overall hardware overhead of the two schemes is insignificant.
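The abstract does not spell out how GCPF selects victims, so the following is only a minimal sketch of the general clean-page-first idea, under stated assumptions: DRAM is modeled as a small page cache in front of NVM, evicting a dirty page costs one NVM write, and evicting a clean page costs none. The class `CleanFirstCache`, the helper `run`, and the trace are hypothetical and are not taken from the thesis; the thesis's segment-based mechanism and "global" selection are not modeled.

```python
from collections import OrderedDict


class CleanFirstCache:
    """Toy DRAM page cache in front of NVM (hypothetical, not the thesis design)."""

    def __init__(self, capacity, clean_first):
        self.capacity = capacity
        self.clean_first = clean_first
        self.pages = OrderedDict()  # page number -> dirty flag, kept in LRU order
        self.misses = 0
        self.nvm_writes = 0

    def _pick_victim(self):
        if self.clean_first:
            # Scan from least to most recently used for a clean page.
            for page, dirty in self.pages.items():
                if not dirty:
                    return page
        # Plain LRU choice (also the fallback when every page is dirty).
        return next(iter(self.pages))

    def access(self, page, is_write):
        if page in self.pages:
            dirty = self.pages.pop(page) or is_write
        else:
            self.misses += 1
            if len(self.pages) >= self.capacity:
                victim = self._pick_victim()
                if self.pages.pop(victim):
                    self.nvm_writes += 1  # dirty victim is written back to NVM
            dirty = is_write
        self.pages[page] = dirty  # re-insert at the most recently used end


def run(trace, capacity, clean_first):
    """Replay a list of (page number, is_write) pairs and return the cache."""
    cache = CleanFirstCache(capacity, clean_first)
    for page, is_write in trace:
        cache.access(page, is_write)
    return cache


# Made-up trace: page 1 is written repeatedly, the other pages are only read.
TRACE = [(1, True), (2, False), (3, False), (1, True), (4, False), (1, True)]
lru = run(TRACE, capacity=2, clean_first=False)
clean = run(TRACE, capacity=2, clean_first=True)
print("LRU:         nvm_writes =", lru.nvm_writes, "misses =", lru.misses)
print("clean-first: nvm_writes =", clean.nvm_writes, "misses =", clean.misses)
```

On this toy trace, plain LRU evicts the dirty page 1 once and pays one NVM write-back, while the clean-first variant keeps page 1 in DRAM and pays none. The numbers only illustrate the mechanism and say nothing about the 56.8% lifetime figure reported in the abstract.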

    Abstract (Chinese)
    Abstract
    Acknowledgements
    Table of Contents
    List of Tables
    List of Figures
    Chapter 1. Introduction
        1.1 Thesis Contribution
    Chapter 2. Background and Motivation
        2.1 Data Placement of Hybrid Memory
        2.2 Data Prefetching
        2.3 Motivation
    Chapter 3. Proposed Hybrid Memory Architecture
        3.1 Architecture Overview
        3.2 Memory Request Redirector
        3.3 DRAM Miss Handler and Prefetcher
        3.4 Design of Replacement and Prefetching Schemes
            3.4.1 Segment-based mechanism
            3.4.2 Global Clean Page First (GCPF)
            3.4.3 Index Aware Multi-stream Prefetcher (IAMSP)
    Chapter 4. Experimental Setup and Results
        4.1 Experimental Setup
        4.2 Benchmarks
        4.3 Experimental Results
            4.3.1 Flash Write and Lifetime Comparison
            4.3.2 DRAM Miss and Average Latency Comparison
            4.3.3 Prefetch Effectiveness
            4.3.4 Changes of Degrees Analysis
            4.3.5 Hardware Overhead
    Chapter 5. Conclusion
    References


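    Full-text availability: on campus, open from 2023-02-28; off campus, not open.
    The electronic thesis has not yet been authorized for public release; for the print copy, check the library catalog.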