National Cheng Kung University Theses and Dissertations System


Graduate student:	陳緯峻 Chen, Wei-Jun
Thesis title:	全域淨頁優先置換法與索引感知多重流向預存取方法於整合型記憶體架構 Global Clean Page First Replacement and Index Aware Multi-Stream Prefetcher in Hybrid Memory Architecture
Advisor:	林英超 Lin, Ing-Chao
Degree:	Master
Department:	College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering
Year of publication:	2018
Academic year of graduation:	106 (ROC calendar)
Language:	English
Number of pages:	48
Keywords (Chinese):	整合型記憶體、動態隨機存取記憶體、NAND快閃記憶體、續航力、存取時間、置換法、預存取
Keywords (English):	Hybrid memory, DRAM, NAND flash, Endurance, Latency, Replacement Policy, Prefetching

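Abstract (Chinese version): In recent years, because big data applications require large amounts of storage, hybrid memory that offers both non-volatility and high capacity has drawn increasing attention. By combining non-volatile memory (NVM) with dynamic random-access memory (DRAM), hybrid memory is well suited to these data-intensive applications. However, NVM tends to have drawbacks such as higher access latency and shorter lifetime. To address the latency and lifetime problems, we propose Global Clean Page First (GCPF) replacement, which reduces the number of writes to NVM and thereby extends its lifetime. We also propose the Index Aware Multi-Stream Prefetcher (IAMSP), which considers the index of each prefetch candidate individually in order to prefetch data from NVM more accurately and thereby reduce the average access latency. In the experiments, we evaluate the proposed methods using benchmarks with large memory footprints. The results show that, compared with the conventional LRU policy, GCPF extends lifetime by about 56.8% on average. When a prefetching scheme is additionally applied on top of GCPF, the lifetime is not reduced. In addition, IAMSP reduces DRAM misses by about 42.0%, whereas a recent prefetcher that dynamically changes its prefetch depth achieves only a 38.0% reduction. Combining GCPF and IAMSP reduces the average access latency by 28.8% compared with LRU, and the hardware cost of the two methods is very small.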

As cloud computing and big data applications become more popular, the demand for large-capacity memory that can also preserve data increases. Non-volatile memory (NVM) with high capacity has therefore been actively developed, and hybrid memory that combines NVM with DRAM to provide both high access speed and non-volatility has become a major trend. However, compared to DRAM, the NVM in a hybrid memory typically suffers from a shorter lifetime and higher latency. To address these lifetime and latency issues, we propose a global clean page first (GCPF) replacement policy that reduces the number of write operations to NVM. We also propose an index aware multi-stream prefetcher (IAMSP) that considers the index of each prefetch candidate individually in order to prefetch pages from NVM more accurately. Benchmarks with large memory footprints are used to evaluate the proposed schemes. The experimental results show that GCPF improves lifetime by 56.8% on average compared to LRU. When prefetching schemes are applied on top of GCPF, the lifetime is not significantly degraded. In addition, IAMSP reduces DRAM misses by 42.0% on average compared to LRU, whereas a modern prefetcher that changes its prefetch degree dynamically reduces them by only 38.0%. When both GCPF and IAMSP are applied, the average access latency is reduced by 28.8% compared to LRU, and the overall hardware overhead of the two schemes is insignificant.
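The abstract does not describe how GCPF selects its victims, so the following is only a minimal sketch of the general clean-page-first idea it names, not the thesis's algorithm. It assumes a small DRAM page cache kept in LRU order in which a clean page is evicted in preference to a dirty one, because evicting a clean page needs no write-back to NVM. The class name `CleanFirstLRU`, its methods, and the `nvm_writes` counter are all hypothetical.

```python
from collections import OrderedDict


class CleanFirstLRU:
    """Hypothetical DRAM page cache that prefers evicting clean pages."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()  # page number -> dirty flag, least recent first
        self.nvm_writes = 0         # write-backs to NVM caused by evictions

    def access(self, page, is_write=False):
        """Touch a page; return True on a DRAM hit, False on a miss."""
        hit = page in self.pages
        if hit:
            # A page stays dirty once it has been written.
            dirty = self.pages.pop(page) or is_write
        else:
            if len(self.pages) >= self.capacity:
                self._evict()
            dirty = is_write
        self.pages[page] = dirty    # (re)insert at the most-recently-used end
        return hit

    def _evict(self):
        # Scan from least to most recently used and take the first clean page.
        victim = next((p for p, d in self.pages.items() if not d), None)
        if victim is None:
            # Every resident page is dirty: fall back to plain LRU
            # and pay one write-back to NVM.
            victim = next(iter(self.pages))
            self.nvm_writes += 1
        del self.pages[victim]


cache = CleanFirstLRU(capacity=2)
cache.access(1, is_write=True)  # page 1 becomes dirty
cache.access(2)                 # page 2 is clean
cache.access(3)                 # evicts clean page 2, although page 1 is older
print(list(cache.pages), cache.nvm_writes)
```

In this toy trace, plain LRU would have evicted dirty page 1 and incurred one NVM write, whereas the clean-first choice defers that write. The cost is that recently used clean pages may be evicted earlier, which can raise the DRAM miss count.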

Abstract (Chinese)
Abstract
Acknowledgements
Table of Contents
List of Tables
List of Figures
Chapter 1. Introduction
1.1 Thesis Contribution
Chapter 2. Background and Motivation
2.1 Data Placement of Hybrid Memory
2.2 Data Prefetching
2.3 Motivation
Chapter 3. Proposed Hybrid Memory Architecture
3.1 Architecture Overview
3.2 Memory Request Redirector
3.3 DRAM Miss Handler and Prefetcher
3.4 Design of Replacement and Prefetching Schemes
3.4.1. Segment-based mechanism
3.4.2. Global Clean Page First (GCPF)
3.4.3. Index Aware Multi-stream Prefetcher (IAMSP)
Chapter 4. Experimental Setup and Results
4.1 Experimental Setup
4.2 Benchmarks
4.3 Experimental Results
4.3.1. Flash Write and Lifetime Comparison
4.3.2. DRAM Miss and Average Latency Comparison
4.3.3. Prefetch Effectiveness
4.3.4. Changes of Degrees Analysis
4.3.5. Hardware Overhead
Chapter 5. Conclusion
References


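Off campus: not publicly available. Neither the electronic thesis nor the print copy has been authorized for public release.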
