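Graduate student: 許順明 (Syu, Shun-Ming)
Thesis title: High-Endurance Hybrid Cache Design in CMP Architecture with Cache Partitioning and Access-Aware Policy (Chinese title: 適用在多核心架構的高耐久性混合式快取記憶體設計及快取切割與存取管理演算法)
Advisor: 林英超 (Lin, Ing-Chao)
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Computer Science and Information Engineering
Year of publication: 2013
Academic year of graduation: 101 (ROC calendar)
Language: English
Number of pages: 45
Chinese keywords: 混合式快取記憶體 (hybrid cache), 耗損均勻 (wear leveling), 自旋轉移力矩隨機存取記憶體 (spin-transfer torque RAM), 動態快取記憶體切割演算法 (dynamic cache partitioning algorithm)
English keywords: Hybrid L2 cache, STT-RAM, Wear leveling, Dynamic cache partitioning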
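  • In recent years, non-volatile memories (NVM) such as spin-transfer torque RAM (STT-RAM) and phase-change RAM (PCRAM) have attracted considerable attention for their low energy consumption and high density. However, NVM also suffers from long write latency and limited write endurance. Many hybrid cache architectures that combine SRAM with NVM have therefore been proposed, together with suitable cache write management policies, to reduce the write pressure on the non-volatile memory.
    This thesis observes that, in a multi-core architecture, a hybrid L2 cache wears out unevenly because writes are unevenly distributed. We therefore propose a hybrid cache architecture composed of SRAM banks, STT-RAM banks, and hybrid banks that mix SRAM and STT-RAM. Based on this architecture, two cache write management policies are proposed to reduce the write pressure on the STT-RAM. We then propose a dynamic partitioning algorithm that dynamically allocates the hybrid L2 cache among the processors in order to level the wear. In addition, we analyze how the wear-leveling benefit of the hybrid banks changes under different SRAM to STT-RAM ratios.
    Experimental results show that, with the proposed cache write management policies and dynamic partitioning algorithm, the proposed hybrid cache improves cache lifetime by a factor of 89 on average and saves 58% of the energy consumption on average compared with a conventional SRAM cache.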

    Abstract: In recent years, non-volatile memory (NVM) technologies, such as spin-transfer torque RAM (STT-RAM) and phase-change RAM (PRAM), have drawn a lot of attention due to their low leakage and high density. However, both suffer from high write latency and limited endurance. To mitigate the write pressure on NVM, many SRAM/NVM hybrid cache designs have been proposed together with write management policies. Unfortunately, existing hybrid cache designs do not consider the unbalanced workload of each core in a chip multiprocessor (CMP) architecture, which results in unbalanced wear-out of the hybrid cache.
    This thesis considers the unbalanced write distribution of a hybrid cache in a CMP architecture and proposes a novel hybrid cache design that includes SRAM banks, STT-RAM banks, and STT-RAM/SRAM hybrid banks. Based on this design, two access-aware policies are proposed to mitigate unbalanced wear-out of the STT-RAM region, and a wear-out-aware dynamic cache partitioning scheme is proposed to partition the hybrid cache dynamically, balancing the write pressure among the cache partitions. Experimental results show that the proposed scheme and policies achieve an average 89-fold improvement in cache lifetime and save 58% of the energy consumption compared with an SRAM cache.
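    This record does not describe the partitioning scheme in detail, so the following is only an illustrative sketch of the general idea of wear-aware partitioning, not the author's algorithm. It assumes one bank per core, hypothetical per-core and per-bank write counters, and a simple greedy rule: at each epoch, the heaviest-writing core is paired with the least-worn bank. Relative lifetime is modeled as inversely proportional to the write count of the most-written bank, which is a common simplification and also an assumption here.

```python
from statistics import pstdev


def assign_banks(core_writes, bank_wear):
    """Pair the heaviest-writing cores with the least-worn banks.

    core_writes: writes issued by each core per epoch (hypothetical counters)
    bank_wear:   writes accumulated by each bank so far (hypothetical counters)
    Returns a dict mapping core index -> bank index.
    """
    if len(core_writes) != len(bank_wear):
        raise ValueError("this sketch assumes exactly one bank per core")
    # Cores ordered from most to fewest writes.
    cores = sorted(range(len(core_writes)),
                   key=lambda c: core_writes[c], reverse=True)
    # Banks ordered from least to most accumulated wear.
    banks = sorted(range(len(bank_wear)), key=lambda b: bank_wear[b])
    return dict(zip(cores, banks))


def simulate(core_writes, epochs, rebalance):
    """Accumulate per-bank wear over several epochs.

    With rebalance=False each core keeps its own bank (static partitioning);
    with rebalance=True the mapping is recomputed at the start of each epoch.
    """
    wear = [0] * len(core_writes)
    mapping = {c: c for c in range(len(core_writes))}
    for _ in range(epochs):
        if rebalance:
            mapping = assign_banks(core_writes, wear)
        for core, bank in mapping.items():
            wear[bank] += core_writes[core]
    return wear


def relative_lifetime(wear):
    """Assumed model: lifetime is limited by the most-written bank."""
    return 1.0 / max(wear)


if __name__ == "__main__":
    writes = [100, 10, 10, 10]  # one write-heavy core, three light ones
    static = simulate(writes, epochs=4, rebalance=False)
    dynamic = simulate(writes, epochs=4, rebalance=True)
    print("static wear: ", static, "std dev:", pstdev(static))
    print("dynamic wear:", dynamic, "std dev:", pstdev(dynamic))
    print("lifetime gain:",
          relative_lifetime(dynamic) / relative_lifetime(static))
```

    In this toy setting the total number of writes is the same in both runs; only their distribution over the banks changes, which is why the standard deviation of per-bank writes serves as the balance metric.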

    Chinese Abstract
    Abstract
    Acknowledgements
    Contents
    List of Tables
    List of Figures
    Chapter 1 Introduction
      1.1 Background
      1.2 Our Contributions
      1.3 Paper Organization
    Chapter 2 Preliminaries
      2.1 STT-RAM Fundamentals
      2.2 Related Works of Non-uniform and Hybrid Cache Design
    Chapter 3 Proposed Methodology
      3.1 Local Hybrid Bank in Hybrid Cache Architecture
      3.2 Access-aware Policies
      3.3 Dynamic Cache Partitioning Scheme
    Chapter 4 Experimental Setup and Results
      4.1 Experimental Setup
      4.2 Write Pressure on Local Banks
      4.3 Write Pressure Comparison w/ and w/o Hybrid Banks
      4.4 Write Pressure and Standard Deviation Reduction Using Access-aware Policies
      4.5 Unbalanced Write Distribution Improvement Using Dynamic Cache Partitioning Scheme
      4.6 Sensitivity Analysis of Hybrid Bank
      4.7 Lifetime Comparison
      4.8 Energy Consumption Comparison
      4.9 Performance Evaluation
    Chapter 5 Conclusion
    References


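    Full-text availability: on campus, public from 2016-06-30; off campus, not public.
    The electronic thesis has not yet been authorized for public release; for the print copy, please consult the library catalog.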