
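Graduate student: 裘証年 (Chiou, Jeng-Nian)
Thesis title (Chinese): 適用於多核心架構的三維混合式快取記憶體設計及存取管理與動態快取切割演算法
Thesis title (English): A 3-D Hybrid Cache Design for CMP Architecture with Access-Aware Technique and Dynamic Cache Partitioning
Advisor: 林英超 (Lin, Ing-Chao)
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Computer Science and Information Engineering
Year of publication: 2014
Academic year of graduation: 102
Language: English
Number of pages: 50
Keywords (translated from Chinese): hybrid cache memory, 3-D integration technology, spin-transfer torque random access memory
Keywords (English): Hybrid L2 cache, 3-D integration technology, STT-RAM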
    In recent years, non-volatile memory (NVM) technologies such as spin-transfer torque RAM (STT-RAM) and phase-change RAM (PRAM) have drawn considerable attention because of their low leakage energy and high density. However, NVM suffers from write-related problems: high write energy, long write latency, and limited write endurance. SRAM/NVM hybrid cache designs have therefore been proposed to mitigate the write pressure on NVM. Meanwhile, wire delay caused by long on-chip interconnects has become a major issue in chip multiprocessor (CMP) design, and three-dimensional (3-D) integration technology has been proposed, and applied to last-level cache design, to mitigate it. In this thesis, we propose a 3-D stacked non-uniform hybrid cache architecture for CMPs that contains three types of cache bank (SRAM banks, STT-RAM banks, and STT-RAM/SRAM hybrid banks) in order to reduce energy consumption and wire delay. Based on this architecture, we propose an access-aware technique and a cache partitioning algorithm to reduce the access latency of the hybrid last-level cache (LLC). Experimental results show that, compared with a pure SRAM cache, the proposed design reduces energy consumption by 60.4% and reduces the write pressure on STT-RAM by about 21%, which in turn lowers cache access latency by about 18.8% and extends cache lifetime by 26.5%.
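The link between the reported write-pressure reduction and the reported lifetime gain can be illustrated with a short calculation. The sketch below is not the thesis's evaluation method; it assumes, purely for illustration, that cache lifetime is inversely proportional to the write traffic reaching STT-RAM and that the reduction applies uniformly to all cells. The function name `lifetime_gain` is hypothetical.

```python
def lifetime_gain(write_reduction: float) -> float:
    """Relative lifetime gain for a given fractional reduction in writes.

    Assumption (not from the thesis): lifetime is inversely proportional
    to write traffic, so cutting writes by a fraction r scales lifetime
    by 1 / (1 - r).
    """
    if not 0.0 <= write_reduction < 1.0:
        raise ValueError("write_reduction must be in [0, 1)")
    return 1.0 / (1.0 - write_reduction) - 1.0


if __name__ == "__main__":
    # A write-pressure reduction of about 21%, as reported in the abstract.
    print(f"{lifetime_gain(0.21):.1%}")  # prints "26.6%"
```

Under this simplified model, a 21% reduction in writes corresponds to a lifetime gain of roughly 26.6%, which is in line with the 26.5% reported above. In practice, lifetime is limited by the most heavily written cells, so the real relationship depends on how writes are distributed.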

    Chinese Abstract
    Abstract
    Acknowledgements
    Contents
    List of Tables
    List of Figures
    Chapter 1 Introduction
      1.1 Background
      1.2 Our Contributions
      1.3 Thesis Organization
    Chapter 2 Preliminaries
      2.1 STT-RAM Fundamentals
      2.2 Non-Uniform Cache Access (NUCA) Cache Design
      2.3 Related Works
    Chapter 3 Methodology
      3.1 3-D Stacked SRAM/STT-RAM Hybrid Cache Architecture
      3.2 Access-Aware Technique
      3.3 Address Remapping
      3.4 Cache Partitioning Algorithm
      3.5 Address Controlling Mechanism
    Chapter 4 Experimental Setup and Results
      4.1 Experimental Setup
      4.2 Energy Consumption Evaluation of Last Level Cache
      4.3 Normalized Miss Rate for Last Level Cache
      4.4 Write Pressure Reduction Using Access-Aware Technique
      4.5 The Comparison of Cache Access Latency for Proposed Access-Aware Technique
      4.6 Evaluation for Proposed Partitioning Algorithm
      4.7 Normalized Energy Delay Product
      4.8 Normalized Lifetime w/ and w/o Access-Aware Technique
    Chapter 5 Conclusions
    References


    Full text available on campus: from 2019-09-05
    Full text available off campus: from 2019-09-05