
Graduate student: Huang, Wen-Zhi (黃文志)
Thesis title: Design and Implementation of a Demand-aware Online Hybrid Scratchpad Memory Management Method for Multi-core Systems (多核心系統下需求感知混合性草稿式記憶體即時管理方法之設計與實作)
Advisor: Chang, Da-Wei (張大緯)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering
Year of publication: 2017
Academic year of graduation: 106 (ROC calendar)
Language: English
Number of pages: 45
Keywords: Hybrid on-chip memory, Scratchpad memory, Non-volatile memory, Memory allocation, Multi-core system
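  • As the demand for high-performance programs continues to grow, multi-core architectures have become the trend for most embedded systems; however, as the number of cores increases, the power consumption of conventional caches also rises sharply. In embedded systems, scratchpad memory consumes less power and occupies less area than conventional cache memory, so it is widely used in multi-core embedded systems. However, as process technology scales down, the static energy consumption of SRAM-based scratchpad memory keeps increasing. To address this, hybrid scratchpad memory composed of non-volatile memory and SRAM is widely used as on-chip memory, exploiting the low static energy and high density of non-volatile memory; but because the write cost of non-volatile memory is far higher than that of SRAM, excessive writes to non-volatile memory must be avoided. In a multi-core system, each core has its own local hybrid scratchpad memory, and applications running on a core can access both local and remote hybrid scratchpad memories. Therefore, allocating suitable hybrid scratchpad memory resources to programs according to their demands can reduce non-volatile memory writes, improve memory utilization, and lower power consumption. We therefore propose a dynamic online management strategy for hybrid scratchpad memory that, in multi-core systems, dynamically allocates hybrid scratchpad memory resources according to the run-time access behavior of programs. In addition, when data stored in non-volatile memory becomes write-intensive, it is migrated to SRAM to reduce heavy writes to non-volatile memory. Our method is implemented with a combination of hardware and software.
    Experimental results show that, compared with a conventional SRAM-based scratchpad memory of the same area, our method improves average performance by 47% and 43% in 4-core and 8-core systems, respectively; in addition, the hardware cost is about 1.08%.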

    As requirements for high performance keep growing, multi-core systems have become one of the most promising designs for embedded systems. As the number of cores increases, however, the energy consumed by caches becomes extremely high. Therefore, scratchpad memory (SPM), a software-controlled on-chip memory, has been increasingly adopted in multi-core embedded systems as a substitute for caches. Moreover, because the leakage power of traditional on-chip SRAM keeps increasing, hybrid SPM consisting of SRAM and non-volatile memory (NVM) has been proposed to exploit the low leakage power and high density of NVM while avoiding its expensive writes. In a multi-core system with hybrid SPMs, each core has a local hybrid SPM and can access both its local hybrid SPM and the hybrid SPMs of other cores (i.e., remote hybrid SPMs). How these hybrid SPMs are allocated to the tasks running in a multi-core system therefore influences energy consumption. This thesis proposes a demand-aware online hybrid SPM management method for multi-core systems (DOSAM). DOSAM considers the demands and runtime access behavior of each task when allocating on-chip SRAM and NVM space to it. In addition, in-NVM data is migrated to SRAM when it becomes write-intensive. DOSAM is implemented through the cooperation of hardware and software.
    Evaluation results show that DOSAM can reduce the energy-delay product (EDP) by up to 63% (47% on average) in 4-core systems and by up to 57% (43% on average) in 8-core systems, compared with SRAM-based SPMs in a multi-core architecture. Furthermore, the hardware area overhead is small (about 1.08%).
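
    The migration idea in the abstract (move in-NVM data to SRAM once it becomes write-intensive) can be illustrated with a minimal sketch. This is not the thesis's implementation: the page granularity, the per-period write counters, the `write_threshold` parameter, the "hottest pages first" ordering, and all class and method names below are assumptions made only for illustration.

```python
from dataclasses import dataclass, field

SRAM, NVM = "SRAM", "NVM"


@dataclass
class Page:
    page_id: int
    region: str       # SRAM or NVM
    writes: int = 0   # writes observed in the current period


@dataclass
class HybridSPM:
    sram_capacity: int     # number of pages that fit in SRAM (assumed unit)
    write_threshold: int   # hypothetical per-period "write-intensive" cutoff
    pages: dict = field(default_factory=dict)

    def add_page(self, page_id, region):
        self.pages[page_id] = Page(page_id, region)

    def record_write(self, page_id):
        # Stand-in for a hardware write counter attached to each page.
        self.pages[page_id].writes += 1

    def sram_used(self):
        return sum(1 for p in self.pages.values() if p.region == SRAM)

    def run_migrator(self):
        """Run once per period: move write-intensive NVM pages into free SRAM."""
        hot = sorted(
            (p for p in self.pages.values()
             if p.region == NVM and p.writes >= self.write_threshold),
            key=lambda p: p.writes,
            reverse=True,
        )
        free = max(self.sram_capacity - self.sram_used(), 0)
        migrated = []
        for p in hot[:free]:          # hottest pages first, limited by free SRAM
            p.region = SRAM
            migrated.append(p.page_id)
        for p in self.pages.values():
            p.writes = 0              # start a new counting period
        return migrated


spm = HybridSPM(sram_capacity=2, write_threshold=3)
spm.add_page(0, SRAM)
spm.add_page(1, NVM)
spm.add_page(2, NVM)
for _ in range(5):
    spm.record_write(1)
for _ in range(4):
    spm.record_write(2)
print(spm.run_migrator())
```

    In this sketch only one SRAM slot is free, so only the most heavily written NVM page is moved; a real design would also need an eviction policy for SRAM and an accounting of migration cost, which are left out here.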

    Abstract (Chinese)
    ABSTRACT
    Acknowledgements
    CONTENT
    LIST OF TABLES
    LIST OF FIGURES
    Chapter 1 INTRODUCTION
    Chapter 2 RELATED WORK
        2.1 SPM allocation methods
        2.2 Hybrid SPM management methods
    Chapter 3 DESIGN OF DOSAM
        3.1 Overall Architecture
        3.2 SPM Allocator
        3.3 NVM-write Estimator
        3.4 SPM Data Migrator
        3.5 Page Write Booking Circuits (PWBC)
        3.6 Access Flow
    Chapter 4 PERFORMANCE EVALUATION
        4.1 Simulation Environment
        4.2 Effectiveness of DOSAM
        4.3 Effectiveness of the SPM Allocator/SPM Data Migrator
        4.4 Migration Overhead
        4.5 Determining the Period of SPM Data Migrator
        4.6 Counter Size and Area Overhead
    Chapter 5 CONCLUSION
    REFERENCES

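    On campus: available from 2022-11-21
    Off campus: not available
    The electronic thesis has not been authorized for public release; for the print copy, please check the library catalog.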