| Graduate student: | 林弋喬 Lin, Yi-Chiao |
|---|---|
| Thesis title: | 多分區草稿式記憶體線上分群方法之設計 Design of an Online Data Clustering Method for Multi-bank Scratchpad Memory |
| Advisor: | 張大緯 Chang, Da-Wei |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering |
| Year of publication: | 2016 |
| Academic year of graduation: | 104 |
| Language: | English |
| Number of pages: | 52 |
| Chinese keywords: | 草稿式記憶體, 多分區草稿式記憶體, 資料分群 |
| Foreign-language keywords: | Scratchpad memory, Multi-bank Scratchpad memory, Data clustering |
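Chip energy consumption is mainly divided into dynamic and static energy consumption. As process technology advances, static energy consumption already accounts for 50% of total power, so designs that target static energy have become increasingly important. In embedded systems, scratchpad memory consumes less power and occupies less area than a conventional cache. Existing research has proposed dividing scratchpad memory into multiple banks to further reduce energy consumption. To learn a program's memory access patterns, existing work must analyze the program's behavior before execution and then allocate memory according to that behavior.
In this thesis, we propose an online data clustering method for multi-bank scratchpad memory that requires no prior analysis. The method classifies hotness according to how heavily each scratchpad memory page is currently accessed. Hot data and cold data are assigned to different banks, so that banks holding cold data stay idle longer and can enter a low-voltage mode to reduce static energy consumption. The results show that, compared with the conventional time-out method, our method reduces the energy-delay product on average by 18.09% for 32KB 4-bank, 15.71% for 32KB 8-bank, 13.44% for 16KB 4-bank, and 10.05% for 16KB 8-bank. In addition, the extra area overhead of this method accounts for 0.72% of the total on-chip memory area.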
The power consumption of on-chip memory is divided into two parts: dynamic power and leakage power. As CMOS devices scale, the share of leakage power increases and can exceed 50% of the total power consumption of a CMOS device. Scratchpad memory (SPM), a software-controlled on-chip memory, has lower access energy and higher area density than an ordinary cache. Multi-bank SPM has been proposed to further reduce energy consumption.
This thesis proposes OCBAS, a novel online data clustering method for multi-bank scratchpad memory. OCBAS identifies the hotness of each SPM page and clusters pages with different degrees of hotness into different SPM banks, which increases the idle time of cold SPM banks and hence reduces the leakage energy of the SPM. Moreover, OCBAS allows cold SPM banks to enter the low-power mode more aggressively, further reducing leakage energy. OCBAS does not require offline profiling. The evaluation results show that, compared to a conventional time-out-based policy, OCBAS reduces the energy-delay product (EDP) by up to 37.71% (18.09% on average) for a 32KB 4-bank SPM, 25.33% (15.71% on average) for a 32KB 8-bank SPM, 28.99% (13.45% on average) for a 16KB 4-bank SPM, and 30.18% (10.05% on average) for a 16KB 8-bank SPM. Moreover, the area overhead of the hardware support is insignificant (about 0.72%).
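The idea of grouping pages by hotness, and the EDP metric used in the evaluation, can be illustrated with a minimal Python sketch. This is not the OCBAS algorithm or its hardware support, which the abstract does not specify. It assumes hotness is a plain per-page access count and that pages are simply ranked and packed into banks from hottest to coldest. The names `cluster_pages_by_hotness`, `energy_delay_product`, and `edp_reduction_percent`, and the sample access counts, are hypothetical.

```python
from typing import Dict, List


def cluster_pages_by_hotness(
    access_counts: Dict[int, int], num_banks: int, pages_per_bank: int
) -> List[List[int]]:
    """Pack pages into banks from hottest to coldest (illustrative assumption).

    Bank 0 receives the most frequently accessed pages and the last bank the
    least frequently accessed ones, so the later banks tend to stay idle longer.
    """
    if len(access_counts) > num_banks * pages_per_bank:
        raise ValueError("more pages than the SPM can hold")
    # Rank by descending access count; break ties by page number for determinism.
    ranked = sorted(access_counts, key=lambda page: (-access_counts[page], page))
    return [
        ranked[bank * pages_per_bank:(bank + 1) * pages_per_bank]
        for bank in range(num_banks)
    ]


def energy_delay_product(energy: float, delay: float) -> float:
    """EDP is the product of energy consumed and execution time."""
    return energy * delay


def edp_reduction_percent(baseline_edp: float, new_edp: float) -> float:
    """Relative EDP reduction of a policy against a baseline, in percent."""
    return 100.0 * (baseline_edp - new_edp) / baseline_edp


# Hypothetical access counts for 8 pages in a 4-bank SPM with 2 pages per bank.
counts = {0: 90, 1: 2, 2: 75, 3: 1, 4: 60, 5: 0, 6: 55, 7: 3}
banks = cluster_pages_by_hotness(counts, num_banks=4, pages_per_bank=2)
print(banks)  # [[0, 2], [4, 6], [7, 1], [3, 5]]
```

With this placement, the rarely accessed pages end up together in the last banks, which is the condition under which a bank can remain in a low-power mode for longer stretches.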