
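Graduate student: Wu, Bo-Han (吳柏翰)
Thesis title: Data Access Behavior Aware Online Allocation in an Innovative Hybrid On-Chip Memory (用於新穎晶片內混合式記憶體之資料存取行為感知的線上配置方法)
Advisor: Young, Chung-Ping (楊中平)
Degree: Master
Department: Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science
Year of publication: 2013
Academic year of graduation: 101 (ROC calendar)
Language: English
Number of pages: 66
Keywords (translated from Chinese): scratchpad memory, on-chip memory, allocation method
Keywords (English): SPM, on-chip memory, reposition, online allocation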
  • To reduce the energy consumption of processor chips in embedded systems, many studies replace the L1 cache with scratchpad memory (SPM) or use a hybrid on-chip memory architecture that combines SPM and cache. Apart from the memory architecture, the data allocation policy is another important research topic. Traditional SPM allocation policies must profile the program before execution; this analysis takes a lot of time, and its predictions may be wrong. Some studies have therefore proposed online SPM allocation, which requires no profiling. To optimize both the architecture and the allocation policy, this thesis proposes a data access behavior aware online allocation method for an innovative hybrid on-chip memory architecture. On the architecture side, we add a small, separate SRAM, the hot data placement memory (HPM), to hold frequently accessed (hot) data. On the allocation side, we decide which memory a piece of data should be placed in according to its access behavior in the cache. To implement the proposed architecture and policy, we add a small translation lookaside buffer (tinyTLB) to the on-chip architecture so that accesses can be redirected to the HPM, together with extra circuitry that records the information the migration policy needs.
    This thesis uses MiBench to evaluate the proposed system, comparing its energy-delay product (EDP) with that of previous online SPM allocation studies and conventional cache architectures. The experimental results show that the proposed system reduces the average EDP by 50% relative to a conventional cache architecture and by 25% relative to a previous online SPM allocation method.
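As a rough illustration of the metric used in the abstract, the energy-delay product is the energy a run consumes multiplied by its execution time, so a lower value is better. The Python sketch below shows how a relative EDP reduction would be computed. The function names and the input numbers are hypothetical and chosen only to make the arithmetic visible; they are not measurements from the thesis.

```python
def energy_delay_product(energy_joules: float, delay_seconds: float) -> float:
    """EDP = energy consumed x execution time; lower is better."""
    return energy_joules * delay_seconds


def edp_reduction(baseline_edp: float, proposed_edp: float) -> float:
    """Fractional EDP reduction of a proposed system relative to a baseline."""
    return 1.0 - proposed_edp / baseline_edp


# Hypothetical numbers, not thesis data: the proposed system uses less
# energy and also finishes sooner than the baseline.
baseline = energy_delay_product(energy_joules=2.0, delay_seconds=1.0)
proposed = energy_delay_product(energy_joules=1.25, delay_seconds=0.8)
print(f"EDP reduction: {edp_reduction(baseline, proposed):.0%}")
```

Because EDP multiplies the two quantities, a design that saves energy but runs much slower can still score worse than the baseline, which is presumably why the thesis reports EDP rather than energy alone.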

    Contents
    Abstract (Chinese)
    Abstract
    Acknowledgments
    Abbreviations
    Contents
    List of Tables
    List of Figures
    Chapter 1 Introduction
      1.1 Motivation
      1.2 System Overview
    Chapter 2 Background Knowledge and Related Works
      2.1 On-chip Memory Architecture
        2.1.1 Cache
        2.1.2 SPM
        2.1.3 SPM and Cache Hybrid Architecture
      2.2 SPM allocation policy
        2.2.1 Static allocation
        2.2.2 Dynamic allocation
        2.2.3 Runtime allocation with profiling
        2.2.4 Runtime allocation without profiling
    Chapter 3 Memory System Design
      3.1 On-chip memory architecture
      3.2 Migration policy
    Chapter 4 Hardware design
      4.1 System Hardware Architecture
        4.1.1 TinyTLB
        4.1.2 Block Counter
      4.2 The mechanism of access and migration
        4.2.1 Data access in DABA system
        4.2.2 Data migration in DABA system
    Chapter 5 Experiment Results
      5.1 Simulation Environment
      5.2 The software control threshold
        5.2.1 Threshold of Miss Count and Hit Count
        5.2.2 Migration from HPM to SPM
      5.3 The hardware configuration of DABA allocation system
        5.3.1 The cache associativity
        5.3.2 TinyTLB and HPM
      5.4 The comparison with previous studies
        5.4.1 Comparing with cache architecture and online SPM policies
        5.4.2 The CASA and Ji’s study
    Chapter 6 Conclusion and Future Work
      6.1 Conclusion
      6.2 Future work
    References
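The outline mentions a block counter, miss-count and hit-count thresholds, a tinyTLB, and migration from HPM to SPM, but this record does not spell out the policy itself. The Python sketch below is therefore only one plausible reading of a counter-driven online allocation, not the thesis's actual algorithm. Every name, threshold value, capacity, and the oldest-first demotion rule is an assumption made for illustration.

```python
from collections import OrderedDict
from dataclasses import dataclass, field

# Illustrative assumptions only; none of these values come from the thesis.
MISS_THRESHOLD = 4    # cache misses before a block is moved to the SPM
HIT_THRESHOLD = 16    # cache hits before a block is treated as hot
HPM_CAPACITY = 2      # number of blocks the small HPM can hold


@dataclass
class BlockCounter:
    hits: int = 0
    misses: int = 0


@dataclass
class HybridMemory:
    counters: dict = field(default_factory=dict)   # block id -> BlockCounter
    spm: set = field(default_factory=set)          # blocks placed in the SPM
    hpm: OrderedDict = field(default_factory=OrderedDict)  # HPM blocks, oldest first

    def location(self, block: int) -> str:
        """Stand-in for the tinyTLB lookup: report where a block currently lives."""
        if block in self.hpm:
            return "HPM"
        if block in self.spm:
            return "SPM"
        return "cache"

    def access(self, block: int, cache_hit: bool) -> str:
        """Count one access and migrate the block once a threshold is reached."""
        if self.location(block) != "cache":
            return self.location(block)      # already migrated; counters unused
        counter = self.counters.setdefault(block, BlockCounter())
        if cache_hit:
            counter.hits += 1
        else:
            counter.misses += 1
        if counter.hits >= HIT_THRESHOLD:
            self._move_to_hpm(block)
        elif counter.misses >= MISS_THRESHOLD:
            self.spm.add(block)
        if self.location(block) != "cache":
            del self.counters[block]         # counters are kept only for cached blocks
        return self.location(block)

    def _move_to_hpm(self, block: int) -> None:
        if len(self.hpm) >= HPM_CAPACITY:
            victim, _ = self.hpm.popitem(last=False)  # demote the oldest HPM block
            self.spm.add(victim)
        self.hpm[block] = True


mem = HybridMemory()
for _ in range(HIT_THRESHOLD):
    mem.access(block=0x10, cache_hit=True)
for _ in range(MISS_THRESHOLD):
    mem.access(block=0x20, cache_hit=False)
print(mem.location(0x10), mem.location(0x20), mem.location(0x30))
```

In this sketch the frequently hit block ends up in the HPM, the frequently missed block ends up in the SPM, and an untouched block stays under cache management. When the HPM is full, the oldest resident is demoted to the SPM, which is one simple way to model the HPM-to-SPM migration named in Section 5.2.2.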


    Full text availability: on campus from 2018-09-12; off campus from 2022-01-20.