| Graduate student: | 陳忠和 Chen, Zhong-Ho |
|---|---|
| Thesis title: | 草稿式記憶體配置之軟硬體架構 A Hardware/Software Framework for Instruction and Data Scratchpad Memory Allocation |
| Advisor: | 蘇文鈺 Su, W. Y. Alvin |
| Degree: | Doctor |
| Department: | College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering |
| Year of publication: | 2011 |
| Academic year of graduation: | 99 |
| Language: | English |
| Number of pages: | 50 |
| Keywords (Chinese): | 草稿式記憶體, 配置, 軟硬體架構 |
| Keywords (English): | Scratchpad Memory, Allocation, Hardware/Software Framework |
Previous research shows that a scratchpad memory (SPM) consumes less energy than a cache of the same capacity. In this dissertation, we place the SPM at the top level of the memory hierarchy to reduce energy consumption. Using an SPM effectively raises two issues: first, the locality of the program's memory accesses must be improved; second, the SPM must be managed efficiently. To address both issues, we present a hardware/software framework that dynamically allocates both instructions and data in the SPM. The software flow is divided into three phases: locality improvement, locality extraction, and runtime SPM management. We improve a program's locality without modifying the original compiler or the source code. An optimization algorithm is proposed to extract the program's locality and derive the SPM allocations. At runtime, system software manages the SPM. In hardware, an address translation logic (ATL) is proposed to reduce the overhead of SPM management.
The results show that the proposed framework reduces the energy-delay product (EDP) by 63% on average compared with a traditional cache architecture. The reduction comes mainly from properly allocating both instructions and data in the SPM. Allocating only instructions in the SPM reduces the EDP by 45% on average; allocating only data reduces it by 14% on average.
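For readers unfamiliar with the metric, the energy-delay product is the energy consumed multiplied by the execution time, so a design can lower it by saving energy, by running faster, or both. The sketch below shows how a relative EDP reduction of the kind reported above could be computed; the function names and the sample energy and delay values are hypothetical and are not taken from the dissertation.

```python
def edp(energy_joules: float, delay_seconds: float) -> float:
    """Energy-delay product: energy consumed times execution time."""
    return energy_joules * delay_seconds


def edp_reduction(baseline: float, proposed: float) -> float:
    """Fractional EDP reduction of `proposed` relative to `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline EDP must be positive")
    return 1.0 - proposed / baseline


# Hypothetical measurements (illustrative only, not from the dissertation).
cache_edp = edp(energy_joules=2.0, delay_seconds=1.0)  # cache baseline
spm_edp = edp(energy_joules=1.0, delay_seconds=0.8)    # SPM-based design

print(f"EDP reduction: {edp_reduction(cache_edp, spm_edp):.0%}")
```

Because the two factors multiply, halving the energy while cutting the run time by 20% gives a 60% EDP reduction in this made-up case.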