| Graduate student: | 李銘峯 Lee, Ming-Fong |
|---|---|
| Thesis title: | 嵌入式處理器之高關聯性指令快取記憶體節能設計 Energy-Efficient Design for Highly-Associative Instruction Caches in Embedded Processors |
| Advisor: | 陳中和 Chen, Chung-Ho |
| Degree: | 碩士 Master |
| Department: | 電機資訊學院 - 電腦與通信工程研究所 Institute of Computer & Communication Engineering |
| Year of publication: | 2007 |
| Academic year of graduation: | 95 (ROC calendar) |
| Language: | English |
| Number of pages: | 44 |
| Keywords (Chinese): | 省電設計、嵌入式處理器、指令快取記憶體、高關聯性 |
| Keywords (English): | low-power design, instruction cache, embedded processor, highly-associative |
Energy consumption is an important design consideration for modern embedded processors, and prior studies have shown that the instruction cache accounts for a significant portion of a processor's total energy dissipation. For example, in the Intel SA-110 low-power microprocessor, the instruction cache accounts for 27 percent of total processor power.
Many low-power embedded processors no longer use the traditional RAM (random access memory)-tag design for their instruction caches; instead, they employ a highly-associative, CAM (content addressable memory)-tag, banked design to reduce energy consumption. As a result, the low-power mechanisms previously proposed for RAM-tag instruction caches are not applicable to the highly-associative CAM-tag structure. In this study, the proposed design reduces the energy consumption of highly-associative CAM-tag instruction caches by exploiting way locality and by adding fields to the branch target buffer (BTB). In addition, the lazy BTB technique is used to reduce the extra BTB energy overhead caused by the added fields. Experimental results show that the proposed design reduces the energy consumption of a 32KB, 32-way CAM-tag instruction cache by at least 34 percent and the energy of BTB look-ups by 50 percent, with negligible performance degradation.