
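Author: Hsu, Chia-Jung (許家榮)
Thesis title: Applying Virtual Address Compression in Branch Target Buffer and Load/Store Queue in High-Performance Processors
Original Chinese title: 利用虛擬位址壓縮減少高效能處理器之分支目標緩衝器及載入儲存佇列之面積及功率需求
Advisor: Chen, Chung-Ho (陳中和)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Institute of Computer & Communication Engineering
Year of publication: 2007
Graduation academic year: 95 (ROC calendar)
Language: Chinese
Number of pages: 79
Keywords (translated from Chinese): virtual address compression, branch target buffer, load/store queue, power consumption
Keywords (English): energy reduction, BTB, branch target buffer, load store queue, LSQ, virtual address compression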
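  • This thesis compresses the virtual addresses that are stored and compared in the Branch Target Buffer (BTB) and the Load/Store Queue (LSQ) of high-performance processors. Because the BTB is a cache structure that stores virtual addresses, compressing those addresses reduces the BTB's area and power requirements. The LSQ does more than store virtual addresses: it also uses a fully associative Content-Addressable Memory (CAM) structure to check the virtual address about to be placed in the LSQ against existing entries for address collisions. The energy consumption and area that this structure and its search comparisons require become an increasingly important concern as the number of in-flight instructions grows.
    The BTB design with virtual address compression reduces area by about 53.6%-69.3% and BTB energy consumption by about 4.2%-28.5%. It adds no burden to the original clock cycle, and Instructions Per Cycle (IPC) falls by less than 0.4%. The LSQ design with virtual address compression reduces area by about 35%-70% and LSQ energy consumption by about 39%-72%; with the best compression configuration finally adopted for the LSQ, IPC falls by less than 0.3%. Combining the BTB and LSQ designs reduces processor energy consumption by 2.5%-3.1% and the combined LSQ and BTB area by 45%-52%, with an IPC reduction of less than 0.2%.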

    This thesis proposes a virtual address compression technique for the branch target buffer (BTB) and the load/store queue (LSQ), both of which store virtual addresses and use them for matching or comparison. Because a BTB is a large address cache, compressing the addresses it holds reduces its area cost. An LSQ typically needs a fully associative CAM structure to search addresses for matches, which poses scalability challenges for power consumption and area as the number of in-flight instructions grows. With the proposed approach, the BTB design reduces area by 53.6%-69.3% and energy consumption by 4.2%-28.5%, while the LSQ design reduces area by 35%-70% and energy consumption by 39%-72%. Combining the two yields a 45%-52% total area saving for the two components and a 2.5%-3.1% reduction in overall processor energy, with a performance loss of no more than 0.2%.
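    The general idea of compressing a virtual address can be illustrated with a small sketch. The abstract does not describe the thesis's actual mechanism, so everything below is an assumption made for illustration: the class name `AddressPatternTable`, the 32-bit address width, the 16-bit split, the 8-entry table, and the policy of returning `None` when the table is full are all hypothetical. The sketch only shows how replacing shared high-order bits with a short table index shrinks the field that a BTB or LSQ entry would have to store and compare.

```python
from typing import List, Optional, Tuple


class AddressPatternTable:
    """Hypothetical table mapping high-order address bits to a short index.

    Illustrative only; not the design used in the thesis.
    """

    def __init__(self, entries: int = 8, addr_bits: int = 32, low_bits: int = 16) -> None:
        self.entries = entries        # assumed table size
        self.addr_bits = addr_bits    # assumed virtual address width
        self.low_bits = low_bits      # low-order bits kept uncompressed
        self.patterns: List[int] = []  # stored high-order patterns

    def compress(self, addr: int) -> Optional[Tuple[int, int]]:
        """Return (table index, low bits), or None if the table is full."""
        high = addr >> self.low_bits
        low = addr & ((1 << self.low_bits) - 1)
        if high in self.patterns:
            return self.patterns.index(high), low
        if len(self.patterns) < self.entries:
            self.patterns.append(high)
            return len(self.patterns) - 1, low
        return None  # a real design would need a replacement or stall policy

    def decompress(self, index: int, low: int) -> int:
        """Rebuild the full address from a table index and the low bits."""
        return (self.patterns[index] << self.low_bits) | low

    def compressed_bits(self) -> int:
        """Width of the compressed field: index bits plus low-order bits."""
        index_bits = max(1, (self.entries - 1).bit_length())
        return index_bits + self.low_bits


table = AddressPatternTable()
first = table.compress(0x00401234)   # allocates a new pattern for 0x0040
second = table.compress(0x00405678)  # reuses the same pattern
restored = table.decompress(*first)
```

    Under these assumed parameters each stored address field shrinks from 32 bits to 19 bits (a 3-bit index plus 16 low-order bits). The figures reported in the abstract come from the thesis's own design and simulation and are not derived from this sketch.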

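    Abstract
    Table of Contents
    List of Figures
    List of Tables
    Chapter 1. Introduction
        1.1 Motivation
        1.2 Contributions
        1.3 Organization
    Chapter 2. Background
        2.1 Memory Addressing
            2.1.1 Virtual Address Space
            2.1.2 Address Binding
            2.1.3 Mapping between Virtual and Physical Addresses
        2.2 Branch Target Buffer (BTB)
        2.3 Load/Store Queue (LSQ)
    Chapter 3. Related Work
    Chapter 4. Virtual Address Compression
        4.1 BTB Architecture with Address Compression
        4.2 LSQ Architecture with Address Compression
    Chapter 5. Design and Implementation of Virtual Address Compression
        5.1 Instruction Address Pattern Table (IAPT)
        5.2 Data Address Pattern Table (DAPT)
        5.3 Discussion of the Differences between the IAPT and the DAPT
        5.4 Processor Pipeline Architecture with Virtual Address Compression
            5.4.1 Instruction Address Compression
            5.4.2 Data Address Compression
        5.5 Handling Program Deadlock in Data Address Compression
        5.6 Recovery of the DAPT Address Pattern Counters
    Chapter 6. Simulation and Analysis
        6.1 Simulation Environment
        6.2 Benchmarks and Address Compression Configurations
        6.3 Power Models
            6.3.1 Power Model of the CAM (Content-Addressable Memory)
            6.3.2 Power Models of the IAPT and DAPT
            6.3.3 Power Models of the BTB and LSQ with Address Compression
        6.4 Simulation Results
            6.4.1 Instruction Address Compression Results
            6.4.2 Data Address Compression Results
            6.4.3 Results of Combined Virtual Address Compression
        6.5 Comparison of Results
            6.5.1 Reducing the Number of LSQ Search Comparisons
            6.5.2 Set-Associative LSQ Architecture
    Chapter 7. Conclusions and Future Work
        7.1 Conclusions
        7.2 Future Work
    References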


    Full text availability: on campus, public from 2008-08-22
    Off campus, public from 2008-08-22