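Graduate student: Yang, Hung-Wei (楊宏偉)
Thesis title: Design and Simulation of Simultaneous Multithreading Processor (同時多執行緒處理器之設計與模擬)
Advisor: Jou, Jer-Min (周哲民)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Electrical Engineering
Year of publication: 2006
Academic year of graduation: 94 (ROC calendar)
Language: English
Pages: 67
Keywords (Chinese): 同時多執行緒處理器 (simultaneous multithreading processor)
Keywords (English): Simultaneous Multithreading, Processor, SMT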
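  • Compared with a superscalar processor, a simultaneous multithreading (SMT) processor can exploit both instruction-level and thread-level parallelism to improve the utilization of processor resources. In an SMT processor most hardware resources are shared among the threads, which adds many design considerations beyond those of a traditional superscalar processor. We design an SMT architecture based on a traditional superscalar architecture, describe the SMT processor architecture concretely, and examine from several angles the design choices that affect performance. This thesis analyzes the SMT processor architecture and uses a simulator to verify the design and evaluate its performance.
    In addition, the instruction fetch unit is regarded as the main performance bottleneck of an SMT architecture: it must be able to decide which of the threads should receive higher fetch priority in order to improve processor utilization. This thesis therefore proposes a new idea for computing thread priority: when the processor workload is low, higher priority is given to threads with more long-latency instructions so that those instructions can be executed as early as possible. Implemented on the simulator, the proposed dynamic switch fetch mechanism improves performance by 4.36% on average over ICOUNT, the fetch mechanism commonly used in SMT processors.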

    Simultaneous multithreading (SMT) attacks multiple sources of lost resource utilization in wide-issue processors, using both instruction-level and thread-level parallelism to increase throughput. In an SMT architecture most resources are shared among the threads, and limiting the performance impact of that sharing raises a number of implementation challenges. We design an SMT architecture on the foundation of a traditional superscalar architecture, give a concrete description of it, and discuss the design aspects that can affect performance. This thesis analyzes the SMT processor architecture and verifies the design with a simulator.
    The fetch unit is one of the main performance bottlenecks of an SMT architecture. To improve fetch efficiency, the fetch unit must be smart enough to determine which threads to fetch from. The ICOUNT fetch mechanism generally achieves satisfactory performance. In this thesis we propose a new way to calculate thread priority: when the processor is under a low workload, the highest priority goes to the thread with the most long-latency instructions. The fetch unit switches dynamically between ICOUNT and this method in each clock cycle; we call this the dynamic switch fetch policy. We modified the SMT simulator to implement this policy. Experimental results show that the dynamic switch fetch scheme improves performance by 4.36% on average over the ICOUNT fetch scheme.
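The per-cycle choice described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the thesis's simulator code. ICOUNT is taken in its usual sense (favor the thread with the fewest instructions in the pre-issue stages). Several parts are assumptions made only for this example: the `ThreadState` fields, the `low_workload_threshold` parameter, the test for "low workload" as a sum of per-thread instruction counts, and the tie-break on thread id.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ThreadState:
    tid: int
    icount: int        # instructions in decode, rename, and the issue queues
    long_latency: int  # hypothetical count of pending long-latency instructions


def icount_order(threads: List[ThreadState]) -> List[ThreadState]:
    """ICOUNT: threads with the fewest pre-issue instructions fetch first."""
    return sorted(threads, key=lambda t: (t.icount, t.tid))


def long_latency_order(threads: List[ThreadState]) -> List[ThreadState]:
    """Long-latency count: threads with the most long-latency instructions fetch first."""
    return sorted(threads, key=lambda t: (-t.long_latency, t.tid))


def dynamic_order(threads: List[ThreadState],
                  low_workload_threshold: int) -> List[ThreadState]:
    """Choose a policy for this cycle.

    Assumption: "low workload" means the total pre-issue instruction count
    is below a fixed threshold; the thesis may define it differently.
    """
    workload = sum(t.icount for t in threads)
    if workload < low_workload_threshold:
        return long_latency_order(threads)
    return icount_order(threads)


if __name__ == "__main__":
    threads = [ThreadState(0, 10, 1), ThreadState(1, 4, 0), ThreadState(2, 7, 5)]
    # Fetch priority order (thread ids) under a high and a low threshold.
    print([t.tid for t in dynamic_order(threads, low_workload_threshold=32)])
    print([t.tid for t in dynamic_order(threads, low_workload_threshold=16)])
```

With the higher threshold the processor counts as lightly loaded, so the order follows the long-latency counts. With the lower threshold the same thread states fall back to plain ICOUNT ordering.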

    Chapter 1 Introduction
      1.1 Simultaneous Multithreading
      1.2 Motivations and Purposes
      1.3 Organization of this thesis
    Chapter 2 Related Works
      2.1 Multiple Issue Processors
      2.2 Multithreaded Processors
      2.3 Multiprocessors
    Chapter 3 Design Issues of Simultaneous Multithreading Processor
      3.1 Base architecture of SMT
      3.2 Design Issues for SMT processor
        3.2.1 Threads Choice in the Fetch Unit
        3.2.2 Branch predictor
        3.2.3 Register Renaming
        3.2.4 Instruction Issue
        3.2.5 Memory/Cache
        3.2.6 Multiprogramming and parallel applications
        3.2.7 Synchronization between Threads
        3.2.8 Power
    Chapter 4 Simultaneous Multithreading Processor Architecture
      4.1 Architecture overview
      4.2 Instruction Fetch
      4.3 Branch Predictor
      4.4 Register Renaming
      4.5 Instruction Issue
      4.6 Instruction Commit
      4.7 Memory Unit
      4.8 Synchronization Mechanism
      4.9 Threads Priority Calculate Policy
        4.9.1 ICOUNT Fetch Policy
        4.9.2 Long Latency Count Fetch Policy
        4.9.3 Dynamic Fetch Policy
    Chapter 5 Simultaneous Multithreading Simulator and Simulation Results
    Chapter 6 Conclusion
    Reference


    Full-text availability: on campus from 2007-08-24; off campus from 2010-08-24.