National Cheng Kung University Electronic Theses and Dissertations System


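Graduate student:	莊武憲 Chuang, Wu-Hsien
Thesis title:	Dynamic Binary Translation on a Dual-Threaded Machine (以雙執行緒機器支援動態二進制碼轉譯)
Advisor:	陳中和 Chen, Chung-Ho
Degree:	Master
Department:	College of Electrical Engineering and Computer Science - Department of Electrical Engineering
Year of publication:	2003
Academic year of graduation:	91 (ROC calendar)
Language:	English
Number of pages:	63
Keywords (Chinese):	雙執行緒機器、動態二進制碼轉譯
Keywords (English):	Dynamic Binary Translation, Dual-Threaded Machine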

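Dynamic binary translation converts binary code (also called machine code) that has been compiled for one processor architecture into the binary code defined by another processor architecture. Because software compatibility problems delay the adoption of new processor architectures and microarchitectures, an old instruction set architecture (ISA) cannot be replaced quickly. Porting code from an old ISA to a new one is also time-consuming. Dynamic binary translation provides an automatic mechanism instead: binaries already compiled for the old ISA need not be changed in advance, and are translated and executed directly by the dynamic translator. Transmeta Crusoe and IBM DAISY are two successful examples, each translating a different system on its own newly designed VLIW processor: Crusoe translates and emulates the x86 IA-32 instruction set architecture, while DAISY translates the PowerPC architecture.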

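In this thesis, we propose an architecture that combines dynamic binary translation with the features of a multithreaded processor. Its main purpose is to provide a dual-threaded binary translation mechanism that translates and executes x86 IA-32 binaries on a simulation platform for a MIPS-like dual-threaded microprocessor. In this dual-threaded processor model, one thread, called the Translation thread, translates source instructions from the x86 program into groups of target instructions that can run on the target processor. The other thread, called the Execution thread, executes the translated binary code. In addition, a thread synchronization mechanism controls when hardware resources are switched between the two threads and passes the information needed during translation between them. To reduce the time spent on synchronization, a snooper hardware unit is integrated into the synchronization mechanism. The translation cache is realized as a block-based hardware cache together with an optional software cache. To improve the performance of the overall emulation system, the thesis also analyzes how the organization of each hardware cache and the branch predictor affect performance.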

Dynamic binary translation is a technology that translates executable machine code, also called binary code, from its original instruction set architecture (ISA) to the machine code of another ISA on the fly. Because software compatibility concerns slow the adoption of new processor architectures and microarchitectures, an old ISA cannot be replaced easily. Porting code from one ISA to another is also time-consuming. Dynamic binary translation offers a way to run the binary code of a legacy architecture on a new ISA automatically, without porting the source code. Transmeta Crusoe and IBM DAISY are successful examples of binary translation on their own VLIW processors: Crusoe emulates the x86 IA-32 system, and DAISY emulates the PowerPC machine.

In this thesis, we propose a mechanism that combines dynamic binary translation with the features of a multithreaded processor. A dual-threaded binary translation (DT-BT) mechanism is implemented to execute x86 IA-32 binary programs on a dual-threaded MIPS-like microprocessor simulator. One thread in this dual-threaded processor, called the translation thread, is responsible for translating the source instructions from x86 programs. The other thread, called the execution thread, executes the translated binary code. In addition, a synchronization mechanism transfers translation information (including the translation request, the response, and the translated result) and switches hardware resources between the two threads. Snooping hardware is integrated with the synchronization mechanism to reduce the synchronization overhead. The translation cache in this DT-BT model is implemented as a block-based hardware cache and an optional software cache. The influence of each hardware cache organization and of the BTB predictor is analyzed in order to achieve high emulation performance.
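The division of work between the two threads can be illustrated with a minimal software sketch. It is not the thesis implementation, which relies on hardware threads, a snooper, and a block-based hardware cache; here ordinary Python threads and queues stand in for the synchronization mechanism, a dictionary stands in for the translation cache, and every name (`SOURCE_BLOCKS`, `translate`, `translation_thread`, `run`) and the toy instruction set are assumptions made only for illustration.

```python
import queue
import threading

# Hypothetical toy "source" program: block address -> (source ops, next address).
# A next address of None ends the program.
SOURCE_BLOCKS = {
    0: (["inc", "inc"], 1),
    1: (["double"], 2),
    2: (["inc"], None),
}


def translate(ops):
    """Stand-in translator: map each source op to a target-side callable."""
    table = {"inc": lambda acc: acc + 1, "double": lambda acc: acc * 2}
    return [table[op] for op in ops]


def translation_thread(requests, responses):
    """Serve translation requests until a None sentinel arrives."""
    while True:
        addr = requests.get()
        if addr is None:
            break
        ops, next_addr = SOURCE_BLOCKS[addr]
        # Response carries the request address and the translated result.
        responses.put((addr, translate(ops), next_addr))


def run(start_addr=0, passes=2):
    """Execution thread: run translated blocks, requesting translation on a miss."""
    requests, responses = queue.Queue(), queue.Queue()
    worker = threading.Thread(target=translation_thread, args=(requests, responses))
    worker.start()

    cache = {}   # translation cache: source address -> (translated code, next address)
    misses = 0
    acc = 0
    for _ in range(passes):
        addr = start_addr
        while addr is not None:
            if addr not in cache:
                misses += 1
                requests.put(addr)                      # translation request
                key, code, next_addr = responses.get()  # blocks until the response
                cache[key] = (code, next_addr)
            code, next_addr = cache[addr]
            for op in code:
                acc = op(acc)
            addr = next_addr

    requests.put(None)  # tell the translation thread to stop
    worker.join()
    return acc, misses
```

Calling `run()` executes the toy program twice; each block is translated only on the first pass, and the second pass is served entirely from the cache. In this sketch the execution thread simply blocks while waiting for a response, which is the kind of synchronization cost the abstract says the snooping hardware is meant to reduce.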

Abstract in Chinese
Abstract
Acknowledgements
List of Tables
List of Figures
Chapter 1 Introduction
1 Motivation
2 Contribution of the thesis
3 Organization of the thesis
Chapter 2 Background & Related Work
1 Binary translation concept and terminology
2 Current binary translation technology
3 The chosen source and target ISA
3.1 Source ISA: Intel x86 IA-32
3.2 Target ISA: MIPS-like PISA
Chapter 3 Dual-Threaded Binary Translation Model
1 Basic dual-threaded binary translation scenario
2 Dynamic binary translation software
2.1 Linker & loader
2.2 Parser & decoder
2.3 Binary translator
2.4 Synchronization window
3 Dual-threaded hardware model for BT
3.1 Dual-threaded processor features
3.2 Thread synchronization
4 Instruction set architecture mapping issue
4.1 Semantics mapping
4.2 Precise architecture state
4.3 Condition code
4.4 Self-modifying code
Chapter 4 Performance Enhancing Mechanism
1 Hardware translated instruction cache
1.1 Blocked cache for translated ECS
1.2 Blocked cache for oversized ECS
2 Software translated instruction cache
2.1 Software cache organization
2.2 Software cache performance
3 Snooping
4 Branch prediction
Chapter 5 Performance Evaluation
1 Performance analysis
2 Simulation result
Chapter 6 Conclusion
1 Summary
2 Future work
References
Vita
                                    


Publicly available since 2003-07-25
