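Graduate student: Chen, Yen-Fu (陳彥甫)
Thesis title: Accelerate RISC-V Instruction Set Simulation by Tiered JIT Compilation (Chinese title: 藉由多層式及時編譯器加速 RISC-V 指令集模擬)
Advisors: Huang, Ching-Chun (黃敬群); Tu, Chia-Heng (涂嘉恒)
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Computer Science and Information Engineering
Year of publication: 2024
Academic year of graduation: 112 (ROC calendar, i.e., 2023–2024)
Language: English
Number of pages: 106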
Keywords: RISC-V, instruction set simulator, just-in-time compiler, dynamic binary translation
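
    The RISC-V instruction set architecture is an open standard that provides comprehensive support for operating systems, development tools, and upper-layer applications. Over the past decade, a growing share of hardware architecture design and research has been devoted to the RISC-V instruction set, and the development of RISC-V simulators that support hardware architecture design, performance evaluation, and software development has grown accordingly.
    Different simulators provide different functionality; our research focuses on instruction set simulators (ISSs). An ISS is used mainly for architecture evaluation and verification and is a fundamental tool for architecture-related software development. It provides a virtual environment in which to execute and test software, enabling developers to analyze and debug programs without the corresponding hardware. These capabilities are important in the early stages of hardware design, when the actual hardware may not yet exist.
    Moreover, the efficiency of development and research depends heavily on the performance of the simulator: the sooner simulation results are available, the more efficient development becomes. A high-performance RISC-V simulator is therefore essential. This thesis studies how to improve the performance of RISC-V simulators while taking system resource management into account. We introduce an efficient RISC-V simulator and further improve its simulation performance with tiered just-in-time (JIT) compilation. The results show that our simulator not only outperforms other open-source RISC-V simulator projects in simulation speed but also consumes fewer system resources during simulation.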

    Over the past decade, hardware design and research have increasingly focused on the RISC-V instruction set, and the development of RISC-V simulators has grown accordingly. Different simulators offer different functionality. Our research focuses on instruction set simulators (ISSs), which are used primarily for architecture evaluation and verification during the early stages of hardware design. An ISS provides a virtual environment in which to execute and test software, enabling developers to analyze and debug programs without the need for physical hardware.
    The performance of the simulator is critical, because the efficiency of development and research depends heavily on how quickly the simulator produces results. A high-performance RISC-V simulator is therefore essential. This thesis focuses on improving the performance of RISC-V simulators through dynamic binary translation while managing system resources efficiently. We introduce a high-performance RISC-V simulator that uses tiered just-in-time (JIT) compilation and outperforms other open-source implementations in simulation speed while consuming fewer system resources.

    Table of Contents
    Abstract (Chinese)
    Abstract
    Table of Contents
    List of Tables
    List of Figures
    Chapter 1. Introduction
      1.1. Contributions
      1.2. Thesis Organization
    Chapter 2. Background
      2.1. RISC-V Instruction Set Architecture (ISA) Simulation
        2.1.1. Spike
        2.1.2. Static Binary Translation (SBT)
        2.1.3. Dynamic Binary Translation (DBT)
        2.1.4. QEMU
        2.1.5. rvdbt
      2.2. Compiler Optimizations
        2.2.1. Basic Block
        2.2.2. Static Single Assignment (SSA)
        2.2.3. Constant Optimization
        2.2.4. Common Subexpression Elimination (CSE)
        2.2.5. Register Allocation
      2.3. Multi-tier JIT Compilers
    Chapter 3. Design
      3.1. Interpreter Design
        3.1.1. Linear Intermediate Representation (IR)
        3.1.2. Block Chaining
      3.2. Runtime Profiler Design
      3.3. Tier-1 JIT Compiler Design
      3.4. Tier-2 JIT Compiler Design
    Chapter 4. Implementation
      4.1. Interpreter
        4.1.1. Lower Instruction Dispatch Overhead
        4.1.2. Block Chaining
        4.1.3. Macro-Operation Fusion (MOP Fusion)
        4.1.4. Constant Propagation and Constant Folding
        4.1.5. C Routine Substitution
        4.1.6. Facilitate CPU Register
        4.1.7. System Call and WebAssembly-based Translation
        4.1.8. Basic Block Cache
      4.2. Runtime Profiler
        4.2.1. Threshold Determination
        4.2.2. Loop Detection
      4.3. Tier-1 JIT Compiler
        4.3.1. Domain-Specific Language (DSL) for Unified Code Generation
        4.3.2. Linear-Scan Register Allocation
        4.3.3. Indirect Jump Improving
        4.3.4. Floating Point Instruction Simulation
      4.4. Tier-2 JIT Compiler
        4.4.1. Background Compilation
        4.4.2. Terminate Tier-1 Execution Upon Completion of Tier-2 Compilation
    Chapter 5. Performance Evaluation and Discussion
      5.1. Experimental Setup
      5.2. Interpreter-only Performance
      5.3. Tiered JIT Compilation
        5.3.1. LLVM Optimization Level Determination
        5.3.2. Background Compilation
        5.3.3. Threshold Determination
      5.4. Resource Usage
      5.5. Discussion
        5.5.1. T1C
        5.5.2. T2C
        5.5.3. Indirect Jump
        5.5.4. Arm64 Architecture
      5.6. Limitations
        5.6.1. Control and Status Register (CSR) Instruction
        5.6.2. T1C Lacks Runtime Profiling
        5.6.3. Terminate Tier-1 Execution Upon Completion of Tier-2 Compilation
    Chapter 6. Conclusion
    References

    Availability: on campus, open access immediately; off campus, open access immediately