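Graduate student: Chen, Kuan-Yu (陳冠宇)
Thesis title: A Study on Improving the Coverage and Accuracy of Debugging Information (改良除錯資訊涵蓋範圍與精準度之研究)
Advisors: Chen, Jing (陳敬); Tai, Shen-Chuan (戴顯權)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Electrical Engineering
Year of publication: 2024
Academic year of graduation: 112 (ROC calendar)
Language: English
Number of pages: 113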
Keywords (Chinese): 源碼層級除錯, 編譯器, 程式分析
Keywords (English): Source-level debugging, Compiler, Program analysis
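  • Source-level debugging relies on debug information generated by the compiler to establish the correspondence between an executable and its source code. However, the debug information produced by current mainstream compilers is incomplete and may lose useful information. To address this problem, this work develops Saucer, a framework that extends existing debug information. Its main goal is to remedy the shortcomings of current debug information handling so that software developers can more easily understand and analyze the contents of compiler-generated object files.

    Saucer has four main components: (1) a patch for the compiler that preserves data useful for debugging that would otherwise be discarded during compilation; (2) an analyzer that processes and analyzes the data produced by the patch; (3) a web server that passes data and hosts the web page; and (4) an interactive user interface that presents the processed data in a form that is easy for the user to read. Saucer extends the one-to-one mapping established by existing debug information into a one-to-many form in order to improve the coverage and accuracy of debug information. Users can upload and compile C source code through the web interface and inspect the detailed correspondence between the machine instructions of the target program and the source code through Saucer's user interface. Tests on several example programs verify the effectiveness of Saucer.

    The main contributions of this work are: (1) an in-depth exploration of the possibility of adding more data useful for debugging and program analysis to existing debug information; and (2) a demonstration that more accurate debug information can be generated by extending the LLVM compiler infrastructure.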

    Source-level debugging uses the debug information generated by the compiler to create mappings from the object code to the source code. However, the debug information generated by current mainstream compilers is incomplete and may discard or hide useful information. To remedy this, this work proposes Saucer, a framework that extends a compiler so that it preserves additional information for debugging, in order to help analyze the relationship between compiler-produced instructions and source code.

    Saucer has four main components: (1) a set of compiler patches that modify the compiler to preserve more information for tracking transformations, (2) an analyzer that processes and analyzes the acquired information, (3) a web server that passes data between the various programs and serves the web endpoints, and (4) a user interface that presents the processed data on an interactive web page that the user can easily view.

    The main contributions of this work are: (1) a deeper look into the possibility of preserving comprehensive source code information to aid debugging and program analysis, and (2) a demonstration that, by extending the LLVM infrastructure, it is possible to generate more accurate mappings to source code information.
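
    The abstract describes extending the one-to-one mapping of conventional debug information into a one-to-many form. The following is a minimal sketch of that idea only; it is not Saucer's actual data format. The names `SourceLoc`, `line_table`, and `extend_mapping`, and the sample addresses, are hypothetical and chosen for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceLoc:
    """A source position (hypothetical type, for illustration only)."""
    line: int
    column: int


# Hypothetical baseline: each instruction address maps to exactly one
# source location, as in a conventional line table.
line_table = {
    0x1000: SourceLoc(3, 5),
    0x1004: SourceLoc(4, 9),
}


def extend_mapping(line_table, extra_origins):
    """Turn a one-to-one table into a one-to-many mapping.

    extra_origins is an iterable of (address, SourceLoc) pairs, standing in
    for additional origins that a compiler could record when a
    transformation merges or replaces instructions (an assumption here).
    """
    mapping = defaultdict(list)
    for addr, loc in line_table.items():
        mapping[addr].append(loc)
    for addr, loc in extra_origins:
        if loc not in mapping[addr]:  # avoid duplicate entries
            mapping[addr].append(loc)
    return dict(mapping)


# Example: the instruction at 0x1004 also originates from line 3, and a
# new instruction at 0x1008 has no baseline entry at all.
extended = extend_mapping(
    line_table,
    [(0x1004, SourceLoc(3, 5)), (0x1008, SourceLoc(6, 1))],
)
```

    In this sketch a lookup returns every recorded origin of an instruction rather than a single line, which is the sense in which a one-to-many mapping can widen coverage.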

    Chinese Abstract (中文摘要)
    Abstract
    Acknowledgements
    Contents
    List of Tables
    List of Figures
    List of Code Listings
    1 Introduction
      1.1 Background
      1.2 Motivation
      1.3 Problem definition and goals
      1.4 Methodology
      1.5 Thesis Organization
    2 Related Works
      2.1 Debugging
        2.1.1 Debugging and Optimization
        2.1.2 DWARF
        2.1.3 Debug information validation
      2.2 LLVM
        2.2.1 Overview
        2.2.2 Intermediate Representation
        2.2.3 Passes
        2.2.4 Updating debug location
      2.3 Formal methods
        2.3.1 Program equivalence checking
        2.3.2 CompCert
        2.3.3 Alive2
      2.4 Profilers
        2.4.1 perf
        2.4.2 VTune
      2.5 Compiler Explorer
    3 Design
      3.1 Background
        3.1.1 The baseline
        3.1.2 Generalizing the mapping
        3.1.3 Creating mappings
        3.1.4 Transformation decomposition
        3.1.5 Matching Instructions
        3.1.6 Tracking instruction dependencies
        3.1.7 Backend tracing
      3.2 System Architecture
      3.3 Patch
        3.3.1 Data format
        3.3.2 Record types
        3.3.3 The Support Library
        3.3.4 Referencing Instructions in Logs
      3.4 Analyzer
      3.5 Server
      3.6 User Interface
    4 Implementation
      4.1 Setup
      4.2 The patch
        4.2.1 Structure of the patch
        4.2.2 Common functions
        4.2.3 Tracking addresses of instructions
        4.2.4 Tracking instruction replacements
        4.2.5 Actions before and after passes
        4.2.6 Actions inside passes
        4.2.7 Backend Tracing
        4.2.8 Frontend Tracing
        4.2.9 Pass modifications
        4.2.10 Distribution
      4.3 Analyzer
        4.3.1 Overview
        4.3.2 Data format
        4.3.3 Module preprocessing
        4.3.4 Aligning two modules
        4.3.5 Combining alignment results
        4.3.6 Dependency generation
      4.4 Server
        4.4.1 Overview
        4.4.2 Log processing
        4.4.3 Data passing and preparing
      4.5 User interfaces
        4.5.1 Overview
        4.5.2 Handling user program input
        4.5.3 Handling user queries
      4.6 Building and running
        4.6.1 Patch
        4.6.2 Analyzer
        4.6.3 Server
        4.6.4 User interfaces
    5 Evaluation
      5.1 Overview
      5.2 Functionality Tests
        5.2.1 Example 1
        5.2.2 Example 2
        5.2.3 Example 3
      5.3 Validity of Intermediate results
      5.4 Target Independence
      5.5 Latency
    6 Conclusions
      6.1 Conclusions
      6.2 Future Work
    References

    Full-text availability: on campus, open immediately; off campus, open immediately