| 研究生: |
蔡岳霖 Tsai, Yueh-Lin |
|---|---|
| 論文名稱: |
Dylinx: 同步鎖的效能分析與最佳化工具 Dylinx: A lock profiling and optimization tool |
| 指導教授: |
張天豪
Chang, Tien-Hao |
| 學位類別: |
碩士 Master |
| 系所名稱: |
電機資訊學院 - 電機工程學系 Department of Electrical Engineering |
| 論文出版年: | 2021 |
| 畢業學年度: | 109 |
| 語文別: | 英文 |
| 論文頁數: | 47 |
| 中文關鍵詞: | 互斥鎖 、LLVM-Xray 、異質同步 、排隊理論 |
| 外文關鍵詞: | mutual exclusive lock, mutex, queueing theory, heterogeneous synchronization, LLVM-Xray |
| 相關次數: | 點閱:56 下載:3 |
| 分享至: |
| 查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報 |
多執行緒是現代應用程式加速的主要手段,多執行緒程式須仰賴同步工具來確保運算結果的正確,其中,互斥鎖是最常見的同步工具。然而,互斥鎖有許多實作方式,以往的研究表明互斥鎖會因軟硬架構造成不同的效能損失,並沒有一個最佳的實作方式。因此,目前只能透過實測來找出在特定軟硬體架構下最佳的互斥鎖實作。為了加速搜尋最佳實作的過程,Hugo和Renaud在2016年提出了LiTL這個自動化工具,它可以在執行時期改變互斥鎖的實作方式,讓程式開發者能在不修改原始碼的情況下,快速測試不同互斥鎖實作對程式整體效能損失。LiTL的原理是透過動態連結器的插入機制(linker interposition),將函式標誌(symbol)重新導向至另一個函式實體,這意味著若程式碼中含有多個鎖時,因為共用同樣的函式標誌,所有的鎖都會共享同樣的實作機制。
考量到有可能每個互斥鎖有相對應的最佳實作,本研究提出了一個新的工具(Dylinx),它為每道鎖在編譯的前處理階段分配專屬的實作方式,相較於LiTL只能提供同質鎖的抽換,Dylinx的異質鎖配置大幅擴展了搜尋空間,而且比起執行時期的重導向,也顯著降低對整體效能的影響。實驗結果表明比起同質鎖,採用異質鎖的多執行緒程式具有較高的效能,在執行緒數量增加時,異質鎖帶來的效能增長變化(scalability)也比較大。此外,Dylinx採用了強大的分析工具LLVM-Xray,它能觀測多個鎖之間的交互作用,讓使用者可以在程式執行時期進行動態分析。隨著程式碼中鎖的數量增加,搜尋空間也會呈現指數成長,對於過大的搜尋空間,本研究也提出了一套分段搜尋策略,實驗結果顯示該搜尋策略大幅降低窮舉所造成時間過長的問題。
Multithreading is a major approach enhancing application performance nowadays. To ensure the correct output, multithreading applications heavily rely on synchronization mechanisms and mutexes are the most common synchronization tool. However, there are various mutex implementations. Previous studies reveal that the performance loss of distinct mutex implementations depends on the software and hardware architecture. Currently, the only way to identify the best mutex implementation is via pragmatic measurement. To accelerate the process of identifying best implementation, Hugo and Renaud proposed an automatic tool named LiTL in 2016. LiTL is capable to replace mutex implementations during execution time without modifying source code and measure performance difference. LiTL leverages linker interposition to redirect the interface symbols to another function instance. However, LiTL provides only “homogeneous mutex replacement”. Namely, when there are multiple mutexes in an application, LiTL would replace all mutexes with a common mutex implementation.
Considering each mutex may have their corresponding optimal implementation, this study introduces a new tool called Dylinx. Dylinx replaces mutex implementations during the preprocessing stage of compilation. Compared to homogeneous replacement offered by LiTL, Dylinx’s heterogeneous mutex replacement enlarges the search space of lock arrangement. In contrast to linker interposition, replacement during compilation also improve the runtime performance. Experiments conducted in this study show that heterogeneous lock arrangement improves both performance and scalability. In addition, Dylinx enables runtime analysis of lock interaction by integrating a powerful profiler, LLVM-Xray. As the number of the mutexes increases, the search space of heterogeneous lock arrangement grows exponentially. To solve this problem, this study also proposes a segment search strategy to deal with the oversized search space.
1. Lomet DB: Process structuring, synchronization, and recovery using atomic actions. ACM SIGSOFT Software Engineering Notes 1977, 2(2):128-137.
2. Gupta A, Tucker A, Urushibara S: The impact of operating system scheduling policies and synchronization methods of performance of parallel applications. In: Proceedings of the 1991 ACM SIGMETRICS conference on Measurement and modeling of computer systems: 1991. 120-132.
3. Guiroux H, Lachaize R, Quéma V: Multicore Locks: The Case Is Not Closed Yet. In: USENIX Annual Technical Conference: 2016. 649-662.
4. Atlidakis V, Andrus J, Geambasu R, Mitropoulos D, Nieh J: POSIX abstractions in modern operating systems: The old, the new, and the missing. In: Proceedings of the Eleventh European Conference on Computer Systems: 2016. 1-17.
5. Eyerman S, Eeckhout L: Modeling critical sections in Amdahl's law and its implications for multicore design. In: Proceedings of the 37th annual international symposium on Computer architecture: 2010. 362-370.
6. Pan X, Lindén J, Jonsson B: Predicting the Cost of Lock Contention in Parallel Applications on Multicores using Analytic Modeling. IEEE Concurrency 2012.
7. Adan I, van der Wal J: Mean values techniques. In: Queueing Networks. Springer; 2011: 561-586.
8. Lattner C: LLVM and Clang: Next generation compiler technology. In: The BSD conference: 2008.
9. The C Programming Language [https://www.tiobe.com/tiobe-index/c/]
10. Plum T: C11: The New C Standard. In.; 2013.
11. Berris DM, Veitch A, Heintze N, Anderson E, Wang N: XRay: A function call tracing system. 2016.
12. Mellor-Crummey JM, Scott ML: Algorithms for scalable synchronization on shared-memory multiprocessors. ACM Transactions on Computer Systems (TOCS) 1991, 9(1):21-65.
13. What is PTHREAD_MUTEX_ADAPTIVE_NP? [https://stackoverflow.com/a/25168942]
14. Andrews GR: Foundations of multithreaded, parallel, and distributed programming, vol. 11: Addison-Wesley Reading; 2000.
15. Van Laarhoven PJ, Aarts EH: Simulated annealing. In: Simulated annealing: Theory and applications. Springer; 1987: 7-15.
16. Dice D, Marathe VJ, Shavit N: Lock cohorting: a general technique for designing NUMA locks. ACM SIGPLAN Notices 2012, 47(8):247-256.
17. Michael MM, Scott ML: Simple, fast, and practical non-blocking and blocking concurrent queue algorithms. In: Proceedings of the fifteenth annual ACM symposium on Principles of distributed computing: 1996. 267-275.
18. Cao W, Sahin S, Liu L, Bao X: Evaluation and analysis of in-memory key-value systems. In: 2016 IEEE International Congress on Big Data (BigData Congress): 2016. IEEE: 26-33.
19. Libmemcached [https://libmemcached.org/libMemcached.html]
20. Gramoli V: More than you ever wanted to know about synchronization: Synchrobench, measuring the impact of the synchronization on concurrent algorithms. In: Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming: 2015. 1-10.
21. Herlihy M, Shavit N, Luchangco V, Spear M: The art of multiprocessor programming: Newnes; 2020.
22. Heller S, Herlihy M, Luchangco V, Moir M, Scherer WN, Shavit N: A lazy concurrent list-based set algorithm. In: International Conference On Principles Of Distributed Systems: 2005. Springer: 3-16.
23. Bienia C, Kumar S, Singh JP, Li K: The PARSEC benchmark suite: Characterization and architectural implications. In: Proceedings of the 17th international conference on Parallel architectures and compilation techniques: 2008. 72-81.
24. Sakalis C, Leonardsson C, Kaxiras S, Ros A: Splash-3: A properly synchronized benchmark suite for contemporary research. In: 2016 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS): 2016. IEEE: 101-111.
25. Virius M: THE NEW C++ STANDARD AND MULTITHREADING.
26. Ries DR, Stonebraker MR: Locking granularity revisited. ACM Transactions on Database Systems (TODS) 1979, 4(2):210-227.