簡易檢索 / 詳目顯示

研究生: 邱永昌
Chiu, Yung-Chang
論文名稱: 虛擬共享式記憶體多處理機系統除錯機制之研究
Study of Debugging Mechanisms for Virtual Shared Memory Multiprocessor Systems
指導教授: 謝錫堃
Shieh, Ce-Kuen
共同指導教授: 黃祖基
Huang, Tzu-Chi
學位類別: 博士
Doctor
系所名稱: 電機資訊學院 - 電機工程學系
Department of Electrical Engineering
論文出版年: 2011
畢業學年度: 99
語文別: 英文
論文頁數: 97
中文關鍵詞: 資料競爭確定性重播
外文關鍵詞: data race, deterministic replay, distributed shared memory, debugger, abnormal behavior
相關次數: 點閱:132下載:1
分享至:
查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報
  • 虛擬共享式記憶體多處理機(VSMM)系統提供程式設計者在分散式機器上一個共享記憶體的概念,然而由於多個執行緒或處理程序平行執行的特性,VSMM程式也許會遭受到資料競爭(data race)錯誤的干擾,目前程式設計者不能滿足於傳統的除錯工具透過檢查每個執行緒的及時資訊來找出在他們的VSMM程式中異常執行緒所產生的錯誤,特別是當他們的VSMM程式擁有大量的執行緒時。在本論文中提出了一個除錯的機制包含了資料競爭避免與重播機制(DRARS)與根據症狀分類定位異常執行緒(ATLAS)來使程式設計者更容易除錯他們的VSMM程式。DRARS能夠避免VSMM程式在執行時發生某些資料競爭(data racing)的錯誤,對於那些無法避免的資料競爭的錯誤,DRARS透過執行確定性重播(deterministic replay)的機制來解決,也就是忠實的重現VSMM程式執行時的行為。
    ATLAS根據症狀分類可以找出異常的執行緒,其症狀包含了函數執行的順序、變數存取的順序、變數變化量與執行緒的執行時間,ATLAS能夠幫助程式設計者在他們錯誤的VSMM程式中精確地找出異常執行緒的行為,且在除錯應用程式時有許多優點如高移植性、低額外的成本與高可行性。本論文不只介紹DRARS與ATLAS的機制,且提供在Linux上它們的額外的效能成本與其測量它們的效能相關的數據。

    Virtual shared memory multiprocessor (VSMM) systems provide programmers with an abstraction of shared memory on distributed machines. However, VSMM programs may suffer from data race bugs due to the concurrent execution of multiple threads or processes. Currently, programmers cannot be satisfied with traditional debugging tools to locate abnormal threads committing bugs in their VSMM programs by checking run-time information of each thread, especially when their VSMM programs have massive threads. A debugging mechanism including Data Race Avoidance and Replay Scheme (DRARS) as well as Abnormal Threads Locator based on Assorted Symptoms (ATLAS) is proposed in this dissertation to facilitate programmers to debug their VSMM programs. DRARS can prevent VSMM programs from occurring certain data racing bugs at runtime. For some data racing bugs that cannot be prevented, DRARS provides programmers with a deterministic replay mechanism that can faithfully reproduce run-time behaviors of VSMM programs. ATLAS can capture abnormal threads according to assorted symptoms embodied in function execution order, variable access order, variance in variable, and thread execution time. ATLAS can help programmers to precisely locate the abnormal thread behaviors in their buggy VSMM programs. ATLAS has several benefits, e.g. high portability, low overhead, and amazing practicability in debugging popular applications. This dissertation not only introduces DRARS and ATLAS, but also shows their overheads and evaluates their performances on Linux.

    摘要 I ABSTRACT III 誌謝 V Contents VI Figures VIII Tables IX CHAPTER 1 Introduction 1 1.1 Motivation 1 1.2 Contributions 5 1.3 Organization of the Dissertation 7 CHAPTER 2 Related Works 8 2.1 Data Racing Detection 8 2.2 Deterministic Replay 10 2.3 Visualizing and Handling Enormous Information 11 2.4 Anomaly Detection System 12 2.4.1 Dickinson et al 12 2.4.2 Yuan et al 13 2.4.3 Mirgorodskiy et al 13 2.4.4 DMTracker 14 2.4.5 Lan et al 15 CHAPTER 3 Background 17 3.1 Eager Release Consistency Protocol 19 3.2 Data Race in Eager Release Consistency Protocol 21 CHAPTER 4 Design 23 4.1 Data Race Avoidance and Replay Scheme (DRARS) 23 4.1.1 DRARS Overview 23 4.1.2 Components of Data Race Avoidance and Replay Scheme 26 4.2 Abnormal Threads Locator based on Assorted Symptoms (ATLAS) 31 4.2.1 ATLAS Overview 31 4.2.2 Assorted Symptoms 33 4.2.3 ATLAS Components 37 CHAPTER 5 Implementation 41 5.1 DRARS Implementation 41 5.2 ATLAS Implementation 46 CHAPTER 6 Performance 50 6.1. DRARS Performance 50 6.1.1. DRARS Overhead Breakdown 50 6.1.2. Solving Page Replica Internal Access Conflict 54 6.1.3. Solving Diff Update Internal Access Conflict 56 6.1.4. Solving External Access Conflict 58 6.1.5. Solving a Mix of Access Conflicts 59 6.1.6. Performance Impact 63 6.2. ATLAS Performance 66 6.2.1. ATLAS Overhead Breakdown 66 6.2.2. Demonstration of Locating Bug with ATLAS 69 6.2.3. Detecting Abnormality in Popular Applications with ATLAS 78 6.2.4. Performance Impact 88 CHAPTER 7 Conclusions and Future work 90 Bibliography 93

    [1] R. Stallman, R. Pesch, S. Shebs, Debugging with GDB, Free Software Foundation, Boston, 1993.
    [2] S. Savage, M. Burrows, G. Nelson, P. Sobalvarro, T. Anderson, Eraser: A Dynamic Data Race Detector for Multithreaded Programs, ACM Trans. Comput. Syst., 15 (1997) 391-411.
    [3] D. Engler, K. Ashcraft, RacerX: Effective, Static Detection of Race Conditions and Deadlocks, in: Proceedings of the 19th ACM Symposium on Operating Systems Principles, ACM, Bolton Landing, NY, USA, 2003, pp. 237-252.
    [4] T.A. Henzinger, R. Jhala, R. Majumdar, Race Checking by Context Inference, in: Proceedings of the ACM SIGPLAN 2004 Conference on Programming Language Design and Implementation, ACM, Washington DC, USA, 2004, pp. 1-13.
    [5] N. Sterling, WARLOCK-A Static Data Race Analysis Tool, in: Proceedings of USENIX Winter Technical Conference, Usenix Association, San Diego, California, USA, 1993, pp. 97.
    [6] S.V. Adve, M.D. Hill, B.P. Miller, R.H.B. Netzer, Detecting Data Races on Weak Memory Systems, in: Proceedings of the 18th Annual International Symposium on Computer Architecture, ACM, Toronto, Ontario, Canada, 1991, pp. 234-243.
    [7] J.-D. Choi, B.P. Miller, R.H.B. Netzer, Techniques for Debugging Parallel Programs with Flowback Analysis, ACM Trans. Program. Lang. Syst., 13 (1991) 491-530.
    [8] M. Christiaens, K.D. Bosschere, TRaDe, A Topological Approach to On-the-fly Race Detection in Java Programs, in: Proceedings of the 2001 Symposium on Java Virtual Machine Research and Technology Symposium, USENIX Association, Monterey, California, 2001, pp. 15-15.
    [9] A. Dinning, E. Schonberg, An Empirical Comparison of Monitoring Algorithms for Access Anomaly Detection, in: Proceedings of the Second ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, ACM, Seattle, Washington, United States, 1990, pp. 1-10.
    [10] J. Mellor-Crummey, On-the-fly Detection of Data Races for Programs with Nested Fork-Join Parallelism, in: Proceedings of the 1991 ACM/IEEE Conference on Supercomputing, ACM, Albuquerque, New Mexico, United States, 1991, pp. 24-33.
    [11] M. Ronsse, K.D. Bosschere, RecPlay: A Fully Integrated Practical Record/Replay System, ACM Trans. Comput. Syst., 17 (1999) 133-152.
    [12] E. Schonberg, On-the-fly Detection of Access Anomalies, SIGPLAN Notices, 39 (2004) 313-327.
    [13] R. Agarwal, A. Sasturkar, L. Wang, S.D. Stoller, Optimized Run-Time Race Detection and Atomicity Checking Using Partial Discovered Types, in: Proceedings of the 20th IEEE/ACM International Conference on Automated Software Engineering, ACM, Long Beach, CA, USA, 2005, pp. 233-242.
    [14] H. Nishiyama, Detecting Data Races Using Dynamic Escape Analysis Based on Read Barrier, in: Proceedings of the 3rd Conference on Virtual Machine Research And Technology Symposium, USENIX Association, San Jose, California, 2004, pp. 10-10.
    [15] C.v. Praun, T.R. Gross, Object Race Detection, in: Proceedings of the 16th ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications, ACM, Tampa Bay, FL, USA, 2001, pp. 70-82.
    [16] C.v. Praun, T.R. Gross, Static Conflict Analysis for Multi-Threaded Object-Oriented Programs, in: Proceedings of the ACM SIGPLAN 2003 Conference on Programming Language Design and Implementation, ACM, San Diego, California, USA, 2003, pp. 115-128.
    [17] J.-D. Choi, K. Lee, A. Loginov, R. O'Callahan, V. Sarkar, M. Sridharan, Efficient and Precise Datarace Detection for Multithreaded Object-Oriented Programs, in: Proceedings of the ACM SIGPLAN 2002 Conference on Programming Language Design and Implementation, ACM, Berlin, Germany, 2002, pp. 258-269.
    [18] S. Narayanasamy, G. Pokam, B. Calder, BugNet: Continuously Recording Program Execution for Deterministic Replay Debugging, in: Proceedings of the 32nd Annual International Symposium on Computer Architecture, IEEE Computer Society, Madison, Wisconsin, USA, 2005, pp. 284-295.
    [19] F. Cornelis, M. Ronsse, K. Bosschere, Tornado: A Novel Input Replay Tool, in: Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications, CSREA Press, Las Vegas, Nevada, USA, 2003, pp. 1598-1604.
    [20] D.J. Sorin, M.M.K. Martin, M.D. Hill, D.A. Wood, SafetyNet: Improving the Availability of Shared Memory Multiprocessors with Global Checkpoint/Recovery, in: Proceedings of the 29th Annual International Symposium on Computer Architecture, IEEE Computer Society, Anchorage, Alaska, 2002, pp. 123-134.
    [21] T.J. LeBlanc, J.M. Mellor-Crummey, Debugging Parallel Programs with Instant Replay, IEEE Trans. Comput., 36 (1987) 471-482.
    [22] J.-D. Choi, H. Srinivasan, Deterministic replay of Java multithreaded applications, in: Proceedings of the SIGMETRICS symposium on Parallel and distributed tools, ACM, Welches, Oregon, United States, 1998, pp. 48-59.
    [23] M. Ronsse, W. Zwaenepoel, Execution Replay for TreadMarks, in: Proceedings of the Euromicro Workshop on Parallel and Distributed Processing, IEEE Computer Society, London, UK, 1997, pp. 343.
    [24] D. Arnold, D. Ahn, B. de Supinski, G. Lee, B. Miller, M. Schulz, Stack trace analysis for large scale debugging, in: the 21st International Parallel and Distributed Processing Symposium (IPDPS'07), Long Beach, CA, 2007, pp. 1-10.
    [25] C. Chen, The parallel debugging architecture in the intel debugger, Parallel Computing Technologies, (2003) 444-451.
    [26] M. Geimer, F. Wolf, B. Wylie, B. Mohr, A scalable tool architecture for diagnosing wait states in massively parallel applications, Parallel Computing, 35 (2009) 375-388.
    [27] J. Labarta, J. Gimenez, E. Martinez, P. Gonzalez, H. Servat, G. Llort, X. Aguilar, Scalability of visualization and tracing tools, in: Proceedings Parallel Computing, 2005, Malaga, Spain, pp. 869-876.
    [28] M. Noeth, P. Ratn, F. Mueller, M. Schulz, B. de Supinski, ScalaTrace: Scalable compression and replay of communication traces for high-performance computing, Journal of Parallel and Distributed Computing, 69 (2009) 696-710.
    [29] P. Roth, B. Miller, On-line automated performance diagnosis on thousands of processes, in: Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming, ACM, 2006, pp. 80.
    [30] A.V. Mirgorodskiy, N. Maruyama, B.P. Miller, Problem diagnosis in large-scale computing environments, in: Proceedings of the 2006 ACM/IEEE conference on Supercomputing, ACM, Tampa, Florida, 2006, pp. 88.
    [31] P. Barham, A. Donnelly, R. Isaacs, R. Mortier, Using Magpie for request extraction and workload modelling, in: Proceeding OSDI'04 Proceedings of the 6th conference on Symposium on Opearting Systems Design & Implementation, USENIX Association, San Francisco, CA, 2004, pp. 18-18.
    [32] M. Chen, E. Kiciman, E. Fratkin, A. Fox, E. Brewer, Pinpoint: Problem determination in large, dynamic internet services, in: Proceedings of Dependable Systems and Networks (DSN 2002), IEEE Computer Society, 2002, pp. 595-604.
    [33] W. Dickinson, D. Leon, A. Podgurski, Finding failures by cluster analysis of execution profiles, in: Proceedings of the 23rd International Conference on Software Engineering, IEEE Computer Society, Toronto, Ontario, Canada, 2001, pp. 339-348.
    [34] Q. Gao, F. Qin, D.K. Panda, Dmtracker: finding bugs in large-scale parallel programs by detecting anomaly in data movements, in: Proceedings of the 2007 ACM/IEEE conference on Supercomputing, Reno, Nevada, 2007, pp. 1-12.
    [35] Z. Lan, Z. Zheng, Y. Li, Toward automated anomaly identification in large-scale systems, IEEE Transactions on Parallel and Distributed Systems, (2010) 174-187.
    [36] C. Yuan, N. Lao, J. Wen, J. Li, Z. Zhang, Y. Wang, W. Ma, Automated known problem diagnosis with event traces, ACM SIGOPS Operating Systems Review, 40 (2006) 375-388.
    [37] A. Nataraj, A.D. Malony, A. Morris, D.C. Arnold, B.P. Miller, A framework for scalable, parallel performance monitoring, Concurr. Comput. : Pract. Exper., 22 (2010) 720-735.
    [38] R. Buyya, High Performance Cluster Computing: Architectures and Systems, Prentice Hall PTR, 1999.
    [39] K. Li, P. Hudak, Memory Coherence in Shared Virtual Memory Systems, ACM Trans. Comput. Syst., 7 (1989) 321-359.
    [40] J.-B. Chang, C.-K. Shieh, T.-Y. Liang, A Transparent Distributed Shared Memory for Clustered Symmetric Multiprocessors, J. Supercomput., 37 (2006) 145-160.
    [41] A. Dash, B. Demsky, Software Transactional Distributed Shared Memory, in: Proceedings of the 14th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, ACM, Raleigh, NC, USA, 2009, pp. 297-298.
    [42] A. Yahya, R. Bader, Distributed Shared Memory Consistency Object-based Model, Journal of Computer Science, 3 (2007) 57-61.
    [43] W. Gropp, E. Lusk, N. Doss, A. Skjellum, A High-Performance, Portable Implementation of the MPI Message Passing Interface Standard, Parallel Comput., 22 (1996) 789-828.
    [44] B.D. Martino, D. Kranzlmuller, J. Dongarra, Editorial: Special Section: Grid Computing and the Message Passing Interface, Future Gener. Comput. Syst., 24 (2008) 119-120.
    [45] C. Amza, A.L. Cox, S. Dwarkadas, P. Keleher, H. Lu, R. Rajamony, W. Yu, W. Zwaenepoel, TreadMarks: Shared Memory Computing on Networks of Workstations, IEEE Computer, 29 (1996) 18-28.
    [46] J.B. Carter, J.K. Bennett, W. Zwaenepoel, Techniques for Reducing Consistency-related Communication in Distributed Shared-memory Systems, ACM Trans. Comput. Syst., 13 (1995) 205-243.
    [47] D. Galli, Distributed Operating Systems, Prentice Hall PTR, Upper Saddle River, NJ, USA, 1999.
    [48] F. Mueller, Distributed Shared-Memory Threads: DSM-Threads, in: Workshop on Run-Time Systems for Parallel Programming Citeseer, Switzerland, 1997, pp. 31-40.
    [49] I. Foster, C. Kesselman, The Grid: Blueprint for a New Computing Infrastructure, Morgan Kaufmann, San Francisco, USA, 2004.
    [50] J. Yu, R. Buyya, A Taxonomy of Workflow Management Systems for Grid Computing, Journal of Grid Computing, 3 (2005) 171-200.
    [51] A. Weiss, Computing in the Clouds, netWorker, 11 (2007) 16-25.
    [52] R. Buyya, C. Yeo, S. Venugopal, J. Broberg, I. Brandic, Cloud Computing and Emerging IT Platforms: Vision, Hype, and Reality for Delivering Computing as the 5th Utility, Future Generation Computer Systems, 25 (2009) 599-616.
    [53] C. Amza, A.L. Cox, W. Zwaenepoel, S. Dwarkadas, Software DSM Protocols that Adapt between Single Writer and Multiple Writer, in: Proceedings of the 3rd IEEE Symposium on High-Performance Computer Architecture, IEEE Computer Society, San Antonio, Texas, USA, 1997, pp. 261.
    [54] J.B. Carter, J.K. Bennett, W. Zwaenepoel, Implementation and performance of Munin, in: Proceedings of the 13th ACM Symposium on Operating Systems Principles, ACM, Pacific Grove, California, United States, 1991, pp. 152-164.
    [55] L. Silva, J. Silva, S. Chapple, Implementation and Performance of DSMPI, Scientific Programming, 6 (1997) 201-214.
    [56] H. Harada, Y. Ishikawa, A. Hori, H. Tezuka, S. Sumimoto, T. Takahashi, Dynamic Home Node Reallocation on Software Distributed Shared Memory System, in: Proceedings of IEEE 4th HPC ASIA 2000, Beijing, China, 2000, pp. 158-163.
    [57] B. Wilkinson, M. Allen, Parallel Programming: Techniques and Applications Using Networked Workstations and Parallel Computers, Prentice Hall, Upper Saddle River, NJ, USA, 1998.
    [58] S. Woo, M. Ohara, E. Torrie, J. Singh, A. Gupta, The SPLASH-2 Programs: Characterization and Methodological Considerations, in: Proceedings of the 22nd Annual International Symposium on Computer Architecture, ACM, S. Margherita Ligure, Italy, 1995, pp. 24-36.
    [59] D.H. Bailey, FFTs in External or Hierarchical Memory, J. Supercomput., 4 (1990) 23-35.
    [60] H.-O. Peitgen, H. Jurgens, D. Saupe, Chaos and Fractals, SpringerVerlag, New York, 2004.

    下載圖示 校內:2013-07-14公開
    校外:2013-07-14公開
    QR CODE