
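Graduate student: 蔣致祥
Chiang, Chih-Hsiang
Thesis title: 一個可呈現全域狀態變遷的平行程式除錯機制
A Parallel Debugging Mechanism for Exhibiting Global State Transition
Advisor: 謝錫堃
Shieh, Ce-Kuen
Degree: Master
Department: College of Electrical Engineering and Computer Science - Institute of Computer & Communication Engineering
Year of publication: 2009
Academic year of graduation: 97 (ROC calendar, 2008-2009)
Language: English
Number of pages: 37
Chinese keywords: 除錯 (debugging), 平行程式 (parallel program), 重播 (replay), 變遷 (transition), 全域狀態 (global state)
English keywords: debug, parallel program, global state, replay, transition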
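  • Owing to the recent development of multi-core CPUs and multi-CPU systems, parallel programs have become a popular trend. Parallel programs have three advantages. First, they can share resources among multiple processes. Second, running a parallel program on multiple processes/threads can improve its performance. Third, parallel programs can make use of existing large-scale computing systems such as Grid systems and Cloud computing systems. However, developing a parallel program that runs on multiple processes/threads raises many problems. There is no shared global clock, and processes/threads run at different speeds on different CPUs, so it is hard for developers to predict the execution order of the processes/threads. For these reasons, developers must debug a parallel program that has no fixed execution order across its processes/threads, so the result can differ from one run to the next. In this thesis, we provide a global state transition mechanism that makes parallel programs easier to debug. By presenting the order in which global states occur, it lets developers clearly understand the behavior of a parallel program and control it more easily.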

    Parallel application programming has become popular, especially with the emergence of multi-core CPUs and many-core GPUs in the computer market. It has three advantages. First, a parallel application can share resources among processes. Second, it can increase application performance through process/thread cooperation. Third, it can utilize existing large-scale computing systems distributed over the world, such as Grid systems and Cloud computing systems.

    However, parallel application programmers face many difficulties in programming multiple processes/threads that cooperate within an application. There is no global clock, and processes/threads run on CPUs of different speeds, so it is hard for developers to predict the execution order of the processes/threads. For these reasons, programmers have to debug a parallel application that produces nondeterministic execution results, without being able to obtain a total order among the processes/threads.
    In this thesis, we propose the Global State Transition (GST) mechanism to facilitate debugging parallel applications. With it, developers can clearly understand the behavior of a parallel application and easily control it by modifying the order in which global states appear.
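
The nondeterminism described in the abstract can be illustrated with a small sketch. It is not the thesis's GST implementation, whose details this record does not give. It assumes a toy model in which two threads each perform an unsynchronized read followed by a write on a shared counter `x`. The names `interleavings`, `run`, `t1`, and `t2` are hypothetical. The sketch enumerates every execution order that respects each thread's own program order and records the sequence of global states each order produces.

```python
from itertools import combinations

def interleavings(a, b):
    """Yield every merge of sequences a and b that keeps each one's internal order."""
    n, m = len(a), len(b)
    for positions in combinations(range(n + m), n):
        chosen = set(positions)          # slots of the merged schedule taken by a
        ia, ib = iter(a), iter(b)
        yield [next(ia) if i in chosen else next(ib) for i in range(n + m)]

def run(schedule):
    """Execute one schedule and return the list of global states it visits."""
    # Toy global state: the shared variable plus each thread's private copy.
    state = {"x": 0, "t1_local": None, "t2_local": None}
    trace = [dict(state)]
    for thread, op in schedule:
        local = f"{thread}_local"
        if op == "read":
            state[local] = state["x"]        # copy shared value into the thread
        else:                                # "write"
            state["x"] = state[local] + 1    # store the incremented private copy
        trace.append(dict(state))
    return trace

# Each thread does an unsynchronized increment: read x, then write x + 1.
t1 = [("t1", "read"), ("t1", "write")]
t2 = [("t2", "read"), ("t2", "write")]

schedules = list(interleavings(t1, t2))
final_values = sorted({run(s)[-1]["x"] for s in schedules})
print(len(schedules), "schedules; possible final x:", final_values)
# prints "6 schedules; possible final x: [1, 2]"
```

In this toy model the same program reaches different final states depending only on the order in which the global states appear, which is why being able to observe and fix that order helps when debugging.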

    Chapter 1: Introduction
    Chapter 2: Background
    Chapter 3: Related work
    Chapter 4: Global State Transition (GST)
        4.1 GST Overview
        4.2 GST Working Principle
        4.3 Components of Global State Transition
            4.3.1 GST Agent
            4.3.2 GST Manager
    Chapter 5: Implementation
        5.1 System architecture
        5.2 Implementation of Synchronization Point Monitor
        5.3 Implementation of GDB Relay
        5.4 Implementation of Synchronization Point Replayer
        5.5 Implementation of Transition Interval Locator
        5.6 Implementation of State Transition Tracer
    Chapter 6: Performance
        6.1 Overhead of inter and intra site
            6.1.1 Intra site
            6.1.2 Inter site
        6.2 Tracing lock primitive
        6.3 Screen shot
    Chapter 7: Conclusion
    References


    Full-text availability: on campus from 2010-07-29; off campus from 2010-07-29