| Author: | 張文賢 Chang, Wen-Hsien |
|---|---|
| Thesis title: | 多核心系統之動態資料管理器設計 Design of a Run-Time Dynamic Data Manager for Multi-Core Systems |
| Advisor: | 周哲民 Jou, Jer-Min |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Department of Electrical Engineering |
| Year of publication: | 2012 |
| Academic year of graduation: | 100 |
| Language: | Chinese |
| Pages: | 96 |
| Keywords: | Multi-Core System, Parallel Execution, Speculative Parallel Execution, Transactional Memory |
In multi-core systems, running threads in parallel raises a program's execution speed and degree of parallelism. Because parallel threads access and compute on shared data, they must be synchronized. Conventional parallel programs typically rely on mutex locks, which allow only the thread holding the lock to access and compute on the shared data. Because the system cannot resolve synchronization among parallel threads on its own, programmers must handle it themselves, which makes parallel programs more complex, synchronizes inefficiently, and makes it hard to increase parallelism.
This thesis builds on the Transactional Memory model of parallel execution and proposes a Run-Time Dynamic Data Manager (RDM) that handles thread synchronization at run time, thereby improving the execution speed of parallel programs and reducing their complexity. The RDM detects the conflicts that arise dynamically while transactions execute in parallel and resolves them through eager conflict detection and eager version management. The multi-level cache hierarchy of a multi-core system also raises a cache coherence problem, which the RDM handles together with the Data Access Controller in each core. In the experiments, the RDM synchronizes parallel threads effectively, and the transactional parallel versions of the selected benchmarks run on average about twice as fast as their sequential versions.
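To make the terms eager conflict detection and eager version management concrete, here is a minimal single-threaded sketch in Python. It is not the RDM design from the thesis, which is a run-time manager working with per-core controllers. The class `TinyTM`, the exception `Conflict`, and the rule that the requesting transaction aborts on a conflict are all assumptions made for illustration. Eager versioning here means that writes go straight into shared memory while the old values are kept in an undo log. Eager detection means that a conflict is checked at the moment of each access rather than at commit.

```python
class Conflict(Exception):
    """Raised when an access collides with another live transaction."""


class TinyTM:
    """Hypothetical toy model; transactions are interleaved by hand."""

    def __init__(self, memory):
        self.memory = dict(memory)  # shared data, updated in place
        self.readers = {}           # address -> ids of transactions that read it
        self.writer = {}            # address -> id of the transaction that wrote it
        self.undo = {}              # transaction id -> list of (address, old value)

    def begin(self, tid):
        self.undo[tid] = []

    def read(self, tid, addr):
        owner = self.writer.get(addr)
        if owner is not None and owner != tid:
            # Eager detection: the conflict is found at access time.
            self.abort(tid)
            raise Conflict(f"T{tid} read {addr!r} written by T{owner}")
        self.readers.setdefault(addr, set()).add(tid)
        return self.memory[addr]

    def write(self, tid, addr, value):
        owner = self.writer.get(addr)
        other_readers = self.readers.get(addr, set()) - {tid}
        if (owner is not None and owner != tid) or other_readers:
            self.abort(tid)
            raise Conflict(f"T{tid} write to {addr!r} conflicts")
        if owner != tid:
            # Eager versioning: log the old value once, then write in place.
            self.undo[tid].append((addr, self.memory[addr]))
            self.writer[addr] = tid
        self.memory[addr] = value

    def commit(self, tid):
        # New values are already in memory, so commit only releases ownership.
        self._release(tid)

    def abort(self, tid):
        # Roll back by replaying the undo log in reverse order.
        for addr, old in reversed(self.undo.get(tid, [])):
            self.memory[addr] = old
        self._release(tid)

    def _release(self, tid):
        self.undo.pop(tid, None)
        for addr in [a for a, w in self.writer.items() if w == tid]:
            del self.writer[addr]
        for tids in self.readers.values():
            tids.discard(tid)


tm = TinyTM({"x": 1, "y": 2})
tm.begin(1)
tm.begin(2)
tm.write(1, "x", 10)      # T1 updates x in place
try:
    tm.read(2, "x")       # T2 touches x while T1 is live: conflict
except Conflict:
    pass                  # T2 was rolled back and could retry later
tm.commit(1)
```

With this layout a commit is cheap, because the new values are already in place, while an abort pays the cost of restoring the logged values. A lazy scheme would make the opposite trade.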