National Cheng Kung University Electronic Theses and Dissertations System

Author:	Chang, Wen-Hsien (張文賢)
Thesis title:	Design of a Run-Time Dynamic Data Manager for Multi-Core Systems (多核心系統之動態資料管理器設計)
Advisor:	Jou, Jer-Min (周哲民)
Degree:	Master
Department:	College of Electrical Engineering and Computer Science - Department of Electrical Engineering
Year of publication:	2012
Academic year of graduation:	100 (ROC calendar)
Language:	Chinese
Number of pages:	96
Keywords:	Multi-Core System, Parallel Execution, Speculative Parallel Execution, Transactional Memory

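In a multi-core system, parallel thread execution is a way of running programs that raises computation speed and parallelism. Because threads access and operate on shared data while running in parallel, they must be synchronized. Parallel programs have traditionally relied on mutex-lock synchronization, which allows only the thread holding the mutex to access and operate on the shared data. Because the system cannot resolve the synchronization of parallel threads on its own, programmers have to handle it themselves, which makes parallel programs more complex, degrades synchronization efficiency, and makes it hard to raise parallelism.
Building on the Transactional Memory model of parallel execution, this thesis proposes a dynamic data manager that actively handles the synchronization of parallel threads, thereby increasing the execution speed of parallel programs and reducing their complexity. Conflicts that arise dynamically while transactions execute in parallel are detected by the dynamic data manager and resolved through eager conflict detection and eager version management. In addition, the multi-level cache hierarchy of a multi-core system gives rise to cache coherence problems, which the dynamic data manager and the data access controller inside each processor resolve jointly. According to the experimental results, the dynamic data manager synchronizes parallel threads effectively, and the transactional versions of the parallel programs run about twice as fast as the sequential versions.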

In multi-core systems, executing threads in parallel exploits parallelism and improves program performance. Threads running in parallel must synchronize when they access and compute on shared data. Conventional parallel programs often use mutex locks for thread synchronization, so that only the thread holding the lock can access and compute on the shared data. Without run-time support from the system, the programmer must handle thread synchronization, which increases program complexity and leads to poor performance.
This thesis introduces a Run-Time Dynamic Data Manager (RDM) that provides run-time support for thread synchronization by leveraging the concept of Transactional Memory, thereby improving the performance and reducing the complexity of parallel programs. The RDM detects all conflicts generated by transactions and resolves them through eager conflict detection and eager version management. Cache coherence is also considered and is co-managed by the RDM and the per-core Data Access Controller. On the selected benchmarks, the parallel versions run about 2 times faster on average than the sequential ones.
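
The two mechanisms named in the abstract can be illustrated with a small software model. The sketch below is not the RDM design from the thesis: the class names (`Memory`, `Transaction`, `ConflictError`) are hypothetical, execution is single-threaded for clarity, and the policy that the requesting transaction is the one to abort is an assumption. It shows eager conflict detection (conflicts are checked at each access rather than at commit) and eager version management (new values are written in place, with old values kept in an undo log for rollback).

```python
class ConflictError(Exception):
    """Raised when an access conflicts with another live transaction."""


class Memory:
    """Shared data plus per-address ownership records (hypothetical model)."""

    def __init__(self):
        self.data = {}     # address -> current value (updated in place)
        self.readers = {}  # address -> ids of live transactions that read it
        self.writer = {}   # address -> id of the live transaction that wrote it


class Transaction:
    def __init__(self, tid, mem):
        self.tid = tid
        self.mem = mem
        self.undo_log = []  # (address, old value) pairs, oldest first
        self.read_set = set()
        self.write_set = set()

    def read(self, addr):
        # Eager conflict detection: check for a foreign writer at access time.
        owner = self.mem.writer.get(addr)
        if owner is not None and owner != self.tid:
            raise ConflictError(f"T{self.tid} read of {addr!r} conflicts with T{owner}")
        self.mem.readers.setdefault(addr, set()).add(self.tid)
        self.read_set.add(addr)
        return self.mem.data.get(addr, 0)

    def write(self, addr, value):
        owner = self.mem.writer.get(addr)
        other_readers = self.mem.readers.get(addr, set()) - {self.tid}
        if (owner is not None and owner != self.tid) or other_readers:
            raise ConflictError(f"T{self.tid} write of {addr!r} conflicts")
        if addr not in self.write_set:
            # Eager version management: log the old value once, then
            # update shared memory in place.
            self.undo_log.append((addr, self.mem.data.get(addr, 0)))
            self.write_set.add(addr)
            self.mem.writer[addr] = self.tid
        self.mem.data[addr] = value

    def _release(self):
        for addr in self.read_set:
            self.mem.readers[addr].discard(self.tid)
        for addr in self.write_set:
            del self.mem.writer[addr]
        self.read_set.clear()
        self.write_set.clear()
        self.undo_log.clear()

    def commit(self):
        # New values are already in place, so commit only drops ownership.
        self._release()

    def abort(self):
        # Roll back by replaying the undo log from newest to oldest.
        for addr, old in reversed(self.undo_log):
            self.mem.data[addr] = old
        self._release()


mem = Memory()
mem.data["x"] = 10
t1, t2 = Transaction(1, mem), Transaction(2, mem)
t1.write("x", 11)          # T1 now owns "x"
try:
    t2.read("x")           # detected immediately, before T1 commits
    conflict = False
except ConflictError:
    conflict = True
    t2.abort()             # assumed policy: the requester aborts and may retry
t1.commit()
```

With this layout commit is cheap and abort is the expensive path, which is the usual trade-off of eager versioning.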

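Abstract (Chinese)
Abstract (English)
Acknowledgments
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1	Research Background
2	Research Motivation and Objectives
3	Thesis Organization
Chapter 2 Background and Related Work
1	Chip Multiprocessors
2	Cache Coherence
3	Thread Synchronization
4	Transactional Memory
4.1	Definition of a Transaction
4.2	Conflict Detection Mechanisms
4.3	Version Management Mechanisms
5	Thread-Level Speculative Execution
Chapter 3 Design of the Dynamic Data Manager
1	Design Considerations for the Dynamic Data Manager
2	Architecture of the Dynamic Data Manager
3	Conflict Detection Unit of the Dynamic Data Manager
3.1	Conflict Detection Mechanism
3.2	Filtering Transaction Access Permissions: Reducing the Number of Conflict Detections
3.3	Transaction Coherence Protocol
4	Version Management Mechanism
5	Parallel Execution of Transactions
6	Speculative Parallel Execution of Transactions
7	Algorithms of the Dynamic Data Manager
7.1	Detailed Architecture of the Dynamic Data Manager
7.2	Requests and Messages Received and Sent by the Dynamic Data Manager
7.3	Algorithms of the Dynamic Data Manager
Chapter 4 Design of the Dynamic Parallel Execution Architecture
1	Design Considerations for the Dynamic Parallel Execution Architecture
2	Design Considerations for the Memory System
3	Design Considerations for the Cache Coherence Protocol
3.1	L1 Cache Coherence Protocol: the MESI Protocol
3.2	Level 2 Cache Management Protocol
Chapter 5 Implementation of the Dynamic Parallel Execution Architecture
1	Dynamic Parallel Execution Architecture
2	Processor
2.1	Processor Core
2.2	Processor Core Algorithm and Data Structures
2.2.1	Task Descriptor
2.2.2	Instruction Descriptor
2.2.3	Processor Core Algorithm
3	Data Access Controller
3.1	Architecture of the Data Access Controller
3.2	Messages Sent and Received by the Data Access Controller
3.3	Request Execution Flow of the Data Access Controller
3.4	Algorithm of the Data Access Controller
4	Access Permission Filter
4.1	Architecture of the Access Permission Filter
4.2	Messages Received and Sent by the Access Permission Filter
4.3	Request Execution Flow of the Access Permission Filter
4.4	Algorithm of the Access Permission Filter
5	Memory Controller
6	Control Processor
Chapter 6 Experimental Environment and Data Analysis
1	Environment Setup
2	Test Programs
2.1	Histogram Benchmark
2.2	STAMP Benchmark
3	STM Experimental Data and Analysis of Results
3.1	Small Data Set: 10 Million Test Records
3.2	Large Data Set: 100 Million Test Records
4	Experimental Data and Analysis of Results for the Dynamic Parallel Execution Architecture and the Dynamic Data Manager
4.1	Method for Measuring Execution Time on QEMU
4.2	Performance Analysis
4.3	Analysis of Thread and Data Synchronization
5	Summary of Experiments
Chapter 7 Conclusions and Future Work
1	Conclusions
2	Future Work
References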

Made public: 2013-08-29
