| Graduate student: | 謝政宏 Hsieh, Cheng-Hung |
|---|---|
| Thesis title: | 資料驅動多執行緒之多核心系統資源管理器設計 Design of a Resource Manager for Data-Driven Multithreading on Multicore Systems |
| Advisor: | 周哲民 Jou, Jer-Min |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Department of Electrical Engineering |
| Year of publication: | 2014 |
| Academic year of graduation: | 102 |
| Language: | Chinese |
| Number of pages: | 127 |
| Keywords (Chinese): | 多核心系統、循序程式平行化、資料驅動多執行緒、資源管理器 |
| Keywords (English): | Multicore System, Parallel Execution of Sequential Program, Data-Driven Multithreading, Resource Manager |
The rise of multicore systems forces SoC designers to exploit thread-level parallelism in applications, because an application can make good use of multicore resources only when it is partitioned into threads that execute in parallel on multiple cores. Parallelization fixed at compile time cannot account for dynamic changes in the multicore run-time environment, which leads to inefficient execution. To enable efficient parallel execution of applications across a wide range of hardware and software environments, this thesis proposes a run-time resource manager, based on hardware-software co-design, that dynamically, continuously, and judiciously manages the parallel execution of applications under changing execution conditions. The manager efficiently schedules, maps, synchronizes, and load-balances the threads of parallel applications in the multicore environment. Experimental results on our computing platform show that sequential programs can be transformed into parallel programs by the proposed method, and that these parallel programs, when executed simultaneously, run about 1.37 times faster than their sequential versions.
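As a rough illustration of the general data-driven multithreading idea named in the title, the sketch below simulates a firing rule in which a thread becomes ready only once all of its producers have finished, and each ready thread is mapped to the currently least-loaded core. This is not the thesis's resource manager: the function name `schedule`, the task names, the costs, and the greedy mapping policy are all assumptions made for illustration, and the simulation ignores real timing and synchronization.

```python
from collections import deque


def schedule(tasks, deps, num_cores):
    """Toy data-driven scheduler (illustrative assumption, not the thesis design).

    tasks: dict mapping task name -> estimated cost
    deps:  dict mapping task name -> list of producer task names
    Returns (firing order, task -> core placement, accumulated load per core).
    """
    # Ready count = number of producers that have not finished yet.
    ready_count = {t: len(deps.get(t, [])) for t in tasks}

    # Invert the dependency map so each producer knows its consumers.
    consumers = {t: [] for t in tasks}
    for task, producers in deps.items():
        for p in producers:
            consumers[p].append(task)

    # Tasks with no producers can fire immediately.
    ready = deque(sorted(t for t, c in ready_count.items() if c == 0))
    load = [0] * num_cores
    placement = {}
    order = []

    while ready:
        task = ready.popleft()
        # Greedy load balancing: pick the core with the least accumulated cost.
        core = load.index(min(load))
        load[core] += tasks[task]
        placement[task] = core
        order.append(task)
        # Completing a task decrements the ready count of its consumers.
        for c in consumers[task]:
            ready_count[c] -= 1
            if ready_count[c] == 0:
                ready.append(c)

    if len(order) != len(tasks):
        raise ValueError("dependency cycle: some tasks never became ready")
    return order, placement, load


if __name__ == "__main__":
    # Hypothetical four-thread program: one loader, two filters, one merge.
    tasks = {"load": 2, "fA": 3, "fB": 3, "merge": 1}
    deps = {"fA": ["load"], "fB": ["load"], "merge": ["fA", "fB"]}
    order, placement, load = schedule(tasks, deps, num_cores=2)
    print(order, placement, load)
```

The two filter threads become ready at the same moment and land on different cores, which is the kind of run-time mapping decision the abstract attributes to the manager; a real implementation would also have to handle synchronization and changing core availability.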