| 研究生: |
黃翔 Huang, Xiang |
|---|---|
| 論文名稱: |
平行任務產生器的設計與實現 Design and Implementation of A Parallel Tasks Generator |
| 指導教授: |
周哲民
Jou, Jer-Min |
| 學位類別: |
碩士 Master |
| 系所名稱: |
電機資訊學院 - 電機工程學系 Department of Electrical Engineering |
| 論文出版年: | 2016 |
| 畢業學年度: | 104 |
| 語文別: | 中文 |
| 論文頁數: | 59 |
| 中文關鍵詞: | 循序程式平行化 、平行萃取 、資料驅動多執行緒 、訊息傳遞介面 |
| 外文關鍵詞: | Parallel Execution of Sequential Code, Parallel Extraction, Data-Driven Multithreading, Message Passing Interface |
| 相關次數: | 點閱:139 下載:2 |
| 分享至: |
| 查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報 |
多核心系統的興起促使程式設計師必須考慮循序程式中指令之間的關聯性,因為只有在循序程式能夠被分割且分配至多個核心上執行時,才能夠妥善地利用多核心系統的資源。在靜態任務分配時,程式設計者只能找出任務間明顯的相依關係,然而若在程式中隱藏著大量的相依關係時,將使得系統無法有效地利用系統資源;而編譯階段的相依性偵測將會無法應付多核心環境中的動態變化且降低程式的執行效率。為了使循序程式在眾多的軟體與硬體環境中得以有效地平行執行,我們提出一個於多核心系統上運行的平行任務產生器,透過動態的萃取、通訊與同步分析,將循序程式轉換成適用於平行計算的任務同步圖,以此作為後續在各個平行運算模型下動態配置的依據,近一步來提升系統整體的效能。而實驗的結果顯示平行任務產生器的效率在4~8個執行緒時可達循序執行時的2倍;而所產生的任務同步圖在多核心系統上能提高約1.37倍的執行速度;在計算機叢集上,能在僅花費額外33%的通訊時間下提高約1.3倍任務執行的速度。
The rise of multicore system forces programmers to consider dependences among instructions in the sequential codes, because a sequential code can use multicore resources appropriately only when it can be seperated as the instructions which could be assigned to the cores. Programmers can only observe obvious dependences between tasks statically, however, the sequential codes which hid lots of dependences can not use the system resources efficiently. Compiled-time dependences detection can fail to be efficient and unable to account for dynamic changes in run-time. In order to make time-efficient paarallel execution of sequential codes in wide range of hardware and software, we propose a parallel tasks generator, which can generate the task synchronization graphs applicated in parallel computing by using extraction amd analization of communication and synchronization in run-time. Accroding to the task synchronization graphs, we can schedule tasks dynamically on the parallel computing models, and further promote the enntire system efficacy. Experimental results show that a parallel tasks generator used 4 to 8 threads could be executed with 2x faster than its sequential version. The generated task synchronization graphs could be executed with 1.37x faster than their sequential version on multicore system, and with 1.3x faster by only extra 33% communicaation penalty on computer clusters.
[1] Kobbe, S., Bauer, L., Lohmann, D., Schroder-Preikschat, W., Henkel, J. “DistRM: Distributed Resorce Management for On-Chip Many-Core Systems”. In: CODES+ISSS, 2011, pp. 119-128.
[2] Huang, Jialu, et al. "Automatically exploiting cross-invocation parallelism using runtime information." Code Generation and Optimization (CGO), 2013 IEEE/ACM International Symposium on. IEEE, 2013.
[3] Anantpur, Jayvant, and R. Govindarajan. "Runtime dependence computation and execution of loops on heterogeneous systems." Code Generation and Optimization (CGO), 2013 IEEE/ACM International Symposium on. IEEE, 2013.
[4] Raman, Arun, et al. "Parallelism orchestration using DoPE: the degree of parallelism executive." ACM SIGPLAN Notices 46.6 (2011): 26-37.
[5] 孫大群, 嚴義, 鄔惠峰, 梯形圖資料依賴關係分析與並行提取,杭州電子科技大學智慧與軟體技術研究所,2014。
[6] Vallerio, K. S. and Jha, N. K., “Task Graph Extraction for Embedded System Synthesis,”. In Proceedings of the 16th international Conference on VLSI Design, 480.
[7] Liu, A. and Dick, R. P. 2006, “Automatic run-time extraction of communication graphs from multithreaded applications,” In Proceedings of CODES+ISSS '06. ACM, New York, NY, 46-51.
[8] K. Ganeshpure, S. Kundu, “On Run-time Task Graph Extraction of SoC,” International SoC design Conference, ISOCC 2010.
[9] Sherwood, T., Perelman, E., and Calder, B. 2001. Basic Block Distribution Analysis to Find Periodic Behavior and Simulation Points in Applications. In Proceedings of PACT ‘01., 3-14.
[10] Denning, P. J. 1967. The working set model for program behavior. In Proceedings of. SOSP '67. ACM, New York, NY, 15.1-15.12.
[11] Arvind and Gostelow, K. P. "The U-interpreter." Computer 15.2 (1982): 42-49.
[12] Amdahl, Gene M. "Validity of the single processor approach to achieving large scale computing capabilities." Proceedings of the April 18-20, 1967, spring joint computer conference. ACM, 1967.
[13] C. Kyriacou, P. Evripidou, and P. Trancoso. "Data-driven multithreading using conventional microprocessors." Parallel and Distributed Systems, IEEE Transactions on 17.10 (2006): 1176-1188.
[14] Foster Ian, "Designing and Building Parallel Programs: Concepts and Tools for Parallel Software Engineering", Addison-Wesley Longman Publishing Co., Inc., Boston, MA, 1995.
[15] Fiduccia, Charles M., and Robert M. Mattheyses. "A linear-time heuristic for improving network partitions." Design Automation, 1982. 19th Conference on. IEEE, 1982.
[16] Kernighan, B. W., and S. Lin. "An eflicient heuristic procedure for partitioning graphs." Bell system technical journal 49.2 (1970): 291-307.
[17] M.R. Guthaus et al., "MiBench: A free, commercially representative embedded benchmark suite," Workload Characterization, 2001. WWC-4. 2001 IEEE International Workshop on. IEEE, 2001.
[18] Stavrou, Kyriakos, Paraskevas Evripidou, and Pedro Trancoso. "DDM-CMP: data-driven multithreading on a chip multiprocessor." Embedded Computer Systems: Architectures, Modeling, and Simulation. Springer Berlin Heidelberg, 2005. 364-373.
[19] Stavrou, Kyriakos, et al. "Tflux: A portable platform for data-driven multithreading on commodity multicore systems." Parallel Processing, 2008. ICPP'08. 37th International Conference on. IEEE, 2008.
[20] S. Campanoni, T. Jones, et al., "HELIX: automatic parallelization of irregular programs for chip multiprocessing", Proceedings of the Tenth International Symposium on Code Generation and Optimization. ACM, 2012.
[21] Trancoso, Pedro, Kyriakos Stavrou, and Paraskevas Evripidou. "DDMCPP: The data-driven multithreading C pre-processor." The 11th Workshop on Interaction between Compilers and Computer Architectures. 2007.
[22] Blaise B., "Introduction to Parallel Computing", Lawrence Livermore National Laboratory.
[23] Quinn, Michael J., "Parallel Programming in C with MPI and OpenMP", McGrawHill.
[24] IEEE. Threads Extension for Portable Operating Systems (Draft 6), February 1992. P1003.4a/D6.
[25] OpenMP: Simple, portable, scalable SMP programming. http://www.openmp.org, 2006.
[26] 謝政宏. “資料驅動多執行緒之多核心系統資源管理器設計.” 成功大學電機工程學系學位論文(2014):1-126.
[27] P. Evripidou and J. Gaudiot. "A Decoupled graph/computation Data-Driven architecture with variable resolution actors". No. CONF-900874-5. University of Southern California, Los Angeles, CA (United States). Dept. of Electrical Engineering, 1990.