| 研究生: |
黃于鑄 Huang, Yu-Chu |
|---|---|
| 論文名稱: |
在CUDA系統上兩階段任務排程方法 A Two-Stage Task Scheduling Scheme on CUDA Systems |
| 指導教授: |
朱治平
Chu, Chih-Ping |
| 學位類別: |
碩士 Master |
| 系所名稱: |
電機資訊學院 - 資訊工程學系 Department of Computer Science and Information Engineering |
| 論文出版年: | 2010 |
| 畢業學年度: | 98 |
| 語文別: | 中文 |
| 論文頁數: | 46 |
| 中文關鍵詞: | 圖形處理器 、CUDA 、排程 、裝箱問題 |
| 外文關鍵詞: | GPU, CUDA, Scheduling, Bin-Packing Problem |
| 相關次數: | 點閱:91 下載:2 |
| 分享至: |
| 查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報 |
圖形處理器在目前已是一個重要的高速計算平台。在許多應用領域中,利用圖形處理器作為加速器來提高效能已是相當常見。然而在圖形處理器上的許多獨立的任務並沒有充分利用到圖形處理器的運算資源,所以如何排程任務可以充分且有效利用圖形處理器的運算資源是個重要研究的議題。本研究針對CUDA系統提出了一種兩階段任務排程方法,第一階段中,將任務分配的問題對應到裝箱問題,採用漸減先適演算法來解決此問題。在第二階段利用非同步傳輸與傳輸運算重疊的技巧去執行任務。結果顯示,由本研究所提出之兩階段排程方法比一般循序方法平均增速(Speedup)達到1.57,相較於Guevara et al.[9]所提出合併任務的方法平均增速達到1.2。
Recently graphics processing units (GPUs) have become an important high-performance computing platform. In many application fields, using GPU to accelerate computation has proven to be feasible. However, many independent tasks do not fully utilize the GPU resources, suggesting scheduling independent tasks is an important issue worth to be studied for platform with GPUs. This thesis proposes a two-stage task scheduling scheme on CUDA systems. In the first stage, the task allocation problem is mapped to the bin packing problem and the First-Fit Decreasing algorithm is selected to solve it. In the second stage, we use the asynchronous transfers and overlap transfers with computation to execute tasks. Base on the experimental results, we show our method is on average 1.57 times faster than the sequential method and 1.2 times faster than the merge kernel method proposed [9] by Guevara et al.
[1] 邱柏憲,多元服務物流轉運中心模糊車輛派遣之研究,國立雲林科技大學碩士論文,2001。
[2] AMD. ATI Stream. http://www.amd.com
[3] NVIDIA. CUDA. http://www.nvidia.com
[4] NVIDIA. CUDA Programming Guide Version 1.1 , 2007
[5] NVIDIA. CUDA Programming Guide Version 3.0 , 2010
[6] NVIDIA. CUDA Best Practices Guide Version 3.0 , 2010
[7] Wen-mei W. Hwu , NVIDIA CUDA 大量平行處理器程式設計 https://edu.nchc.org.tw/ , 2008
[8] Zhiyi Yang, Yating Zhu, Yong Pu. “Parallel Image Processing Based on CUDA,” Proceedings of International Conference on Computer Science and Software Engineering, 2008, pp.198-201.
[9] M. Guevara, C. Gregg, and S. K. “Enabling Task Parallelism in the CUDA Scheduler,” Proceedings of the Workshop on Programming Models for Emerging Architectures (PMEA), 2009, pp 69-76.
[10] A. Munshi. OpenCL. SIGGRAPH 2008: Beyond Programmable Shading Course, 2008.
[11] Shubhabrata Sengupta , Mark Harris , Yao Zhang , John D. Owens. “Scan primitives for GPU computing,” Proceedings of the 22nd ACM SIGGRAPH/EUROGRAPHICS symposium on Graphics hardware, August 04-05, 2007, pp.97-106.
[12] François Vanderbeck. “Computational Study of a Column Generation Algorithm for Bin Packing and Cutting Stock Problems, ” Mathematical Programming, Vol.86, 1999, pp.565-594
[13] A. K. Bhatia, M. Hazra and S. K. Basu, “Better-Fit Heuristic for One-Dimensional Bin-Packing Problem, ” Proceedings of the IEEE International Advance Computing Conference, 2009, pp.193-196.
[14] E. G. Coffman,Jr, G. Galambos, S. Martello, and D. Vigo, “Bin Packing Approximation Algorithms: Combinatorial Analysis, ” in Handbook of Combinatorial Optimization, D.-Z. Du and P. Pardalos, Eds. Kluwer Academic Publishers, 1998.
[15] E. Lindholm, J. Nickolls, S. Oberman, and J. Montrym. “NVIDIA Tesla: A Unified Graphics and Computing Architecture,” IEEE Micro, Vol.28, Issue 2, 2008, pp. 39-55.
[16] J. Nickolls, I. Buck, M. Garland, and K. Skadron. “Scalable Parallel Programming with CUDA,” Queue 6, 2, 2008, pp. 40-53.
[17] N. Satish, M. Harris, and M. Garland. “Designing Efficient Sorting Algorithms for Manycore GPUs,” Proceedings of the 23rd IEEE Int’l Parallel & Distributed Processing Symposium, May 2009, pp.1-10.
[18] D. Cederman and P. Tsigas, “A Practical Quicksort Algorithm for Graphics Processors,” in Proc. European Symposium on Algorithms (ESA), ser. LNCS, vol. 5193, 2008, pp. 246-258.
[19] Mark Harris, Shubhabrata Sengupta, John D. Owens, “Parallel Prefix Sum (Scan) with CUDA,” In GPU Gems 3, Nguyen H., (Ed.). Addison Wesley, Aug. 2007.