簡易檢索 / 詳目顯示

研究生: 黃于鑄
Huang, Yu-Chu
論文名稱: 在CUDA系統上兩階段任務排程方法
A Two-Stage Task Scheduling Scheme on CUDA Systems
指導教授: 朱治平
Chu, Chih-Ping
學位類別: 碩士
Master
系所名稱: 電機資訊學院 - 資訊工程學系
Department of Computer Science and Information Engineering
論文出版年: 2010
畢業學年度: 98
語文別: 中文
論文頁數: 46
中文關鍵詞: 圖形處理器CUDA排程裝箱問題
外文關鍵詞: GPU, CUDA, Scheduling, Bin-Packing Problem
相關次數: 點閱:91下載:2
分享至:
查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報
  • 圖形處理器在目前已是一個重要的高速計算平台。在許多應用領域中,利用圖形處理器作為加速器來提高效能已是相當常見。然而在圖形處理器上的許多獨立的任務並沒有充分利用到圖形處理器的運算資源,所以如何排程任務可以充分且有效利用圖形處理器的運算資源是個重要研究的議題。本研究針對CUDA系統提出了一種兩階段任務排程方法,第一階段中,將任務分配的問題對應到裝箱問題,採用漸減先適演算法來解決此問題。在第二階段利用非同步傳輸與傳輸運算重疊的技巧去執行任務。結果顯示,由本研究所提出之兩階段排程方法比一般循序方法平均增速(Speedup)達到1.57,相較於Guevara et al.[9]所提出合併任務的方法平均增速達到1.2。

    Recently graphics processing units (GPUs) have become an important high-performance computing platform. In many application fields, using GPU to accelerate computation has proven to be feasible. However, many independent tasks do not fully utilize the GPU resources, suggesting scheduling independent tasks is an important issue worth to be studied for platform with GPUs. This thesis proposes a two-stage task scheduling scheme on CUDA systems. In the first stage, the task allocation problem is mapped to the bin packing problem and the First-Fit Decreasing algorithm is selected to solve it. In the second stage, we use the asynchronous transfers and overlap transfers with computation to execute tasks. Base on the experimental results, we show our method is on average 1.57 times faster than the sequential method and 1.2 times faster than the merge kernel method proposed [9] by Guevara et al.

    摘要........................................................iii Abstract ...................................................iv 目錄........................................................vi 圖目錄......................................................viii 表目錄......................................................x 第一章 緒論................................................1 1.1 研究背景與動機......................................1 1.2 研究目的............................................2 1.3 論文架構............................................2 第二章 研究背景............................................3 2.1 計算統一架構(Compute Unified Device Architecture)...3 2.1.1 CUDA簡介............................................3 2.1.2 CUDA硬體架構........................................3 2.1.3 CUDA程式架構........................................7 2.1.4 CUDA記憶體架構......................................11 2.2 裝箱問題(Bin Packing Problem).......................13 2.2.1 問題描述............................................13 2.2.2 相關演算法..........................................15 2.2.3 演算法比較..........................................18 2.3 CUDA排程............................................20 第三章 兩階段任務排程方法..................................26 3.1 方法概述............................................26 3.2 第一階段............................................27 3.3 第二階段............................................29 第四章 實驗結果與討論......................................35 4.1 實驗環境............................................35 4.2 實驗結果及探討......................................36 第五章 結論與未來工作......................................44 參考文獻....................................................45

    [1] 邱柏憲,多元服務物流轉運中心模糊車輛派遣之研究,國立雲林科技大學碩士論文,2001。
    [2] AMD. ATI Stream. http://www.amd.com
    [3] NVIDIA. CUDA. http://www.nvidia.com
    [4] NVIDIA. CUDA Programming Guide Version 1.1 , 2007
    [5] NVIDIA. CUDA Programming Guide Version 3.0 , 2010
    [6] NVIDIA. CUDA Best Practices Guide Version 3.0 , 2010
    [7] Wen-mei W. Hwu , NVIDIA CUDA 大量平行處理器程式設計 https://edu.nchc.org.tw/ , 2008
    [8] Zhiyi Yang, Yating Zhu, Yong Pu. “Parallel Image Processing Based on CUDA,” Proceedings of International Conference on Computer Science and Software Engineering, 2008, pp.198-201.
    [9] M. Guevara, C. Gregg, and S. K. “Enabling Task Parallelism in the CUDA Scheduler,” Proceedings of the Workshop on Programming Models for Emerging Architectures (PMEA), 2009, pp 69-76.
    [10] A. Munshi. OpenCL. SIGGRAPH 2008: Beyond Programmable Shading Course, 2008.
    [11] Shubhabrata Sengupta , Mark Harris , Yao Zhang , John D. Owens. “Scan primitives for GPU computing,” Proceedings of the 22nd ACM SIGGRAPH/EUROGRAPHICS symposium on Graphics hardware, August 04-05, 2007, pp.97-106.
    [12] François Vanderbeck. “Computational Study of a Column Generation Algorithm for Bin Packing and Cutting Stock Problems, ” Mathematical Programming, Vol.86, 1999, pp.565-594
    [13] A. K. Bhatia, M. Hazra and S. K. Basu, “Better-Fit Heuristic for One-Dimensional Bin-Packing Problem, ” Proceedings of the IEEE International Advance Computing Conference, 2009, pp.193-196.
    [14] E. G. Coffman,Jr, G. Galambos, S. Martello, and D. Vigo, “Bin Packing Approximation Algorithms: Combinatorial Analysis, ” in Handbook of Combinatorial Optimization, D.-Z. Du and P. Pardalos, Eds. Kluwer Academic Publishers, 1998.
    [15] E. Lindholm, J. Nickolls, S. Oberman, and J. Montrym. “NVIDIA Tesla: A Unified Graphics and Computing Architecture,” IEEE Micro, Vol.28, Issue 2, 2008, pp. 39-55.
    [16] J. Nickolls, I. Buck, M. Garland, and K. Skadron. “Scalable Parallel Programming with CUDA,” Queue 6, 2, 2008, pp. 40-53.
    [17] N. Satish, M. Harris, and M. Garland. “Designing Efficient Sorting Algorithms for Manycore GPUs,” Proceedings of the 23rd IEEE Int’l Parallel & Distributed Processing Symposium, May 2009, pp.1-10.
    [18] D. Cederman and P. Tsigas, “A Practical Quicksort Algorithm for Graphics Processors,” in Proc. European Symposium on Algorithms (ESA), ser. LNCS, vol. 5193, 2008, pp. 246-258.
    [19] Mark Harris, Shubhabrata Sengupta, John D. Owens, “Parallel Prefix Sum (Scan) with CUDA,” In GPU Gems 3, Nguyen H., (Ed.). Addison Wesley, Aug. 2007.

    下載圖示 校內:2012-08-19公開
    校外:2012-08-19公開
    QR CODE