
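Graduate student: 楊朝凱
Yang, Chao-Kai
Thesis title: 適用通用計算圖形處理器架構之動態電壓及頻率調整之資料流量感知動態功率管理
A Data-Traffic Aware Dynamic Power Management for General-Purpose Computing on Graphics Processing Units with Dynamic Voltage and Frequency Scaling
Advisor: 邱瀝毅
Chiou, Lih-Yih
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Electrical Engineering
Year of publication: 2017
Academic year of graduation: 105 (ROC calendar)
Language: Chinese
Number of pages: 44
Keywords (Chinese, translated): general-purpose computing on graphics processing units, dynamic voltage and frequency scaling, energy efficiency, dynamic power management
Keywords (English): GPGPU, Dynamic Voltage and Frequency Scaling (DVFS), Energy Efficiency, Power Management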
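    With the recent development of machine learning and artificial intelligence, general-purpose computing on graphics processing units (GPGPU) has been widely applied in many fields. This relies on GPUs providing far more computation units and much higher memory bandwidth than ordinary central processing units (CPUs), which meets the demand for massively parallel computation. For example, the TITAN X graphics card released by NVIDIA in 2016 provides 3584 CUDA cores and 480 GB/s of memory bandwidth. Such a configuration delivers enormous computing capability, but it also brings high power consumption and heat. As hardware specifications keep growing and power consumption rises with them, using hardware resources more efficiently to control power consumption has become a problem that must be solved.
    This thesis proposes an adaptive, proactive dynamic power management scheme based on dynamic voltage and frequency scaling (DVFS) for GPGPU architectures to address these energy-efficiency and power-consumption problems. The scheme proactively monitors how the GPU's hardware is utilized under different applications and adjusts the operating voltage and frequency of the different hardware levels to reduce overall chip power consumption. With this method, power consumption is reduced by about 15% while the performance overhead is kept within 5%. Because GPGPUs consume power on the order of hundreds of watts, the method saves tens of watts, which is a considerable amount.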
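The figures above can be sanity-checked with first-order arithmetic. This is a sketch under stated assumptions, not a result taken from the thesis: it uses the standard CMOS dynamic-power approximation (activity factor α, switched capacitance C, supply voltage V, clock frequency f), which is why lowering V together with f saves more than lowering f alone; it assumes a 250 W board power (roughly the rated power of a TITAN X class card) as the baseline; and it reads the 5% performance overhead as a 5% increase in execution time t.

```latex
\begin{aligned}
P_{\mathrm{dyn}} &\approx \alpha\, C\, V^{2} f \\
\Delta P &\approx 0.15 \times 250\ \mathrm{W} = 37.5\ \mathrm{W} \\
\frac{E_{\mathrm{new}}}{E_{\mathrm{base}}}
  &= \frac{P_{\mathrm{new}}\, t_{\mathrm{new}}}{P_{\mathrm{base}}\, t_{\mathrm{base}}}
  \approx (1 - 0.15)(1 + 0.05) = 0.8925
\end{aligned}
```

Under these assumptions the power saving lands in the tens of watts, and even at the full 5% slowdown the energy per run is still roughly 11% lower, so the power reduction is not simply traded away for longer run time.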

    With the development of machine learning and artificial intelligence (AI), general-purpose computing on graphics processing units (GPGPU) has been widely adopted in various applications, because these applications rely on the larger number of computation units and the higher memory bandwidth that GPUs provide compared with central processing units (CPUs). For example, NVIDIA released the TITAN X GPU with 3584 CUDA cores and 480 GB/s of memory bandwidth in 2016. This configuration provides enormous arithmetic capability, but it also causes high power consumption. As hardware continues to develop rapidly, power efficiency has become an important issue.

    We propose an adaptive power management scheme for GPGPUs that improves power efficiency and reduces unnecessary power consumption. At run time, the scheme proactively monitors the activity of hardware resources and selects an appropriate voltage and frequency for each component. Compared with the baseline approach, it reduces power consumption by about 15%, which amounts to tens of watts on a GPGPU, with a performance overhead under 5%.
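The record does not describe the controller in detail, so the Python sketch below is only one hypothetical way a proactive, traffic-aware DVFS loop could be organized, not the thesis's design. Each epoch, the measured utilization of every voltage/frequency domain is rescaled to top-frequency capacity and smoothed with an exponential moving average (an assumed predictor); the controller then picks the lowest frequency level that covers the predicted demand plus headroom, and raises every domain one level whenever the measured slowdown exceeds a 5% budget. The names `Domain`, `TrafficAwareDVFS`, `alpha`, and `headroom`, and all frequency/voltage tables, are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Domain:
    """One hypothetical voltage/frequency domain, e.g. the SM cores or the DRAM interface."""
    name: str
    freqs_mhz: List[float]  # available frequency levels, ascending
    volts: List[float]      # supply voltage paired with each frequency level
    level: int = -1         # current index into freqs_mhz; -1 means start at the top level
    demand: float = 1.0     # predicted demand as a fraction of top-level capacity

    def __post_init__(self) -> None:
        if self.level < 0:
            self.level = len(self.freqs_mhz) - 1

    def rel_dynamic_power(self) -> float:
        """Dynamic power relative to the top level via P ~ V^2 * f (same activity and capacitance)."""
        v, f = self.volts[self.level], self.freqs_mhz[self.level]
        return (v * v * f) / (self.volts[-1] ** 2 * self.freqs_mhz[-1])


class TrafficAwareDVFS:
    """Proactive per-domain DVFS with a slowdown budget; an illustrative sketch only."""

    def __init__(self, domains: List[Domain], alpha: float = 0.5,
                 headroom: float = 1.1, slowdown_budget: float = 0.05) -> None:
        self.domains = {d.name: d for d in domains}
        self.alpha = alpha                      # EMA weight given to the newest sample
        self.headroom = headroom                # spare capacity kept above predicted demand
        self.slowdown_budget = slowdown_budget  # e.g. 5% allowed performance loss

    def step(self, utilization: Dict[str, float], slowdown: float) -> Dict[str, int]:
        """Choose each domain's level for the next epoch.

        utilization: busy fraction of each domain measured at its current frequency.
        slowdown: measured fractional performance loss versus the top-level baseline.
        """
        for name, util in utilization.items():
            d = self.domains[name]
            f_max = d.freqs_mhz[-1]
            # Rescale to top-level capacity, then predict the next epoch with an EMA.
            sample = util * d.freqs_mhz[d.level] / f_max
            d.demand = self.alpha * sample + (1 - self.alpha) * d.demand
            if slowdown > self.slowdown_budget:
                # Over budget: back off by raising one level instead of scaling down.
                d.level = min(d.level + 1, len(d.freqs_mhz) - 1)
                continue
            target = min(d.demand * self.headroom, 1.0)
            # Lowest level whose capacity still covers the predicted demand plus headroom.
            d.level = next(i for i, f in enumerate(d.freqs_mhz) if f / f_max >= target)
        return {name: d.level for name, d in self.domains.items()}


if __name__ == "__main__":
    core = Domain("core", [600, 900, 1200, 1500], [0.80, 0.90, 1.00, 1.10])
    mem = Domain("memory", [2500, 3750, 5000], [1.20, 1.30, 1.35])
    ctrl = TrafficAwareDVFS([core, mem])
    # A memory-bound phase: cores lightly used, DRAM traffic near saturation.
    print(ctrl.step({"core": 0.3, "memory": 0.95}, slowdown=0.01))  # {'core': 2, 'memory': 2}
    print(ctrl.step({"core": 0.3, "memory": 0.95}, slowdown=0.01))  # {'core': 1, 'memory': 2}
```

Scaling each domain from its own traffic is what lets a memory-bound phase lower the core clock while keeping memory at full speed; the EMA and fixed headroom are deliberately simple stand-ins for whatever predictor a real design would use.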

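    Abstract
    Acknowledgements
    Table of Contents
    List of Tables
    List of Figures
    Chapter 1 Introduction
      1.1 Research Background
        1.1.1 General-Purpose Computing on Graphics Processing Units
        1.1.2 Impact of High Power Consumption
        1.1.3 Overview of Dynamic Power Management
      1.2 Research Motivation
      1.3 Research Contributions
      1.4 Thesis Organization
    Chapter 2 Background
      2.1 Architecture Exploration of General-Purpose Graphics Processing Units
      2.2 Impact of Operating Voltage and Frequency on Power Consumption and Performance
    Chapter 3 Literature Review
      3.1 Analysis of Dynamic Power in GPU Architectures
      3.2 Dynamic Management of GPGPUs
        3.2.1 GPUWattch
        3.2.2 Equalizer
        3.2.3 Adaptive Model Predictive Control
    Chapter 4 Adaptive Proactive Dynamic Power Management with Dynamic Voltage and Frequency Scaling
      4.1 Problem Description
      4.2 Proactive Management Mechanism
        4.2.1 Dynamic Detection of Hardware Data Traffic
        4.2.2 Dynamic Voltage and Frequency Adjustment
        4.2.3 Performance Overhead Control
    Chapter 5 Experimental Results and Analysis
    Chapter 6 Conclusions and Future Work
      6.1 Conclusions
      6.2 Future Work
    References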

    [1] S. Liu, J. Zhang, Q. Wu and Q. Qiu, "Thermal-aware job allocation and scheduling for three dimensional chip multiprocessor," in Proc. 11th International Symposium on Quality Electronic Design, pp. 390-398, 2010.
    [2] A. H. Ajami, K. Banerjee, M. Pedram and L. P. P. P. van Ginneken, "Analysis of non-uniform temperature-dependent interconnect performance in high performance ICs," in Proc. 38th Design Automation Conference, pp. 567-572, 2001.
    [3] A. H. Ajami, K. Banerjee and M. Pedram, "Non-uniform chip-temperature dependent signal integrity," in Proc. Symposium on VLSI Technology. Digest of Technical Papers, pp. 145-146, 2001.
    [4] A. T. Winther, W. Liu, A. Nannarelli and S. Vrudhula, "Temperature dependent wire delay estimation in floorplanning," in Proc. NORCHIP, pp. 1-4, 2011.
    [5] M. Pedram and S. Nazarian, "Thermal Modeling, Analysis, and Management in VLSI Circuits: Principles and Methods," in Proc. IEEE, vol. 94, no. 8, pp. 1487-1501, 2006.
    [6] N. S. Kim et al., "Leakage current: Moore's law meets static power," IEEE Computer, vol. 36, no. 12, pp. 68-75, 2003.
    [7] J. Lee, V. Sathisha, M. Schulte, K. Compton and N. S. Kim, "Improving Throughput of Power-Constrained GPUs Using Dynamic Voltage/Frequency and Core Scaling," in Proc. International Conference on Parallel Architectures and Compilation Techniques, pp. 111-120, 2011.
    [8] A. Majumdar, L. Piga, I. Paul, J. L. Greathouse, W. Huang and D. H. Albonesi, "Dynamic GPGPU Power Management Using Adaptive Model Predictive Control," in Proc. IEEE International Symposium on High Performance Computer Architecture, pp. 613-624, 2017.
    [9] A. Sethia and S. Mahlke, "Equalizer: Dynamic Tuning of GPU Resources for Efficient Execution," in Proc. 47th Annual IEEE/ACM International Symposium on Microarchitecture, pp. 647-658, 2014.
    [10] J. Leng et al., "GPUWattch: Enabling energy optimizations in GPGPUs," in Proc. 40th Annual International Symposium on Computer Architecture (ISCA), pp. 487-498, 2013.
    [11] G. Dhiman, K. K. Pusukuri and T. Rosing, "Analysis of dynamic voltage scaling for system level energy management," in Proc. Workshop on Power Aware Computing and Systems, p. 9, 2008.
    [12] Q. Wang and X. Chu, "GPGPU Performance Estimation with Core and Memory Frequency Scaling," arXiv:1701.05308, 2017. [Online]. Available: https://arxiv.org/abs/1701.05308
    [13] M. Lee et al., "Improving GPGPU resource utilization through alternative thread block scheduling," in Proc. IEEE 20th International Symposium on High Performance Computer Architecture, pp. 260-271, 2014.
    [14] M. Awatramani, J. Zambreno and D. Rover, "Increasing GPU throughput using kernel interleaved thread block scheduling," in Proc. IEEE 31st International Conference on Computer Design (ICCD), pp. 503-506, 2013.
    [15] H. Zhang, M. Putic and J. Lach, "Low power GPGPU computation with imprecise hardware," in Proc. 51st ACM/EDAC/IEEE Design Automation Conference (DAC), pp. 1-6, 2014.
    [16] G. Singla, G. Kaur, A. K. Unver and U. Y. Ogras, "Predictive dynamic thermal and power management for heterogeneous mobile platforms," in Proc. Design, Automation & Test in Europe Conference & Exhibition (DATE), pp. 960-965, 2015.
    [17] R. Nath, R. Ayoub and T. S. Rosing, "Temperature aware thread block scheduling in GPGPUs," in Proc. 50th ACM/EDAC/IEEE Design Automation Conference (DAC), pp. 1-6, 2013.
    [18] X. Mei, L. S. Yung, K. Zhao and X. Chu, "A measurement study of GPU DVFS on energy conservation," in Proc. Workshop on Power-Aware Computing and Systems, pp. 1-5, 2013.
    [19] H. K. Mondal, G. N. S. Harsha and S. Deb, "An Efficient Hardware Implementation of DVFS in Multi-core System with Wireless Network-on-Chip," in Proc. IEEE Computer Society Annual Symposium on VLSI, pp. 184-189, 2014.
    [20] S. Hong and H. Kim, "An integrated GPU power and performance model," in Proc. 37th Annual International Symposium on Computer Architecture, pp. 280-289, 2010.
    [21] A. Marmin, C. H. Lai, H. Tago, H. L. Huang and J. M. Lu, "Architecture agnostic energy model for GPU-based design," in Proc. International Symposium on VLSI Design, Automation and Test (VLSI-DAT), pp. 1-4, 2016.
    [22] I. Lin, B. Jeff and I. Rickard, "ARM platform for performance and power efficiency — Hardware and software perspectives," in Proc. International Symposium on VLSI Design, Automation and Test (VLSI-DAT), pp. 1-5, 2016.
    [23] O. Naji, C. Weis, M. Jung, N. Wehn and A. Hansson, "A high-level DRAM timing, power and area exploration tool," in Proc. International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS), pp. 149-156, 2015.
    [24] R. Ge, R. Vogt, J. Majumder, A. Alam, M. Burtscher and Z. Zong, "Effects of Dynamic Voltage and Frequency Scaling on a K20 GPU," in Proc. 42nd International Conference on Parallel Processing, pp. 826-833, 2013.
    [25] I. Paul, W. Huang, M. Arora and S. Yalamanchili, "Harmonia: Balancing compute and memory power in high-performance GPUs," in Proc. 42nd Annual International Symposium on Computer Architecture (ISCA), pp. 54-65, 2015.
    [26] J. V. Escamilla, M. R. Casu and J. Flich, "Increasing the Efficiency of Latency-Driven DVFS with a Smart NoC Congestion Management Strategy," in Proc. IEEE 10th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSOC), pp. 241-248, 2016.
    [27] G. Wang, Y. Lin and W. Yi, "Kernel Fusion: An Effective Method for Better Power Efficiency on Multithreaded GPU," in Proc. IEEE/ACM International Conference on Cyber, Physical and Social Computing (CPSCom), pp. 344-350, 2010.
    [28] S. Li, J. H. Ahn, R. D. Strong, J. B. Brockman, D. M. Tullsen and N. P. Jouppi, "McPAT: An integrated power, area, and timing modeling framework for multicore and manycore architectures," in Proc. 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), pp. 469-480, 2009.
    [29] NVIDIA Corporation, "NVIDIA Tesla P100 Whitepaper," p. 45, 2016. [Online]. Available: https://images.nvidia.com/content/pdf/tesla/whitepaper/pascal-architecture-whitepaper.pdf
    [30] A. Jog, O. Kayiran et al., "OWL: Cooperative thread array aware scheduling techniques for improving GPGPU performance," in Proc. 18th International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 395-406, 2013.
    [31] T. Komoda, S. Hayashi, T. Nakada, S. Miwa and H. Nakamura, "Power capping of CPU-GPU heterogeneous systems through coordinating DVFS and task mapping," in Proc. IEEE 31st International Conference on Computer Design (ICCD), pp. 349-356, 2013.
    [32] Y. Wang, S. Roy and N. Ranganathan, "Run-time power-gating in caches of GPUs for leakage energy savings," in Proc. Design, Automation & Test in Europe Conference & Exhibition (DATE), pp. 300-303, 2012.
    [33] Y. Wen, Z. Wang and M. F. P. O'Boyle, "Smart multi-task scheduling for OpenCL programs on CPU/GPU heterogeneous platforms," in Proc. 21st International Conference on High Performance Computing (HiPC), pp. 1-10, 2014.
    [34] A. Bakhoda, G. Yuan, W. W. L. Fung, H. Wong and T. M. Aamodt, "Analyzing CUDA Workloads Using a Detailed GPU Simulator," in Proc. IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), 2009.

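    On campus: full text available from 2022-09-01
    Off campus: not publicly available
    The electronic thesis has not yet been authorized for public release; for the print copy, please check the library catalog.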