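| Graduate student: | 蔡宗蕙 Tsai, Zong-Hui |
|---|---|
| Thesis title: | 使用任務發派管理及基於強化學習之動態電壓及頻率調整之高能源效率通用計算圖形處理器 An Energy-Efficient GPGPU using Task Management and DVFS Based on Reinforcement Learning |
| Advisor: | 邱瀝毅 Chiou, Lih-Yih |
| Degree: | 碩士 Master |
| Department: | 電機資訊學院 - 電機工程學系 College of Electrical Engineering and Computer Science - Department of Electrical Engineering |
| Year of publication: | 2021 |
| Academic year of graduation: | 109 |
| Language: | English |
| Number of pages: | 54 |
| Keywords (Chinese): | 圖形處理器功率管理 (GPU power management), 動態電壓頻率調整 (dynamic voltage and frequency scaling), 圖形處理器工作分配 (GPU task dispatch), 強化學習 (reinforcement learning) |
| Keywords (English): | GPGPU power management, dynamic voltage and frequency scaling, task dispatch, reinforcement learning |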
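As the edge-computing market flourishes, the problems caused by the high power and energy consumption of graphics processing units (GPUs) can no longer be ignored, because edge devices have no additional cooling mechanism and their lifetime and reliability depend on battery capacity. Battery capacity is improving only slowly, constrained by the chemical properties of battery materials, so raising energy efficiency is the more practical solution.
We use double deep Q-learning to adjust the voltage and frequency of the GPGPU dynamically and so reduce energy consumption, and we regulate the number of tasks on each streaming multiprocessor (SM) to relieve contention for cache resources and improve performance. Our method improves performance by 18% while saving 59% of the energy consumed. Moreover, however programs with the two extreme characteristics are ordered, it saves 53% of the energy with less than 5% performance loss.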
GPGPU power and energy issues are growing with the expansion of the edge-computing market, because edge devices have no embedded cooling and their lifetime and reliability are limited by battery capacity. Since battery capacity is restricted by the chemical characteristics of battery materials, increasing energy efficiency is a much more practical solution.
We propose a dynamic voltage and frequency scaling (DVFS) mechanism based on reinforcement learning that runs at GPGPU runtime to save energy. Furthermore, we reduce cache contention by tuning the number of tasks in every SM to boost performance. Across various benchmarks, the mechanism achieves 59% energy savings with an 18% performance improvement on average. Moreover, the model shows strong adaptability when running programs that interleave compute-intensive and memory-intensive benchmarks: the energy savings exceed 53% with less than 5% performance loss for all sequences tested.
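The abstract names double deep Q-learning as the learning method behind the DVFS controller but does not give the state, action, or reward definitions. The following is only a minimal sketch of the generic double Q-learning target, assuming the actions are indices into a table of voltage/frequency levels. The `VF_LEVELS` values, the function names, the discount factor, and the example Q-values are all hypothetical and are not taken from the thesis.

```python
import random

# Hypothetical voltage/frequency operating points (volts, MHz).
# These numbers are illustrative assumptions, not values from the thesis.
VF_LEVELS = [(0.7, 400), (0.9, 700), (1.1, 1000)]


def double_q_target(reward, next_q_online, next_q_target, gamma=0.9, done=False):
    """Generic double Q-learning target.

    The online estimate selects the next action; the separate target
    estimate evaluates it. Decoupling selection from evaluation reduces
    the overestimation that a single max over noisy Q-values causes.
    """
    if done:
        return reward
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + gamma * next_q_target[best_action]


def select_level(q_values, epsilon=0.1, rng=random):
    """Epsilon-greedy choice of a V/f level index (assumed exploration policy)."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


if __name__ == "__main__":
    # Made-up Q-values for the next state, one entry per V/f level.
    next_q_online = [0.2, 0.8, 0.5]   # online estimate prefers level 1
    next_q_target = [0.3, 0.4, 0.9]   # target estimate scores level 1 at 0.4
    y = double_q_target(1.0, next_q_online, next_q_target, gamma=0.9)
    level = select_level(next_q_online, epsilon=0.0)
    print("target:", y, "chosen V/f level:", VF_LEVELS[level])
```

With these made-up numbers the double Q-learning target uses the target estimate of the action the online estimate prefers (0.4), whereas a plain deep Q-learning target would take the maximum of the target estimate (0.9) and produce a larger value.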