| Author: | 陳世聰 Chen, Shih-Tsung |
|---|---|
| Thesis title: | 應用編譯器最佳化技術於低功率嵌入式系統設計之研究 The Study of Applying Compiler Optimization Techniques in Low-Power Embedded System Design |
| Advisor: | 陳 敬 Chen, Jing |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Institute of Computer & Communication Engineering |
| Year of publication: | 2011 |
| Academic year of graduation: | 99 (ROC calendar) |
| Language: | Chinese |
| Pages: | 107 |
| Keywords (Chinese): | 嵌入式系統, 編譯器, 最佳化, 低功率 |
| Keywords (English): | Embedded Systems, Compiler, Optimizations, Low Power |
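In the twenty-first century, an era that values environmental protection and energy conservation, reducing energy consumption has become an important issue. Mobile devices are one of the most important application areas of embedded systems. Battery endurance and system power consumption are the key factors that determine how long a mobile device can be used, so low-power hardware and software have become a focus of embedded system design. This thesis studies how to classify and combine the existing optimization parameters of the GCC compiler, together with an implementation of a loop unroll-and-jam algorithm, in order to reduce the energy that software consumes when it runs on embedded systems.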
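GCC provides a number of optimization parameters for generating efficient object code, and groups some of them into {-O1, -O2, -O3} for convenience. This thesis systematically classifies the optimization parameters according to their function and nature, with the aim of finding combinations of classes that more effectively produce object code with lower energy consumption. Results from running test programs on the SimpleScalar simulator show that combinations of optimization parameters that affect energy consumption do exist. The thesis collects the parameters with the most pronounced effect into the “-OE” parameter and adds it to GCC as a new feature. Unrolling nested loops and then jamming them can improve instruction pipelining and the cache hit rate, and can therefore reduce the energy a program consumes. Moreover, effectively jamming the unrolled loops reduces energy consumption more than loop unrolling alone. On this basis, the thesis also implements an object-code optimization algorithm for loop unroll-and-jam in GCC and adds it as the optimization parameter “-funroll-and-jam” to strengthen GCC's optimization capabilities.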
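The study of GCC optimization parameter combinations, verified with the simulator, shows that object code compiled with the optimization parameters “-OE” and “-funroll-and-jam” does consume less energy. “-OE” gives programmers a quick and simple reference when compiling programs for low-power embedded systems, with no need to modify the GCC compiler. The “-funroll-and-jam” optimization parameter implemented in this thesis effectively reduces energy consumption at run time. Overall, applying the results of this thesis can help extend the battery life of low-power embedded systems.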
With growing environmental awareness, reducing energy consumption has become a worldwide concern. Mobile devices are a major application area of embedded systems, and battery lifetime, which depends directly on energy consumption, is an important factor in their success. This thesis presents an approach to generating energy-saving object code by leveraging the optimization parameters of the widely used GCC compiler. The approach is motivated by the observation that a proper combination of optimization parameters can guide GCC to generate object code that consumes less energy at run time. In addition, this thesis presents an implementation of the loop unroll-and-jam algorithm in GCC to enhance its ability to generate energy-saving object code.
GCC provides a number of optimization parameters which, when applied, may directly or indirectly affect the efficiency of the generated code. Some optimization parameters are also grouped into optimization switches, namely {-O1, -O2, -O3}, for convenience. In this thesis, the optimization parameters are classified according to their functionality and properties in order to find a group that is suitable, in the general case, for energy-saving optimization. Simulation results obtained with SimpleScalar indicate that such a grouping exists. A new switch, “-OE”, is introduced to apply the group that is most effective in generating energy-saving code. To enhance GCC's loop optimization, which in most cases can improve instruction pipelining and the cache hit ratio, the loop unroll-and-jam algorithm is implemented in GCC and exposed as the new optimization parameter “-funroll-and-jam”.
Based on the study of GCC's optimization parameters and verification by simulation, compiling programs with the optimization parameters “-OE” and “-funroll-and-jam” does generate object code that consumes less energy. “-OE” can serve as a quick and simple reference for programmers, and there is no need to modify GCC. The optimization parameter “-funroll-and-jam” can further reduce energy consumption at run time. In general, applying the results of this thesis would help extend the battery lifetime of embedded system products.
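As a rough illustration of the loop unroll-and-jam transformation named in the abstract, the following C sketch applies it by hand to a small nested loop. It is a source-level example under stated assumptions, not the thesis's GCC implementation, which works inside the compiler. The function names (`matvec_plain`, `matvec_unroll_and_jam`, `unroll_and_jam_self_check`), the matrix-vector kernel, and the unroll factor of 2 are all hypothetical choices made for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Baseline nested loop: y[i] = sum over j of a[i*m + j] * x[j]. */
void matvec_plain(size_t n, size_t m, const int *a, const int *x, int *y) {
    assert(a != NULL && x != NULL && y != NULL);
    for (size_t i = 0; i < n; i++) {
        int sum = 0;
        for (size_t j = 0; j < m; j++)
            sum += a[i * m + j] * x[j];
        y[i] = sum;
    }
}

/* Unroll-and-jam (illustrative, factor 2): the outer loop is unrolled by 2,
 * and the two resulting copies of the inner loop are jammed (fused) into a
 * single inner loop, so each x[j] is loaded once and used for two rows. */
void matvec_unroll_and_jam(size_t n, size_t m, const int *a, const int *x, int *y) {
    assert(a != NULL && x != NULL && y != NULL);
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        int sum0 = 0, sum1 = 0;
        for (size_t j = 0; j < m; j++) {
            int xj = x[j];                      /* shared by both rows */
            sum0 += a[i * m + j] * xj;
            sum1 += a[(i + 1) * m + j] * xj;
        }
        y[i] = sum0;
        y[i + 1] = sum1;
    }
    for (; i < n; i++) {                        /* remainder row when n is odd */
        int sum = 0;
        for (size_t j = 0; j < m; j++)
            sum += a[i * m + j] * x[j];
        y[i] = sum;
    }
}

/* Returns 1 if both versions agree on a small 5x4 input, 0 otherwise. */
int unroll_and_jam_self_check(void) {
    enum { ROWS = 5, COLS = 4 };
    int a[ROWS * COLS], x[COLS], y_ref[ROWS], y_uj[ROWS];
    for (int k = 0; k < ROWS * COLS; k++) a[k] = k - 7;
    for (int j = 0; j < COLS; j++) x[j] = 2 * j + 1;
    matvec_plain(ROWS, COLS, a, x, y_ref);
    matvec_unroll_and_jam(ROWS, COLS, a, x, y_uj);
    for (int i = 0; i < ROWS; i++)
        if (y_ref[i] != y_uj[i]) return 0;
    return 1;
}
```

The sketch only shows that the transformed loop computes the same result with fewer loads of `x[j]` and fewer outer-loop iterations. Whether this reduces energy on a given target would have to be measured, for example with a simulator as the thesis does.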
On campus: publicly available from 2016-07-01