| Student: | 郭興邦 Kuo, Hsin-Pang |
|---|---|
| Thesis title: | A Low Power Synthesis Flow for Multi-rate Systems |
| Advisor: | 李昆忠 Lee, Kuen-Jong |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science, Department of Electrical Engineering |
| Year of publication: | 2016 |
| Academic year of graduation: | 105 (ROC calendar) |
| Language: | Chinese |
| Pages: | 49 |
| Keywords: | High-level Synthesis, Multi-rate, Low Power, Synchronous Data Flow, Simulated Annealing, Globally Asynchronous Locally Synchronous, Multiple Clock Domains, Buffer Optimization |
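Most portable devices today include digital signal processing (DSP) applications, such as multimedia and image processing designs. These applications require substantial computing resources and consume considerable power, so low-power design methods for portable devices remain essential. DSP often involves conversions between the time and frequency domains and therefore frequently calls for multi-rate designs, and the synchronous data flow (SDF) model is commonly used to describe applications with multiple sample rates. In this work we propose a synthesis flow that turns multi-rate applications described in SDF into circuits that consume little power, use few communication buffers, and still meet a specified throughput. Because power optimization is an NP-complete problem, our flow first uses a globally asynchronous locally synchronous (GALS) architecture and self-timed scheduling so that computation units run at the slowest clock frequencies possible, which reduces their power consumption. We then propose a hybrid buffering mechanism to reduce the power consumption of the communication units. This mechanism uses synchronous memory to keep the data flow synchronized and then inserts a small number of asynchronous buffers to coordinate data transfers across clock domains. However, these steps reach only a local optimum, not a global one: running computation units at slower clock frequencies may require deeper asynchronous buffers, which raises power consumption. To address this, we use simulated annealing to explore system configurations and search for a balanced configuration between computation units and communication units that reduces the depth of the asynchronous buffers and further lowers total system power. Experiments on a JPEG encoder and an Intra Frame encoder validate the effectiveness of this approach.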
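For readers new to SDF: a consistent multi-rate SDF graph has a repetition vector q, the smallest positive integers that satisfy the balance equation q[src] × produced = q[dst] × consumed on every edge. The Python sketch below (Python 3.9+ for `math.lcm`) computes it with exact fractions. The helper `min_clock_hz` is a hypothetical, simplified bound that assumes each actor has its own clock, fires back to back and needs a fixed number of cycles per firing; it ignores the self-timed scheduling effects the thesis actually models. Actor names and numbers are illustrative only.

```python
from collections import deque
from fractions import Fraction
from math import gcd, lcm


def repetition_vector(actors, edges):
    """Smallest positive integer solution q of the SDF balance equations
    q[src] * produce == q[dst] * consume, one per edge (src, dst, produce, consume).
    Raises ValueError if the graph is inconsistent or not connected."""
    adj = {a: [] for a in actors}
    for src, dst, prod, cons in edges:
        adj[src].append((dst, Fraction(prod, cons)))  # q[dst] = q[src] * prod / cons
        adj[dst].append((src, Fraction(cons, prod)))  # q[src] = q[dst] * cons / prod

    # Propagate relative firing rates from the first actor (breadth-first).
    rates = {actors[0]: Fraction(1)}
    queue = deque([actors[0]])
    while queue:
        a = queue.popleft()
        for b, ratio in adj[a]:
            r = rates[a] * ratio
            if b not in rates:
                rates[b] = r
                queue.append(b)
            elif rates[b] != r:
                raise ValueError("inconsistent SDF graph: no periodic schedule")
    if len(rates) != len(actors):
        raise ValueError("SDF graph is not connected")

    # Scale the rational rates to integers, then divide out the common factor.
    scale = lcm(*(r.denominator for r in rates.values()))
    q = {a: int(r * scale) for a, r in rates.items()}
    common = 0
    for v in q.values():
        common = gcd(common, v)
    return {a: v // common for a, v in q.items()}


def min_clock_hz(q, cycles_per_firing, iterations_per_sec):
    """Hypothetical lower bound on each actor's clock: q[a] firings per graph
    iteration, each taking cycles_per_firing[a] cycles, executed back to back."""
    return {a: q[a] * cycles_per_firing[a] * iterations_per_sec for a in q}
```

For a chain A→B→C in which A produces 2 tokens per firing that B consumes 3 at a time, and B produces 1 token that C consumes 2 at a time, the balance equations give q = {A: 3, B: 2, C: 1}. Using `Fraction` keeps the rate propagation exact, so an inconsistent graph is detected reliably instead of being hidden by floating-point rounding.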
Streaming Digital Signal Processing (DSP) applications such as multimedia and image processing are deployed in the majority of mobile devices, which strongly demand low-power design methodologies. Such DSP applications usually employ time- and/or frequency-domain transformations and are therefore multi-rate by nature. As Synchronous Data Flow (SDF) has been widely used to model streaming multi-rate DSP applications, in this thesis we develop a synthesis method for multi-rate systems modelled by SDF graphs, with the objective of minimizing power consumption while satisfying a given throughput constraint and using as few communication buffers as possible. Because the problem is NP-complete, we propose a synthesis flow that first optimizes computation power using self-timed scheduling and a Globally Asynchronous Locally Synchronous (GALS) architecture. Buffer area and power are then optimized with a novel hybrid synchronous/asynchronous buffering mechanism, which employs synchronous memory to implement minimal-size buffers for data synchronization and then inserts a minimal number of asynchronous buffers for Clock-Domain-Crossing (CDC) communication. However, this method may reach a local rather than a global optimum, because applying the lowest possible clock rates to the computation units first may result in larger communication buffers and higher total power consumption. Therefore, a simulated annealing (SA) heuristic is used to explore the design space of the target system; by identifying a balanced configuration between computation components and communication buffers, SA further reduces total power. Experiments on a JPEG encoder and an Intra Frame encoder of an H.264 decoder validate the efficiency and effectiveness of the proposed method for multi-rate systems.
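The argument that choosing the lowest possible clock for every computation unit is only a local optimum can be illustrated with a toy simulated annealing loop. The sketch below is hypothetical throughout: the cost model (compute power linear in clock frequency at a fixed supply voltage, plus an asynchronous FIFO of depth burst × (1 − f_consumer / f_producer), a common rule of thumb for a producer that bursts faster than its consumer reads), the discrete clock levels, and all function names are assumptions, not the thesis's power model or tool flow.

```python
import math
import random


def total_power(freqs, edges, k_comp=1.0, k_buf=0.5):
    """Hypothetical cost: compute power grows linearly with each actor's clock,
    and every edge (src, dst, burst) whose producer outruns its consumer needs an
    asynchronous FIFO of depth ceil(burst * (1 - f_dst / f_src)), k_buf per entry."""
    comp = k_comp * sum(freqs.values())
    buf = 0.0
    for src, dst, burst in edges:
        ratio = freqs[dst] / freqs[src]
        if ratio < 1.0:
            buf += k_buf * math.ceil(burst * (1.0 - ratio))
    return comp + buf


def anneal(levels, f_min, edges, steps=5000, t0=10.0, alpha=0.999, seed=0):
    """Simulated annealing over discrete, ascending clock levels per actor.
    A configuration is feasible when every actor runs at or above f_min[actor]."""
    rng = random.Random(seed)
    actors = list(f_min)
    # Greedy start: the lowest feasible clock level for every actor.
    state = {a: min(f for f in levels if f >= f_min[a]) for a in actors}
    cost = total_power(state, edges)
    best, best_cost = dict(state), cost
    temp = t0
    for _ in range(steps):
        # Propose moving one actor to a neighbouring clock level.
        a = rng.choice(actors)
        j = levels.index(state[a]) + rng.choice((-1, 1))
        if 0 <= j < len(levels) and levels[j] >= f_min[a]:
            cand = dict(state)
            cand[a] = levels[j]
            c = total_power(cand, edges)
            # Metropolis rule: always accept improvements, sometimes accept worse moves.
            if c <= cost or rng.random() < math.exp((cost - c) / temp):
                state, cost = cand, c
                if cost < best_cost:
                    best, best_cost = dict(state), cost
        temp *= alpha  # geometric cooling schedule
    return best, best_cost
```

As a usage example, take clock levels [1, 2, 3, 4], f_min = {A: 4, B: 1, C: 1} and edges A→B (burst 64) and B→C (burst 8). The greedy start runs B and C at level 1 and costs 30 units under this model, mostly because the fast producer A forces a 48-entry FIFO on edge A→B; raising B to A's clock removes that FIFO and costs 12 units, a configuration that annealing can reach but greedy clock minimization never considers.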