| 研究生: |
戴自強 Tai, Tzu-Chiang |
|---|---|
| 論文名稱: |
可重組化計算之效能最佳化 Performance Optimization for Reconfigurable Computing |
| 指導教授: |
賴源泰
Lai, Yen-Tai |
| 學位類別: |
博士 Doctor |
| 系所名稱: |
電機資訊學院 - 電機工程學系 Department of Electrical Engineering |
| 論文出版年: | 2010 |
| 畢業學年度: | 98 |
| 語文別: | 英文 |
| 論文頁數: | 83 |
| 中文關鍵詞: | 可重組化計算 、電路分割 、效能 |
| 外文關鍵詞: | reconfigurable computing, circuit partitioning, performance |
| 相關次數: | 點閱:137 下載:0 |
| 分享至: |
| 查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報 |
可重組計算是一種相當有前景的技術,因為這種技術可以藉著比軟體更高的效率及比硬體更好的彈性,來滿足未來的運算需求。在眾多的可重組計算硬體之中,現場可規劃邏輯陣列是最常被使用的一種。其中,動態現場可規劃邏輯陣列尤其受到了極大的關注。因為透過分時多工的方式重複使用相同的現場可規劃邏輯陣列裝置,使得邏輯使用的密度可以大幅度地增加。
在本論文中,我們研究了動態現場可規劃邏輯陣列的一些重要的電路分割問題。我們發展了數個分割演算法,其目標分別為效能最佳化的情況下考慮通訊成本,消耗功率最小化,及使用高效率演算法達成通訊成本最佳化。
一個電路分割演算法必須滿足今日對於效能的迫切要求。我們對動態現場可規劃邏輯陣列提出了一個以效能為導向的演算法。這個演算法將電路切割成數個子電路時,確保所有子電路執行時間的上限可以被最小化。在尋找最佳分割的過程之中,通訊成本也被列入考量。我們首先建立一個圖形來表示節點在時間上的優先順序,同時也利用這個圖形來計算分割時所需要的緩衝器數目。而我們的演算法包含三個階段。在第一個階段,我們將圖形內只有單一輸出的子系統放置於同一叢集,以便降低圖形的大小。這樣的子系統具有大量的內部連接線,因為除了唯一的輸出之外,其他所有節點的扇出節點都在子系統之內。這個階段可以明顯地降低分割時的運算複雜度。第二個階段尋找一個具有效能最佳化的分割。最後,第三個階段使用遞迴式增進的方法來使得通訊成本最小化。
由於無線及可持裝置的日益普及,降低功率消耗已經成為大家關注的焦點。針對以查找表(lookup table)做為基本邏輯區塊的動態現場可規劃邏輯陣列,我們研究了執行電路分割的方法,使得電路執行時的消耗功率可以最小。我們分析了在時序分割中的通訊緩衝器所造成的功率消耗。基於這個分析,我們使用流網路(flow network)來代表一個給定電路,這樣在執行電路分割時,緩衝器的功率消耗可以被正確地計算,同時也可以滿足節點在時間上的優先順序。然後我們將著名的、以網路流為基礎的FBB演算法應用在這個網路上,以便找到一個具有最小功率消耗而且滿足面積平衡的分割。實驗結果顯示,我們的方法所產生的結果,在功率消耗方面比傳統的電路分割方法更好。
演算法的效率對實際應用相當重要。我們提出了一個高效率的演算法來對動態現場可規劃邏輯陣列中的暫存器數目最佳化。這個演算法分為兩個階段: (1)貼標籤階段;(2)最小化成本階段。我們首先改良「儘早排程」及「儘晚排程」演算法,再來分配節點,使得循序電路在分割時能滿足節點在時間上的優先順序,然後我們調整節點的位置或是複製節點來達成暫存器數目的最佳化。
Reconfigurable computing is a promising technology to meet future computational demand by achieving potentially much higher performance than software, while maintaining a higher level of flexibility than hardware. Among a variety of reconfigurable devices, field programmable gate arrays (FPGAs) is one of the most widely used. Particularly, dynamically reconfigurable FPGAs have attracted a lot of attention due to their potential to dramatically increase logic density by sharing the same FPGA device in a time-multiplexed fashion.
In this dissertation, we study some of the essential partitioning problems for dynamically reconfigurable FPGAs. We develop partitioning algorithms with a variety of objectives that are optimal performance with consideration on communication cost, minimum power consumption, optimal communication cost with high algorithm efficiency, respectively.
A partitioning algorithm must meet today’s urgent performance requirements. We present a performance oriented algorithm for the DRFPGA partitioning problem. This algorithm partitions a given circuit system into stages such that the upper bound of the execution times of sub-circuits is minimized. The communication cost is taken into consideration in the process of searching for the optimal solution. A graph is first constructed to represent the precedence constraints and calculate the number of buffers needed in a partitioning. This algorithm includes three phases. The first phase reduces the problem size by clustering the gates into subsystems that have only one output. Such a subsystem has a large number of intra-connections because the fan-outs of all vertices except for the one output are fed to the vertices inside the subsystem. This phase significantly reduces the computational complexity of partitioning. The second phase finds a partition with optimal performance. Finally, the third phase minimizes the communication cost by using an iterative improvement approach.
Reducing power consumption has become a major concern because of the increasing popularity of wireless and portable devices. We study the partitioning problem targeting at power minimization for the DRFPGAs that have lookup table (LUT) based logic blocks. We analyze the power consumption caused by the communication buffers in the temporal partitioning. Based on the analysis, we use a flow network to represent a given circuit so that the power consumption of buffers is correctly evaluated and the temporal constraints are satisfied in circuit partitioning. The well known flow-based FBB algorithm is then applied to the network to find the area-balanced partitioning of minimum power consumption. Experimental results show that our method outperforms the conventional partitioning algorithms in terms of power consumption.
The algorithm efficiency is important for practical applications. We present a highly efficient partitioning algorithm to optimize the number of registers for dynamically reconfigurable FPGAs. The algorithm is divided into two phases: (1) the labeling phase and (2) the minimizing cost phase. We first improved the ‘‘as soon as possible’’ and ‘‘as late as possible’’ algorithms to assign nodes so that the constraints of partition a sequential circuit are satisfied for dynamically reconfigurable FPGAs. Then, some nodes are adjusted or replicated to optimize the number of registers.
[1] A. DeHon, “DPGA-coupled microprocessors: commodity ICs for the early 21st century,” in Proc. IEEE Workshop FPGAs Custom Comput. Mach., 1994, pp. 31–39.
[2] J. Brown, D. Chen, I. Eslick, E. Tau, and A. DeHon, DELTA: Prototype for a First-Generation Dynamically Programmable Gate Array. Cambridge, MA: MIT Press, 1995.
[3] D. Jones and D. M. Lewis, “A time-multiplexed FPGA architecture for logic emulation,” in Proc. IEEE Custom Integr. Circuits Conf., 1995, pp. 495–498.
[4] N. B. Bhat, K. Chaudhary, and E. S. Kuh, “Performance-oriented fully routable dynamic architecture for a field programmable logic device,” Univ. California, Berkeley, Memo no. UCB/RELM93/42, 1993.
[5] S. Trimberger, “A time-multiplexed FPGA”, in Proc. IEEE Symp. FPGAs Custom Comput. Mach., 1997, pp. 22–28.
[6] S. Trimberger, “Scheduling designs into a time-multiplexed FPGA,” in Proc. ACM Int. Symp. FPGAs, 1998, pp. 153–160.
[7] T. Fujii et al., “A dynamically reconfigurable logic engine with a multi-context/multi-mode unified-cell architecture,” in Proc. IEEE Int. Solid-State Circuits Conf., 1999, pp. 364–365.
[8] W. K. Mak and E. F. Y. Young, “Temporal logic replication for dynamically reconfigurable FPGA partitioning,” IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst., vol. 22, no. 7, pp. 952–959, Jul. 2003.
[9] D. Chang and M. Marek-Sadowska, “Buffer minimization and time-multiplexed I/O on dynamically reconfigurable FPGAs,” in Proc. ACM Int. Symp. FPGAs, 1997, pp. 142–148.
[10] D. Chang and M. Marek-Sadowska, “Partitioning sequential circuits on dynamically reconfigurable FPGAs,” IEEE Trans. Comput., vol. 48, no. 6, pp. 565–578, June 1999.
[11] H. Liu and D. F. Wong, “Network flow based circuit partitioning for time-multiplexed FPGAs,” in Proc. IEEE/ACM Int. Conf. Comput.-Aided Des., 1998, pp. 497–504.
[12] M. C. T. Chao, G. M. Wu, I. H. R. Jiang, and Y. W. Chang, “A clustering- and probability-based approach for time-multiplexed FPGA partitioning,” in Proc. IEEE/ACM Int. Conf. Comput.-Aided Des., 1999, pp. 364–368.
[13] G. M. Wu, J. M. Lin, and Y. W. Chang, “Generic ILP-based approaches for time-multiplexed FPGA partitioning,” IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst., vol. 20, no. 10, pp. 1266–1274, Oct. 2001.
[14] J. Cong and Y. Ding, “On area/depth tradeoff in LUT-based FPGA technology mapping,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 2, no. 2, pp. 137–148, June 1994.
[15] E. S. H. Wong, E. F. Y. Young, and W. K. Mak, “Clustering based acyclic multi-way partitioning,” in Proc. ACM Great Lakes Symp. VLSI, 2003, pp. 203–206.
[16] Xilinx, Inc., The Programmable Logic Data Book. San Jose, CA: Xilinx Inc., 1996.
[17] C. C. Kao and Y. T. Lai, “An efficient algorithm for finding the minimal-area FPGA technology mapping,” ACM Trans. Des. Autom. Electron. Syst., vol. 10, no. 1, pp. 168–186, Jan. 2005.
[18] Gordon R. Chiu, Deshanand P Singh, Valavan Manohararajah, and Stephen D. Brown, “Mapping arbitrary logic functions into synchronous embedded memories for area reduction on FPGAs,” in Proc. IEEE/ACM Int. Conf. Comput.-Aided Des., 2006, pp. 135–142.
[19] J. Cong, Li Zheng, and R. Bagrodia, “Acyclic multi-way partitioning of boolean networks,” in Proc. Des. Autom. Conf., 1994, pp. 670–675.
[20] N. K. Pareek, Vinod Patidar, and K. K. Sud, “Image encryption using chaotic logistic map,” Image and Vision Computing, vol. 24, no. 9, pp. 926–934, Sep. 2006.
[21] A. DeHon, “DPGA utilization and application,” in Proc. ACM Int. Symp. FPGA, 1996, pp. 115–121.
[22] A. H. Farrahi and M. Sarrafzadeh, “FPGA technology mapping for power minimization,” in Proc. Int. Workshop Field Programmable Logic and Applicat., 1994, pp. 66–77.
[23] F. N. Najm, “Transition density: a new measure of activity in digital circuits,” IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst., vol. 12, no. 2, pp. 310–323, Feb. 1993.
[24] Z. H. Wang, E. C. Liu, J. Lai, and T. C. Wang, “Power minimization in LUT-based FPGA technology mapping,” in Proc. Asia South Pacific Des. Autom. Conf., 2001, pp. 635–640.
[25] H. Li, S. Katkoori, and W. K. Mak, “Power minimization algorithms for LUT-based FPGA technology mapping,” ACM Trans. Des. Autom. Electron. Syst., vol. 9, no. 1, pp. 33–51, Jan. 2004.
[26] V. Saxena, F. N. Najm, and I. N. Hajj, “Estimation of state line statistics in sequential circuits,” ACM Trans. Des. Autom. Electron. Syst., vol. 7, no. 3, pp. 455–473, Jul. 2002.
[27] N. Weste and K. Eshraghian, Principles of CMOS VLSI Design: A Systems Perspective. Reading, MA: Addison-Wesley, 1993.
[28] J. R. Ford and D. R. Fulkerson, Flows in Networks. Princeton, NJ: Princeton Univ. Press, 1962.
[29] H. Yang and D. F. Wong, “Efficient network flow based min-cut balanced partitioning,” IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst., vol. 15, no. 12, pp. 1533–1540, Dec. 1996.
[30] H. Liu and D. F. Wong, “Network-flow-based multiway partitioning with area and pin constraints,” IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst., vol. 17, no. 1, pp. 50–59, Jan. 1998.
[31] L. Cheng, F. Li, Y. Lin, P. Wong, and L. He, “Device and architecture cooptimization for FPGA power reduction,” IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst., vol. 26, no. 7, pp. 1211–1221, Jul. 2007.
[32] K. Furuta, T. Fujii, M. Motomura, K.Wakabayashi, and M. Yamashina, “Spatial-temporal mapping of real applications on a dynamically reconfigurable logic engine (DRLE) LSI,” in Proc. IEEE Custom Integr. Circuits Conf., 2000, pp. 151–154.
[33] ITRS, “International Technology Roadmap for Semiconductors,” 2002 [Online]. Available: http://www.itrs.net
[34] M. Yamashina and M. Motomura, “Reconfigurable computing: its concept and a practical embodiment using newly developed dynamically reconfigurable logic (DRL) LSI,” in Proc. Asia South Pacific Des. Autom. Conf., 2000, pp. 329–332.
[35] T. Nishitani, “An approach to a multimedia system on a chip,” in Proc. IEEE Workshop Signal Processing Syst., 1999, pp. 13–22.
[36] A. J. Elbirt and C. Paar, C, “An FPGA implementation and performance evaluation of the serpent block cipher,” in Proc. ACM. Int. Symp. FPGAs, 2000, pp. 33–40.
[37] H. J. Kim and W. H. Mangione-Smith, “Factoring large numbers with programmable hardware,” in Proc. ACM. Int. Symp. FPGAs, 2000, pp. 41–48.
[38] J. R. Hauser and J. Wawrzynek, “Garp: A MIPS processor with a reconfigurable coprocessor,” in Proc. IEEE Symp. FPGAs Custom Comput. Mach., 1997, pp.12–21.
[39] K. H. Leung, K. W. Ma, W. K. Wong, and P. H. W. Leong, “FPGA Implementation of a microcoded elliptic curve cryptographic processor,” in Proc. IEEE Symp. FPGAs Custom Comput. Mach., 2000, pp. 68–76.
[40] M. Rencher and B. L. Hutchings, “Automated target recognition on SPLASH2,” in Proc. IEEE Symp. FPGAs Custom Comput. Mach., 1997, pp. 192–200.
[41] M. Weinhardt and W. Luk, “Pipeline vectorization for reconfigurable systems,” in Proc. IEEE Symp. FPGAs Custom Comput. Mach., 1999, pp. 52–62.
[42] A. Dollas, E. Sotiriades, and A. Emmanouelides, “Architecture and design of GE1,AFCCM for golomb ruler derivation,” in Proc. IEEE Symp. FPGAs Custom Comput. Mach., 1998, pp. 48–56.
[43] E. Sotiriades, A. Dollas, and P. Athanas, “Hardware-software codesign and parallel implementation of a Golomb ruler derivation engine,” in Proc. IEEE Symp. FPGAs Custom Comput. Mach., 2000, pp. 227–235.
[44] L. Huelsbergen, “A representation for dynamic graphs in reconfigurable hardware and its application to fundamental graph algorithms,” in Proc. ACM. Int. Symp. FPGAs, 2000, pp. 105–115.
[45] P. Zhong, M. Martinosi, P. Ashar, and S. Malik, “Accelerating Boolean satisfiability with configurable hardware,” in Proc. IEEE Symp. FPGAs Custom Comput. Mach., 1998, pp. 186–195.
[46] W. J. Huang, N. Saxena, and E. J. Mccluskey, “A reliable LZ data compressor on reconfigurable coprocessors,” in Proc. IEEE Symp. FPGAs Custom Comput. Mach., 2000, pp. 249–258.
[47] P. Graham and B. Nelson, “Genetic algorithms in software and in hardware—A performance analysis of workstations and custom computing machine implementations,” in Proc. IEEE Symp. FPGAs Custom Comput. Mach., 1996, pp. 216–225.
[48] C. Ebeling, D. C. Cronquist, and P. Franklin, “RaPiD—Reconfigurable pipelined datapath,” in Lecture Notes in Computer Science 1142—Field-Programmable Logic: Smart Applications, New Paradigms and Compilers, R. W. Hartenstein and M. Glesner, Eds. Berlin, Germany: Springer-Verlag, 1996, pp. 126–135.
[49] S. Cadambi, J. Weener, S. C. Goldstein, H. Schmit, and D. E. Thomas, “Managing pipeline-reconfigurable FPGAs,” in Proc. ACM. Int. Symp. FPGAs, 1998, pp. 55–64.
[50] C. R. Rupp, M. Landguth, T. Garverick, E. Gomersall, H. Holt, J. M. Arnold, and M, Gokhale, “The NAPA adaptive processing architecture,” in Proc. IEEE Symp. FPGAs Custom Comput. Mach., 1998, pp. 28–37.
[51] S. M. Scalera and J. R. Vazquez, “The design and implementation of a context switching FPGA,” in Proc. IEEE Symp. FPGAs Custom Comput. Mach., 1998, pp. 78–85.
[52] S. Hauck, T. W. Fry, M. M. Hosler, and J. P. Kao, “The Chimaera reconfigurable functional unit,” in Proc. IEEE Symp. FPGAs Custom Comput. Mach., 1997, pp. 87–96.
[53] Xilinx, Inc., The Programmable Logic Data Book. San Jose, CA: Xilinx Inc., 1994.
[54] Altera Corporation, Data Book. San Jose, CA: Altera Corporation, 1998.
[55] Lucent Technologies, Inc., FPGA Data Book. Allentown, PA: Lucent Technologies, Inc., 1998.
[56] Chameleon Systems, Inc., CS2000 Advance Product Specification. San Jose, CA: Chameleon Systems, Inc., 2000.
[57] S. C. Goldstein, H. Schmit, M. Budiu, S. Cadambi, M. Moe, and R. R. Taylor, “PipeRench: A Reconfigurable Architecture and Compiler,” IEEE Computer, vol. 33, no. 4, pp. 70–77, Apr. 2000.
[58] Xilinx, Inc., XC6200: Advance Product Specification. San Jose, CA: Xilinx Inc., 1996.
[59] Xilinx, Inc., VirtexTM 2.5 V Field Programmable Gate Arrays: Advance Product Specification. San Jose, CA: Xilinx Inc., 1999.
[60] E. Canto, J. M. Moreno, J. Cabestany, I. Lacadena, and J. M. Insenser, “A temporal bipartitioning algorithm for dynamically reconfigurable FPGAs,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 9, no. 1, pp. 210–218, Feb. 2001.