
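Graduate student: 陳盈誌
Chen, Ying-Chih
Thesis title: 高速二維離散餘弦轉換之電路設計
VLSI Design of High-Speed Two-Dimensional Discrete Cosine Transform
Advisor: 郭耀煌
Kuo, Yau-Hwang
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering
Year of publication: 2008
Academic year of graduation: 96 (ROC calendar)
Language: English
Number of pages: 70
Keywords (translated from Chinese): VLSI design, two-dimensional discrete cosine transform, high-speed, distributed arithmetic, multiplier-free, ROM-free
Keywords (English): discrete cosine transform (DCT), VLSI, two-dimensional (2-D), high-speed, multiplier-free, distributed arithmetic, ROM-free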
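    We propose an integrated-circuit design that achieves a high-speed two-dimensional discrete cosine transform (2-D DCT). In this thesis, all of the one-dimensional (1-D) DCT computations required by the 2-D transform are merged into a single 2-D DCT module within the proposed hardware architecture. In other words, the proposed 2-D DCT hardware module computes an entire 2-D block transform directly. First, each individual 2-D DCT coefficient is formed by expanding all of the 1-D DCT operations performed along the rows and columns. Next, all 2-D DCT coefficients are decomposed into two's complement form, so that every multiplication can be replaced by additions, which meets the multiplier-free goal. Finally, all of the 2-D coefficients decomposed in this way are considered together, because some of the resulting additions perform exactly the same function. To maximize the utilization of the additions, a genetic algorithm is used to solve the optimization problem of eliminating unnecessary adders. The complete structure of the proposed 2-D DCT is obtained once this optimization finishes. In addition, the hardware is implemented as a pipelined design to reach higher throughput. An 8x8 2-D DCT hardware architecture is designed and simulated in TSMC 0.18-um process technology. The design achieves a throughput of 9GHz Mpel/sec with a latency of 7 cycles, a power consumption of 67.5 mW, and approximately 162,000 logic gates.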
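    The idea of folding the row and column 1-D DCTs into one direct 2-D computation can be illustrated numerically. The following is a minimal sketch, not the thesis's hardware: it assumes the orthonormal DCT-II definition, and the function names and the test block are hypothetical. It checks that the row-column method and the direct form, in which each 2-D kernel weight is a product of two 1-D coefficients, give the same result.

```python
import math

N = 8  # block size named in the abstract (8x8)

def dct_matrix(n=N):
    """Orthonormal 1-D DCT-II matrix: c[k][i] = a(k) * cos((2i+1) k pi / (2n))."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos((2 * i + 1) * k * math.pi / (2 * n))
                     for i in range(n)])
    return rows

def dct2_row_column(block, c):
    """Row-column method: 1-D DCT along every row, then along every column."""
    n = len(block)
    # Row pass: tmp[i][v] = sum_j block[i][j] * c[v][j]
    tmp = [[sum(block[i][j] * c[v][j] for j in range(n)) for v in range(n)]
           for i in range(n)]
    # Column pass: out[u][v] = sum_i c[u][i] * tmp[i][v]
    return [[sum(c[u][i] * tmp[i][v] for i in range(n)) for v in range(n)]
            for u in range(n)]

def dct2_direct(block, c):
    """Direct form: out[u][v] = sum over i, j of (c[u][i] * c[v][j]) * block[i][j]."""
    n = len(block)
    return [[sum(c[u][i] * c[v][j] * block[i][j]
                 for i in range(n) for j in range(n))
             for v in range(n)] for u in range(n)]

C = dct_matrix()
# Hypothetical 8x8 input block, chosen only to exercise the code.
block = [[(3 * i + 5 * j) % 17 for j in range(N)] for i in range(N)]
rc = dct2_row_column(block, C)
direct = dct2_direct(block, C)
max_diff = max(abs(rc[u][v] - direct[u][v]) for u in range(N) for v in range(N))
print(max_diff < 1e-9)
```

    In the direct form every output is a weighted sum of all 64 inputs, which is what makes it possible to look for additions shared across outputs.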

    A VLSI design that achieves a high-speed two-dimensional discrete cosine transform (2-D DCT) is proposed. In this work, all of the one-dimensional (1-D) DCTs required by a 2-D transform are combined into a single 2-D DCT module in the proposed architecture. In other words, the proposed 2-D DCT hardware module computes the transform of a two-dimensional block directly. First, each coefficient of a 2-D DCT is expressed as the expanded computation of all the 1-D DCTs in the row and column directions. Next, all the 2-D coefficients in the DCT domain are decomposed into two's complement format, so that multiplications are replaced by additions and the design becomes multiplier-free. Finally, all the decomposed 2-D coefficients are considered together, because some of the additions perform the same function. To maximize the utilization of the additions, a genetic algorithm (GA) is used to solve the optimization problem of removing unnecessary adders. The structure of the 2-D DCT is obtained after this optimization. In addition, the hardware is implemented as a pipelined design to achieve high throughput. An 8x8 2-D DCT is designed and simulated in TSMC 0.18-um technology. The result is a throughput of 9GHz Mpels/sec with a latency of 7 cycles, a power consumption of 67.5 mW, and approximately 162k gates.
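    The multiplier-free step described above can be sketched in software. The example below is an illustration under stated assumptions, not the thesis's circuit: the fractional precision `FRAC_BITS`, the word length `WIDTH`, and all function names are hypothetical, and the sample constants are the 8-point DCT values 0.5*cos(k*pi/16). Each coefficient is written in two's complement, and the product is formed from shifted copies of the input, with the sign bit contributing a subtraction.

```python
import math

FRAC_BITS = 12  # assumed fractional precision (not taken from the thesis)
WIDTH = 14      # assumed two's complement word length for coefficients

def to_fixed(value):
    """Quantize a real coefficient to a signed fixed-point integer."""
    return round(value * (1 << FRAC_BITS))

def twos_complement_bits(c, width=WIDTH):
    """Bits of c in width-bit two's complement, least significant bit first."""
    if not -(1 << (width - 1)) <= c < (1 << (width - 1)):
        raise ValueError("coefficient does not fit in the word length")
    unsigned = c & ((1 << width) - 1)
    return [(unsigned >> k) & 1 for k in range(width)]

def multiply_shift_add(x, c, width=WIDTH):
    """Compute x * c using only shifts, additions, and one subtraction."""
    bits = twos_complement_bits(c, width)
    acc = 0
    for k in range(width - 1):
        if bits[k]:
            acc += x << k          # each set bit adds one shifted copy of x
    if bits[width - 1]:
        acc -= x << (width - 1)    # the sign bit carries weight -2^(width-1)
    return acc

# Sample constants: quantized 0.5*cos(k*pi/16) and their negatives.
coeffs = [to_fixed(0.5 * math.cos(k * math.pi / 16)) for k in range(1, 8)]
coeffs += [-c for c in coeffs]
all_match = all(multiply_shift_add(x, c) == x * c
                for c in coeffs for x in range(-255, 256))
print(all_match)
```

    In this sketch the number of additions per product equals the number of set bits minus one. When several coefficients share the same partial sums, those additions need to be built only once, which is the redundancy the abstract says the genetic algorithm is used to reduce.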

    LIST OF TABLES
    LIST OF FIGURES
    CHAPTER 1 INTRODUCTION
        1.1 BACKGROUND AND MOTIVATION
        1.2 ISSUES IN DISCRETE COSINE TRANSFORM
        1.3 CONTRIBUTIONS OF THIS PAPER
        1.4 ORGANIZATION OF THIS PAPER
    CHAPTER 2 RELATED WORK
        2.1 DCT APPROACHES OF COMPLEXITY REDUCED
            2.1.1 Distributed Arithmetic Approach
            2.1.2 New Distributed Arithmetic Approach
            2.1.3 Systolic Array Approach
        2.2 DCT APPROACHES OF TWO-DIMENSIONAL
            2.2.1 Row-Column Approach
            2.2.2 Polynomial Transform Approach
        2.3 COMPARISONS OF DCT APPROACHES
    CHAPTER 3 METHODOLOGY
        3.1 OVERVIEW
        3.2 DCT COEFFICIENTS SCATTER
        3.3 BINARY
        3.4 INTER-LEVELS OPTIMIZATION
            3.4.1 Define Problem
            3.4.2 Formulations of Sub-level
            3.4.3 Formulations of Top-level
            3.4.4 Solutions Searching Algorithms
        3.5 MATHEMATICAL ANALYSIS
            3.5.1 Sub-level
            3.5.2 Top-level
            3.5.3 Features of the Problem
        3.6 HARDWARE ARCHITECTURE
    CHAPTER 4 EXPERIMENT RESULTS
        4.1 RESULTS OF INTER-LEVELS OPTIMIZATION
        4.2 DESIGN CHARACTERISTICS AND COMPARISONS
        4.3 SIMULATIONS
    CHAPTER 5 CONCLUSION AND FUTURE WORK
    REFERENCES
    APPENDIX A HEURISTIC OPTIMAL METHODS


    Public release (on campus): 2009-08-26
    Public release (off campus): 2009-08-26