| Graduate student: | 古永上 Ku, Yung-Shang |
|---|---|
| Thesis title: | MPEG-4之位移估測與離散餘弦轉換硬體電路實現之研究 The Study on Hardware Implementation of MPEG-4 Motion Estimation and DCT |
| Advisor: | 廖德祿 Liao, Teh-Lu |
| Degree: | Master |
| Department: | College of Engineering - Department of Engineering Science |
| Year of publication: | 2006 |
| Academic year of graduation: | 94 (ROC calendar) |
| Language: | English |
| Number of pages: | 54 |
| Chinese keywords: | 離散餘弦轉換, 位移估測 |
| English keywords: | discrete cosine transform, motion estimation |
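In this thesis, we present the design of hardware circuit modules for MPEG-4 motion estimation and the discrete cosine transform, together with an analysis and improvement of the related algorithms. The hardware modules comprise the System Controller, the Block Engine module, the Motion Estimation module, the DMA module, the Memif module, the external SRAM, and the data bus. The MPEG-4 system contains two main parts: a motion estimation module and a discrete cosine transform module. For the motion estimation module, we analyze three different algorithms and select one for hardware implementation based on two metrics, PSNR and runtime. We adopt the PDE algorithm because its PSNR equals that of the FSBMA algorithm while its runtime is only about half that of FSBMA. We implement the PDE algorithm with an adder tree architecture, because an adder tree is well suited to computing SAD values at random search-window addresses. For the discrete cosine transform module, we modify the DCT/IDCT algorithm proposed by Guo and implement the modified algorithm using Madisett’s hardware architecture. Software simulation shows that the modified algorithm achieves better PSNR than Madisett’s algorithm. In hardware, only five additional adders are needed to implement the modification. Finally, we use an FPGA development platform to verify that the MPEG-4 motion estimation and DCT hardware functions correctly. The circuit area is 630935 gate count, and its maximum operating frequency is 18 MHz.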
In this thesis, the hardware design of motion estimation and the discrete cosine transform (DCT) for MPEG-4 is presented. The hardware system comprises seven subsystems: the system controller, the block engine module, the motion estimation (ME) module, the direct memory access (DMA) module, the Memif module, the external static random access memory (SRAM), and the data bus. The MPEG-4 system has two main parts: the motion estimation module and the transform coding module. For the motion estimation module, we analyze several motion estimation algorithms and choose one according to its peak signal-to-noise ratio (PSNR) and runtime. The Partial Distortion Elimination (PDE) algorithm is adopted because its PSNR is the same as that of the Full-Search Block Matching Algorithm (FSBMA), while its runtime is about half that of FSBMA. We implement the PDE algorithm with an adder tree architecture because an adder tree can readily calculate sum of absolute differences (SAD) values at random positions. In the transform coding module, we modify Guo’s DCT/IDCT algorithm and map the modified algorithm onto Madisett’s architecture. Software simulation shows that the PSNR of the modified algorithm is much better than that of Madisett’s algorithm. In addition, the modified algorithm needs only five extra adders. A field-programmable gate array (FPGA) development board is used to verify the function of the MPEG-4 motion estimation and DCT hardware. Its area is 630,932 gate counts, and its maximum operating frequency is 18 MHz.
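The abstract's claim that PDE matches the PSNR of FSBMA at roughly half the runtime can be illustrated with a small software sketch. This is not the thesis's adder-tree hardware or the author's code: the function names, the 8x8 block, the ±4 search range, and the choice to test the partial sum after every row are all assumptions made for illustration. The idea is that a candidate is abandoned as soon as its partial SAD reaches the best SAD found so far, so the chosen motion vector is the same as in an exhaustive search.

```python
import random


def candidates(height, width, bx, by, n, p):
    """Yield displacements (dx, dy) within +/-p whose block stays inside the frame."""
    for dy in range(-p, p + 1):
        for dx in range(-p, p + 1):
            if (0 <= by + dy and by + dy + n <= height
                    and 0 <= bx + dx and bx + dx + n <= width):
                yield dx, dy


def row_sad(cur, ref, bx, by, dx, dy, r, n):
    """Sum of absolute differences for one row of the n x n block."""
    return sum(abs(cur[by + r][bx + c] - ref[by + dy + r][bx + dx + c])
               for c in range(n))


def full_search(cur, ref, bx, by, n, p):
    """Exhaustive block matching: every candidate gets a complete SAD."""
    best_mv, best_sad = None, float("inf")
    for dx, dy in candidates(len(ref), len(ref[0]), bx, by, n, p):
        sad = sum(row_sad(cur, ref, bx, by, dx, dy, r, n) for r in range(n))
        if sad < best_sad:
            best_mv, best_sad = (dx, dy), sad
    return best_mv, best_sad


def pde_search(cur, ref, bx, by, n, p):
    """Same scan order as full_search, but stop a candidate early once its
    partial SAD can no longer beat the current minimum."""
    best_mv, best_sad = None, float("inf")
    rows_evaluated = 0
    for dx, dy in candidates(len(ref), len(ref[0]), bx, by, n, p):
        partial = 0
        for r in range(n):
            partial += row_sad(cur, ref, bx, by, dx, dy, r, n)
            rows_evaluated += 1
            if partial >= best_sad:
                break  # this candidate cannot improve on the best match
        else:
            # All rows summed and still below the minimum: new best match.
            best_mv, best_sad = (dx, dy), partial
    return best_mv, best_sad, rows_evaluated


if __name__ == "__main__":
    random.seed(0)
    size = 32
    ref = [[random.randrange(256) for _ in range(size)] for _ in range(size)]
    # Synthetic current frame: the reference shifted by (dx, dy) = (2, 1).
    cur = [[ref[(y + 1) % size][(x + 2) % size] for x in range(size)]
           for y in range(size)]
    print(full_search(cur, ref, 8, 8, 8, 4))
    print(pde_search(cur, ref, 8, 8, 8, 4))
```

In this sketch the savings show up as a lower `rows_evaluated` count than the `candidates x rows` total of the exhaustive search; how close that comes to the halved runtime reported in the abstract depends on the video content and scan order.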
[1] ISO/IEC JTC1 IS 11172, Coding of Moving Picture and Coding of Continuous Audio for Digital Storage Media up to 1.5Mbps, 1992
[2] ISO/IEC JTC1/SC29/WG11 Draft CD 13818-2, Generic Coding of Moving Pictures and Associated Audio, ITU-T Recommendation H.262 Committee Draft, 1994
[3] MPEG-4 Video Group. "Generic coding of audio-visual objects: part 2 -visual 14496-2," ISO/IEC JTC1/SC29/WG11 N2502a, FDIS, Atlantic City, Oct. 1998.
[4] T. Zahariadis and D. Kalivas, “Fast algorithms for the estimation of block motion vectors,” Proc. of the Third IEEE Int. Conf. on Electronics, Circuits, and Systems (ICECS '96), vol. 2, pp. 716-719, Oct. 1996.
[5] R. Li, B. Zeng, and M. L. Liou, "A new three-step search algorithm for block motion estimation," IEEE Trans. Circuits Syst. Video Technol., Vol. 4, No. 4, pp. 438-442, Aug. 1994.
[6] S. Zhu and K.K. Ma, "A new diamond search algorithm for fast block-matching motion estimation," IEEE Trans. Image Processing, vol. 9, No. 2, pp. 287-290, Feb. 2000.
[7] L.-M. Po and W.-C. Ma, "A novel four-step search algorithm for fast block motion estimation," IEEE Trans. Circuits Syst. Video Technol., vol. 6, No. 3, pp. 313-317, Jun. 1996.
[8] W. Li and E. Salari, "Successive elimination algorithm for motion estimation," IEEE Trans. on Image Processing, vol. 4, No. 1, pp. 105-107, Jan. 1995.
[9] N. Ahmed, T. Natarajan, and K. R. Rao, “Discrete cosine transform,” IEEE Trans. on Computers, vol. C-23, pp. 90-93, Jan. 1974.
[10] J.I. Guo, R.C. Ju, and J.W. Chen, “An efficient 2-D DCT/IDCT core design using cyclic convolution and adder-based realization,” IEEE Trans. Circuits and Systems for Video Technology, Vol. 14, No. 4, pp. 416-428, Apr. 2004.
[11] H. Yeo and Y.H. Hu, “A novel modular systolic array architecture for full-search block matching motion estimation,” IEEE Trans. Circuits and Systems, vol. No. 5, pp. 407-416, Oct. 1995.
[12] Y.S. Jehng, L.G. Chen, and T.D. Chiueh, “An efficient and simple VLSI tree architecture for motion estimation algorithms,” IEEE Trans. Signal Processing, vol. 41, No. 2, pp. 889-900, Feb. 1993.
[13] J.I. Guo, R.C. Ju, and J.W. Chen, “An efficient 2-D DCT/IDCT core design using cyclic convolution and adder-based realization,” IEEE Trans. Circuits and Systems for Video Technology, Vol. 14, No. 4, pp. 416-428, Apr. 2004.
[14] Y. P. Lee, et al., “A cost effective architecture for 8 x 8 two-dimensional DCT/IDCT using direct method,” IEEE Trans. on Circuits and Systems for Video Technology, vol. 7, no. 3, pp. 459-467, June 1997.
[15]Y. H. Hu and Z. Wu, “An efficient CORDIC array structure for the implementation of discrete cosine transform,” IEEE Trans. on Signal Processing, vol. 43, no. 1, pp.331-336, Jan. 1995.
[16] N. I. Cho and S. U. Lee, “DCT algorithms for VLSI parallel implementation,” IEEE Trans. on Acoustics, Speech, and Signal Processing, vol. 38, no. 1, pp. 121-127, 1990.
[17] M. T. Sun, T.C. Chen, and A. M. Gottlieb, “VLSI implementation of a 16x16 discrete cosine transform,” IEEE Trans. on Circuits and Systems-II, vol. 36, no. 4, pp. 610-616, 1989.
[18] D. W. Kim, et al., “A compatible DCT/IDCT architecture using hardwired distributed arithmetic,” Proc. ISCAS 2001, pp. II-457 to II-460, 2001.