| Graduate student: | 林宜豊 Lin, Yi-Li |
|---|---|
| Thesis title: | 低記憶體頻寬之移動估測演算法及其應用於H.264/AVC移動估測之硬體架構 (A Low Memory Bandwidth Fast Motion Search Algorithm and Its Hardware Architecture for H.264/AVC VBSME) |
| Advisor: | 蘇文鈺 Su, W.Y. Alvin |
| Co-advisor: | 楊中平 Young, Chung-Ping |
| Degree: | Doctor |
| Department: | College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering |
| Year of publication: | 2011 |
| Academic year of graduation: | 99 |
| Language: | English |
| Number of pages: | 91 |
| Keywords (translated from Chinese): | fast motion estimation, integrated circuits, hardware architecture evaluation, fast simulation, low memory bandwidth, low computational complexity |
| Keywords (English): | motion search, fast search algorithm, video, motion estimation, H.264/AVC, variable block size, VLSI, hardware architecture exploration, modeling, trace-based simulation, low memory bandwidth, low computational complexity |
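
Motion estimation is widely used in video compression standards. Its purpose is to find, in previously encoded pictures, a block that closely resembles the block currently being encoded; only the difference between the two is then recorded, which reduces the amount of data to be stored. Motion estimation demands a great deal of computation and data access, typically more than 70% of the computation in a video encoding system, so it is often the bottleneck for compression speed. In video encoder IC design, motion estimation usually adopts full search, which evaluates every position in the search area; its advantages are simpler control circuitry and higher data reusability. The alternative is a fast search algorithm. Such algorithms derive a number of properties from how objects move in a picture and use them to select strategically which positions to evaluate, thereby reducing computation and search time. Their main problem is that the evaluated positions are not fixed, so the control circuitry is more complex; moreover, data reuse drops considerably, and evaluating fewer positions may degrade the quality of the compressed video.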
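As a rough illustration of the full search described above, the following Python sketch evaluates every candidate position in a search window with the sum of absolute differences (SAD) as the matching cost. The function names, the frame layout (lists of rows), and the synthetic test frames are assumptions made for this example; they are not taken from the dissertation.

```python
def sad(cur, ref, bx, by, mx, my, n):
    """Sum of absolute differences between the n x n block of `cur` at (bx, by)
    and the block of `ref` displaced by the candidate vector (mx, my)."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + my + y][bx + mx + x])
    return total


def full_search(cur, ref, bx, by, n, search_range):
    """Evaluate every candidate in [-search_range, +search_range]^2 that keeps
    the reference block inside the frame.
    Returns (best_mv, best_sad, number_of_candidates_checked)."""
    height, width = len(ref), len(ref[0])
    best_mv, best_sad, checked = None, None, 0
    for my in range(-search_range, search_range + 1):
        for mx in range(-search_range, search_range + 1):
            inside = (0 <= bx + mx and bx + mx + n <= width
                      and 0 <= by + my and by + my + n <= height)
            if not inside:
                continue
            cost = sad(cur, ref, bx, by, mx, my, n)
            checked += 1
            if best_sad is None or cost < best_sad:
                best_mv, best_sad = (mx, my), cost
    return best_mv, best_sad, checked


def make_demo_frames(size=16, shift=(2, 1)):
    """Hypothetical test data: `cur` is `ref` displaced by `shift`
    (indices clamped at the frame border)."""
    ref = [[(7 * x + 13 * y + x * y) % 256 for x in range(size)]
           for y in range(size)]
    sx, sy = shift
    cur = [[ref[min(y + sy, size - 1)][min(x + sx, size - 1)]
            for x in range(size)] for y in range(size)]
    return cur, ref


cur, ref = make_demo_frames()
print(full_search(cur, ref, bx=4, by=4, n=4, search_range=2))
```

With the synthetic frames above, the search visits all 25 candidates of the 5x5 window and returns the vector (2, 1) with a SAD of 0. The regular raster scan over candidates is what makes the control flow simple, at the price of evaluating every position.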
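This dissertation proposes a fast search algorithm aimed at hardware design, Tri-Point Search (TS). It retains the characteristics of fast search algorithms while remedying their poor data reuse. From an algorithmic viewpoint, TS checks fewer positions than Diamond Search (DS) but slightly more than Hexagon-Based Search (HEXBS). In terms of the amount of data that must be read, however, TS has a clear advantage over both, saving up to 60% of data accesses, and it is also easier to implement in hardware.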
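In the H.264/AVC video compression standard, motion estimation is block-based, with block sizes of 16x16, 16x8, 8x16, 8x8, 8x4, 4x8, and 4x4. Searching this many block sizes in parallel can be done with a motion estimation engine built from multicore processors, but a dedicated hardware accelerator is more efficient. Whichever architecture is adopted, product lifetimes and development cycles are now so short that deciding quickly how to implement and structure the motion estimation engine is a major problem. This dissertation therefore also proposes a method for rapidly building a system model to support the analysis and simulation of hardware architectures and their performance, using large amounts of real test data to evaluate different hardware architectures through simulation. The method is based on SystemC but does not implement the functionality of the target system, so it relies on auxiliary log files to drive the simulation. In return, it effectively shortens simulation time and provides timing information for the running system, so designers can easily find and improve the parts that limit system performance.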
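One common way to evaluate all seven block sizes at a single candidate position is to compute the sixteen 4x4 SADs of a 16x16 macroblock once and obtain the larger partitions by addition. The Python sketch below illustrates that general idea only; it is not claimed to be the data sharing scheme of the dissertation, and the function names and data layout are assumptions for this example.

```python
def sad_4x4_grid(cur, ref, bx, by, mx, my):
    """Return grid[r][c], the SAD of each 4x4 sub-block of the 16x16
    macroblock at (bx, by) for the candidate vector (mx, my)."""
    grid = [[0] * 4 for _ in range(4)]
    for y in range(16):
        for x in range(16):
            diff = abs(cur[by + y][bx + x] - ref[by + my + y][bx + mx + x])
            grid[y // 4][x // 4] += diff
    return grid


def merge_sads(grid):
    """Combine the sixteen 4x4 SADs into SADs for all seven partition sizes
    using additions only, so no pixel is read a second time.
    Keys are 'WIDTHxHEIGHT'; values list the partitions in raster order."""
    def block_sum(r0, c0, rows, cols):
        return sum(grid[r][c]
                   for r in range(r0, r0 + rows)
                   for c in range(c0, c0 + cols))

    # Partition name -> (rows, cols), measured in units of 4x4 sub-blocks.
    shapes = {
        "4x4": (1, 1), "8x4": (1, 2), "4x8": (2, 1), "8x8": (2, 2),
        "16x8": (2, 4), "8x16": (4, 2), "16x16": (4, 4),
    }
    return {
        name: [block_sum(r, c, rows, cols)
               for r in range(0, 4, rows)
               for c in range(0, 4, cols)]
        for name, (rows, cols) in shapes.items()
    }


# Hypothetical flat frames: every pixel differs by 2.
cur = [[3] * 16 for _ in range(16)]
ref = [[1] * 16 for _ in range(16)]
merged = merge_sads(sad_4x4_grid(cur, ref, 0, 0, 0, 0))
print({name: len(values) for name, values in merged.items()})
```

The merge yields 16 + 8 + 8 + 4 + 2 + 2 + 1 = 41 SAD values per candidate from one pass over the 256 pixel differences, which is why sharing partial sums suits both parallel hardware and low memory traffic.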
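Finally, through hardware architecture analysis and the addition of a new data sharing mechanism to the algorithm, TS is developed into a hardware design that performs H.264/AVC motion estimation efficiently. The design requires only 3% of the computation and 16.9% of the data accesses of a conventional full search, a substantial improvement in performance.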
Motion search/estimation engines usually require vast amounts of computation power and memory bandwidth. Traditionally, full search (FS) is applied because of its regular control flow and efficient data reuse. Fast search algorithms reduce the computational complexity and the high data consumption of FS, at some cost in image quality. However, the irregularly shaped search patterns that most fast search algorithms introduce make data reuse less efficient and hardware implementation more difficult. This dissertation proposes a square-based fast search algorithm, Tri-Point search (TS), whose search efficiency is superior to that of the well-known diamond search (DS). Although TS is slightly inferior to the state-of-the-art hexagon-based search (HEXBS) in search efficiency, it requires much less data access than both DS and HEXBS, up to 63% less than DS. In addition, the search patterns of TS make its hardware architecture simpler than that of DS and HEXBS.
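To make the data reuse argument concrete, the Python sketch below counts reference pixels for a set of candidate vectors in two ways: fetching every candidate block from scratch, and fetching each distinct pixel only once. The function name and the two example patterns are assumptions for illustration. The 5-point pattern is the small diamond shape, used here only as an example of a sparse pattern; it is not the TS pattern, which this abstract does not specify.

```python
def pixels_touched(candidates, n):
    """Return (naive_reads, unique_pixels) for the given candidate vectors.
    naive_reads: every n x n reference block is fetched from scratch.
    unique_pixels: each distinct pixel is fetched once and kept in a
    local buffer (an idealized lower bound, assumed for illustration)."""
    unique = set()
    for mx, my in candidates:
        for y in range(n):
            for x in range(n):
                unique.add((mx + x, my + y))
    return len(candidates) * n * n, len(unique)


# Full search over a +/-2 window: 25 densely packed candidates.
full = [(mx, my) for my in range(-2, 3) for mx in range(-2, 3)]
# Small diamond: the center and its four nearest neighbours.
diamond = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]

for name, pattern in (("full", full), ("diamond", diamond)):
    naive, unique = pixels_touched(pattern, 16)
    print(name, naive, unique, naive / unique)
```

For 16x16 blocks, the full search window reads 6400 pixels naively but touches only 400 distinct ones, so each fetched pixel is used 16 times on average. The small diamond reads 1280 pixels naively and touches 320 distinct ones, a reuse factor of 4. Sparse patterns check far fewer positions but extract less work from each fetched pixel, which is the trade-off the paragraph above describes.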
The search algorithm is applied to the H.264/AVC video coding standard. Variable block-size motion estimation (VBSME) in H.264/AVC consumes a great deal of computing power and usually requires hardwired accelerators. One may adopt a multicore or multiprocessor-like system, or introduce a dedicated ASIC. Whichever is chosen, the short life cycle of products makes it important to decide on a suitable architecture for the VBSME system quickly. This dissertation presents a fast hardware architecture exploration (HAE) method for H.264/AVC motion search. The model is constructed at an abstract system level in SystemC rather than by implementing functional details. It incorporates a motion search log previously generated by the reference software, so that the simulation model produces identical motion vectors. Changes in the VBSME hardware architecture can be reflected in the model easily and quickly. Many design issues that involve the target specification, such as resource allocation and bus arbitration policies, can be evaluated quickly. In addition, the behavior of the system is highly visible, so bottlenecks are easy to locate and remove. Modeling at such a high level of abstraction also shortens simulation: simulation time is reduced by up to a factor of seven compared with the corresponding behavior-level RTL implementation, and the proposed simulation model remains cycle accurate.
After hardware architecture exploration, Tri-Point search for H.264/AVC VBSME is defined. A new data reuse technique is also introduced to further reduce data access. The enhanced TS for VBSME is called variable block size TS (VBS-TS). With these improvements, VBS-TS requires only 16.9% of the memory bandwidth and 3% of the computational complexity of FS, with an acceptable PSNR drop.
On-campus access: available from 2021-01-01