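Graduate student: Chen, Cheng-Ting (陳政廷)
Thesis title: Fast CU Depth Decision Algorithm and VLSI Architecture Design for H.265/HEVC Integer Motion Estimation
Original title: 針對H.265/HEVC 整數運動向量估計開發之快速深度決策演算法及其硬體架構設計
Advisor: Lai, Yen-Tai (賴源泰)
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Electrical Engineering
Year of publication: 2013
Academic year of graduation: 101
Language: English
Number of pages: 76
Keywords (translated from Chinese): high efficiency video coding, motion estimation, VLSI design
Keywords (English): H.265/HEVC, motion estimation, VLSI design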
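  • H.265/HEVC is currently the newest video specification. Its goal is to achieve higher coding efficiency than the H.264 video compression standard, and its scope extends from HD (720p or 1080p) to Ultra HD (3840x2160) video. In addition, HEVC provides the coding unit (CU), whose preset size ranges from 8x8 to 64x64, in place of the fixed-size coding unit of H.264, the macroblock (MB). This lowers the bitrate but also greatly increases computational complexity.
    In video coding, integer motion estimation (IME) takes up most of the computational complexity, so real-time computation has to be achieved with a VLSI hardware architecture supported by parallel processing. Because CU depth information is highly correlated in space and time, we gathered statistics on it, analyzed them, and propose a new fast CU depth decision algorithm (FCDD) that simplifies the depth computation of coding units and saves roughly 28~52% of the encoding time of the encoder software. Building on this, we apply the algorithm to VLSI design together with a variable search range, which substantially reduces the cycle time of the integer motion vector search. For the hardware we also propose a mergeable processing element array (merged PE array): if the algorithm selects a lower depth, a large number of sum of absolute difference (SAD) operations can be skipped. Moreover, the different merge modes share the same memory, with no extra memory or area cost. Finally, the circuit architecture is implemented in the Verilog hardware description language and synthesized with Synopsys Design Compiler and TSMC 90nm GUTM. The results show that it meets the target of real-time H.265/HEVC processing of Ultra HD video at 30 frames per second.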
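The SAD saving mentioned in the abstract can be made concrete with a minimal Python sketch. This is not the thesis's merged PE array. It only assumes the usual quadtree relation exploited by adder trees in variable block-size motion estimation: at one fixed candidate motion vector, the SAD of a 2Nx2N block equals the sum of the SADs of its four NxN sub-blocks. The function names `sad`, `block_sads`, and `merge_sads` and the sample data are hypothetical.

```python
def sad(cur, ref):
    """Sum of absolute differences between two equally sized 2-D blocks."""
    return sum(abs(c - r)
               for cur_row, ref_row in zip(cur, ref)
               for c, r in zip(cur_row, ref_row))


def block_sads(cur, ref, size):
    """Return a 2-D grid holding the SAD of every size x size sub-block."""
    n = len(cur)  # assumes square blocks whose side is a multiple of size
    grid = []
    for y in range(0, n, size):
        row = []
        for x in range(0, n, size):
            cur_sub = [r[x:x + size] for r in cur[y:y + size]]
            ref_sub = [r[x:x + size] for r in ref[y:y + size]]
            row.append(sad(cur_sub, ref_sub))
        grid.append(row)
    return grid


def merge_sads(grid):
    """Add each 2x2 group of child SADs to get the SAD of the parent block."""
    return [[grid[y][x] + grid[y][x + 1] + grid[y + 1][x] + grid[y + 1][x + 1]
             for x in range(0, len(grid[0]), 2)]
            for y in range(0, len(grid), 2)]


# Hypothetical 16x16 current and reference blocks (same candidate position).
cur = [[(3 * x + 5 * y) % 256 for x in range(16)] for y in range(16)]
ref = [[(7 * x + y) % 256 for x in range(16)] for y in range(16)]

sads_4x4 = block_sads(cur, ref, 4)   # 16 SADs computed from pixels
sads_8x8 = merge_sads(sads_4x4)      # 4 SADs obtained by addition only
sad_16x16 = merge_sads(sads_8x8)     # 1 SAD obtained by addition only
```

Under this assumption, the SADs of larger blocks need only additions of results that already exist, with no further pixel differences.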

    H.265/HEVC is the latest video coding standard. Its goal is to achieve higher coding efficiency than H.264, and its target resolutions extend from HD (720p or 1080p) to Ultra HD (3840x2160). HEVC also introduces the coding unit (CU), whose size ranges from 8x8 to 64x64 and which replaces the fixed-size macroblock (MB) of H.264. This lowers the bitrate but greatly increases computational complexity.
    In video encoding, integer motion estimation (IME) accounts for most of the computational complexity, so real-time computation requires a parallel VLSI architecture. Because CU depth shows a high degree of spatial and temporal correlation, statistics on this property are collected and analyzed, and a new fast CU depth decision (FCDD) algorithm is proposed to simplify the depth computation, saving about 28~52% of encoding time in software. The algorithm then serves as the basis for the VLSI design, supplemented by a variable search range, which greatly reduces the cycle time of integer motion estimation. The hardware is further designed so that the processing element (PE) arrays can be merged: if the algorithm chooses a lower depth, a large number of sum of absolute difference (SAD) calculations are saved. In addition, the different merge modes share the same memory, so no extra memory or area is needed. Finally, the circuit architecture is implemented in the Verilog hardware description language and synthesized with Synopsys Design Compiler and TSMC 90nm GUTM. The experimental results show that the proposed hardware supports real-time H.265/HEVC computation of Ultra HD video at 30 frames per second.
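The real-time target stated above implies a cycle budget per largest coding unit. The following is a rough Python sketch of that arithmetic. It assumes 64x64 blocks (the largest CU size named in the abstract) and a hypothetical 200 MHz clock, since this record does not state the clock frequency of the synthesized design. The function names are illustrative only.

```python
def blocks_per_second(width, height, fps, block=64):
    """Blocks to process per second; partially covered blocks count as whole."""
    cols = (width + block - 1) // block   # ceiling division
    rows = (height + block - 1) // block
    return cols * rows * fps


def cycle_budget(clock_hz, width, height, fps, block=64):
    """Whole clock cycles available per block at the given clock frequency."""
    return clock_hz // blocks_per_second(width, height, fps, block)


ASSUMED_CLOCK_HZ = 200_000_000  # hypothetical; not a figure from this record

# Ultra HD at 30 frames per second: 60 columns * 34 rows * 30 frames
print(blocks_per_second(3840, 2160, 30))               # 61200
print(cycle_budget(ASSUMED_CLOCK_HZ, 3840, 2160, 30))  # 3267
```

A tighter or looser budget follows directly from the actual clock, so the numbers are only an order-of-magnitude guide to why a reduced search range and fewer SAD operations matter.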

    Abstract
    Contents
    List of Figures
    List of Tables
    CHAPTER 1 Introduction
      1.1 Background
      1.2 Research Motivation and Purpose
      1.3 Outline of this Thesis
    CHAPTER 2 Video Processing in H.265/HEVC
      2.1 Video Picture Format
      2.2 Outlook of HEVC
      2.3 Video Coding Layer of HEVC
      2.4.1 Coding Unit, CU
      2.4.2 Inter Prediction of Prediction Unit
      2.4.3 Intra Prediction of Prediction Unit
      2.5.1 Inter Motion Estimation, IME
      2.5.2 Mode Decision of Motion Estimation
      2.5.3 Motion Vector Prediction, MVP
      2.5.4 Merge Mode
      2.5.5 Fractional Motion Estimation, FME
      2.5.6 Multi-Reference Frame
      2.6.1 Transform Unit, TU
      2.6.2 Transform
      2.6.3 Quantization
      2.6.4 Diagonal Scanning
      2.6.5 Entropy Coding
      2.7 In-loop Filter
      2.8 Complexity Analysis
      2.9 Multicore parallel processing
    CHAPTER 3 Fast Algorithm Applied in Inter Prediction
      3.1 Previous Works
      3.2.1 Test Environment of Analysis
      3.2.2 Statistical Results
      3.3 Proposed Fast CU Depth Decision
      3.4.1 Test Environment of Performance Analysis
      3.4.2 Performance Analysis
    CHAPTER 4 Proposed VLSI Architecture of Asymmetric VBSME
      4.1 Previous Works
      4.2 Data-Reuse Schemes
      4.3.1 Proposed Architecture
      4.3.2 Processing Element
      4.3.3 Data Flow of PEA and MPEA
      4.3.4 Adder Tree
      4.3.5 CU Depth Controller
      4.3.6 Memory Organization
      4.4 Experimental Results
      4.5 Synthesis results
    CHAPTER 5 Conclusions
    References

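    Electronic full text, on campus: public from 2023-01-01
    Off campus: not public
    The electronic thesis has not yet been authorized for public release; for the print copy, please check the library catalog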