
Author: Yeh, Chia-Mao (葉家茂)
Thesis title: A Lightweight CNN Model for VVC Fast QT-MTT Partition (一個輕量化卷積神經網路模型之VVC快速QT-MTT分區演算法)
Advisor: Tai, Shen-Chuan (戴顯權)
Degree: Master
Department: Department of Electrical Engineering, College of Electrical Engineering and Computer Science
Year of publication: 2023
Academic year of graduation: 111 (ROC calendar, i.e. 2022-2023)
Language: English
Number of pages: 71
Keywords: VVC, intra prediction, fast CU partition, QT-MTT

    Versatile Video Coding (VVC/H.266) is the latest-generation video coding standard. It significantly improves coding efficiency over its predecessor, High Efficiency Video Coding (HEVC/H.265), but at the cost of a sharp increase in complexity. Among the new techniques VVC introduces is the Quad-Tree plus Multi-Type Tree (QT-MTT) block partition structure. Whereas HEVC uses only the QT structure to partition Coding Units (CUs), VVC adds the MTT structure, which comprises Binary Tree (BT) and Ternary Tree (TT) splits in both the vertical and horizontal directions. Each CU block therefore has five possible splits and, counting the "no split" option, six modes to choose from, which requires a large number of Rate-Distortion Optimization (RDO) calculations. Studies show that QT-MTT-based CU partitioning accounts for more than 97% of the encoding time in intra coding. This thesis therefore proposes a fast partition algorithm based on deep learning.
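
    The six choices described above can be made concrete with a small sketch. It is not taken from the thesis or from VTM: the names `Split` and `children` are hypothetical, and the sketch ignores the size and depth restrictions that VVC places on when each split is allowed. It only shows the geometry of each option, with ternary splits dividing a block in a 1:2:1 ratio.

```python
from enum import Enum


class Split(Enum):
    """The six per-CU choices: no split, one QT split, four MTT splits."""
    NONE = "no split"
    QT = "quad-tree"
    BT_H = "horizontal binary"
    BT_V = "vertical binary"
    TT_H = "horizontal ternary"
    TT_V = "vertical ternary"


def children(x, y, w, h, mode):
    """Return the (x, y, w, h) sub-blocks produced by applying `mode`.

    Illustrative geometry only; legality checks (minimum sizes,
    allowed depths) are deliberately left out.
    """
    if mode is Split.NONE:
        return [(x, y, w, h)]
    if mode is Split.QT:
        hw, hh = w // 2, h // 2
        return [(x, y, hw, hh), (x + hw, y, hw, hh),
                (x, y + hh, hw, hh), (x + hw, y + hh, hw, hh)]
    if mode is Split.BT_H:  # two halves stacked vertically
        return [(x, y, w, h // 2), (x, y + h // 2, w, h // 2)]
    if mode is Split.BT_V:  # two halves side by side
        return [(x, y, w // 2, h), (x + w // 2, y, w // 2, h)]
    if mode is Split.TT_H:  # 1:2:1 split of the height
        q = h // 4
        return [(x, y, w, q), (x, y + q, w, 2 * q), (x, y + 3 * q, w, q)]
    if mode is Split.TT_V:  # 1:2:1 split of the width
        q = w // 4
        return [(x, y, q, h), (x + q, y, 2 * q, h), (x + 3 * q, y, q, h)]
    raise ValueError(f"unknown split mode: {mode}")
```

    Because every sub-block can be split again, an exhaustive RDO search has to evaluate these options recursively, which is where the encoding time goes.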

    Observations of the VVC standard show that the sizes and positions of CU blocks follow regular patterns. Generalizing from these patterns, this thesis proposes the "puzzle algorithm." It rests on a novel idea: instead of predicting which of the six modes will be used, the model predicts only whether each CU block will be split. During encoding, this prediction is needed for CU blocks of just seven different sizes. All CU blocks predicted not to be split are then identified and placed into their corresponding regions in order from largest to smallest. Among CU blocks of equal size, those with higher prediction scores take priority, and a CU block that overlaps a previously placed one is discarded. The remaining regions are later filled in by the VTM (VVC Test Model). This approach reduces the complexity of the prediction model and significantly decreases encoding time. Experimental results on VTM 17.2 show a 74.57% reduction in encoding time at a BDBR (Bjontegaard Delta Bit Rate) of 5.78%.
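
    The placement step described above can be sketched as a greedy procedure. This is a minimal illustration based only on the abstract, not the thesis's implementation: the `Candidate` type, the function names, and the use of block area as the "large to small" ordering are assumptions, and the candidates are taken to be blocks the model has already predicted as not split.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    """A CU block predicted as 'not split' (hypothetical structure)."""
    x: int
    y: int
    w: int
    h: int
    score: float  # assumed model confidence for the 'not split' decision


def overlaps(a: Candidate, b: Candidate) -> bool:
    """True if the two axis-aligned rectangles share any area."""
    return (a.x < b.x + b.w and b.x < a.x + a.w and
            a.y < b.y + b.h and b.y < a.y + a.h)


def place_blocks(candidates: List[Candidate]) -> List[Candidate]:
    """Place blocks from large to small; break ties by higher score.

    A block that overlaps an already placed block is discarded.
    """
    ordered = sorted(candidates, key=lambda c: (-(c.w * c.h), -c.score))
    placed: List[Candidate] = []
    for cand in ordered:
        if any(overlaps(cand, p) for p in placed):
            continue  # conflicts with an earlier, higher-priority block
        placed.append(cand)
    return placed


def uncovered_area(ctu_size: int, placed: List[Candidate]) -> int:
    """Area of the CTU left for the regular encoder search to fill."""
    return ctu_size * ctu_size - sum(p.w * p.h for p in placed)


if __name__ == "__main__":
    cands = [
        Candidate(0, 0, 32, 32, 0.90),
        Candidate(0, 0, 16, 16, 0.95),   # inside the 32x32 block
        Candidate(32, 0, 16, 32, 0.60),  # overlaps the next block
        Candidate(40, 0, 16, 32, 0.80),
    ]
    for block in place_blocks(cands):
        print(block)
```

    In this toy input the 32x32 block is placed first, the higher-scoring of the two overlapping 16x32 blocks is kept, and the 16x16 block is dropped because its region is already covered. Whatever `uncovered_area` reports is the part the abstract says is left to the VTM.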

    Abstract (in Chinese)
    Abstract
    Acknowledgments
    Contents
    List of Tables
    List of Figures
    Chapter 1 Introduction
        1.1 Overview of Video Compression
        1.2 Issues of Computation Reduction
        1.3 Fast Prediction Method Based on Deep Learning
        1.4 Organization of this Thesis
    Chapter 2 Background and Related Work
        2.1 Introduction to H.266/VVC
            2.1.1 Coding Configurations of VVC
            2.1.2 Frame Type
            2.1.3 Slices, Tiles, Bricks, and Coding Tree Unit (CTU)
            2.1.4 Coding Unit (CU) & QT-MTT
            2.1.5 Rate-Distortion Optimization (RDO)
            2.1.6 Intra Prediction [13-14]
        2.2 Related Work
            2.2.1 Edge Detection Methods
            2.2.2 Machine Learning Methods
            2.2.3 Deep Learning Methods
    Chapter 3 The Proposed Method
        3.1 CPH Database [37]
            3.1.1 Get CU block partition structure
            3.1.2 Extract feature
            3.1.3 Integrate into a Database
        3.2 VVC Test Model (VTM)
        3.3 Idea
        3.4 Network Design Principle
            3.4.1 Feature Extraction
            3.4.2 Classifier
        3.5 Algorithm
    Chapter 4 Experimental Results
        4.1 Test Environment
        4.2 Training Strategy
        4.3 Training Performance
        4.4 Measurement of Encoding Performance
        4.5 Encoding performance compared with VTM 17.2
        4.6 Compared with deep learning-based methods
    Chapter 5 Conclusions and Future Works
        5.1 Conclusion
        5.2 Future Work
    References

    [1] “High efficiency video coding,” ITU-T Rec. H.265 ISO/IEC 23008-2 HEVC, Aug. 2021.
    [2] G. J. Sullivan, J. Ohm, W. Han, and T. Wiegand, “Overview of the High Efficiency Video Coding (HEVC) Standard,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 22, no. 12, pp. 1649 - 1668, Dec. 2012.
    [3] “Versatile video coding,” ITU-T Rec. H.266 ISO/IEC 23090-3 VVC, Apr. 2022.
    [4] B. Bross, Y. Wang, Y. Ye, S. Liu, J. Chen, G. J. Sullivan, and J. Ohm, “Overview of the Versatile Video Coding (VVC) Standard and Its Applications,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 31, no. 10, pp. 3736 - 3764, Oct. 2021.
    [5] M. Viitanen, J. Sainio, A. Mercat, A. Lemmetti, and J. Vanne, “From HEVC to VVC: The First Development Steps of a Practical Intra Video Encoder,” IEEE Transactions on Consumer Electronics, vol. 68, no. 2, pp. 139 - 148, May 2022.
    [6] Y. Huang, J. An, H. Huang, X. Li, S. Hsiang, K. Zhang, H. Gao, J. Ma, and O. Chubach, “Block Partitioning Structure in the VVC Standard,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 31, no. 10, pp. 3818 - 3833, Oct. 2021.
    [7] VTM reference software 17.2, [Online]. Available: https://vcgit.hhi.fraunhofer.de/jvet/VVCSoftware_VTM/-/releases/VTM-17.2
    [8] M. Saldanha, G. Sanchez, C. Marcon, and L. Agostini, “Complexity Analysis Of VVC Intra Coding,” Proc. IEEE Int. Conf. Image Process. (ICIP), pp. 3119 - 3123, Oct. 2020.
    [9] A. Tissier, A. Mercat, T. Amestoy, W. Hamidouche, J. Vanne, and D. Menard, “Complexity reduction opportunities in the future VVC intra encoder,” Proc. IEEE 21st Int. Workshop Multimedia Signal Process. (MMSP), pp. 1 - 6, Sep. 2019.
    [10] T. Li, M. Xu, R. Tang, Y. Chen, and Q. Xing, “DeepQTMT: A Deep Learning Approach for Fast QTMT-Based CU Partition of Intra-Mode VVC,” IEEE Transactions on Image Processing, vol. 30, pp. 5377 - 5390, May 2021.
    [11] P. Fu, C. Yen, N. Yang, and J. Wang, “Two-phase Scheme for Trimming QTMT CU Partition using Multi-branch Convolutional Neural Networks,” Proc. IEEE 3rd Int. Conf. on Artificial Intelligence Circuits and Systems (AICAS), pp. 1 - 6, Jun. 2021.
    [12] S. Wu, J. Shi, and Z. Chen, “HG-FCN: Hierarchical Grid Fully Convolutional Network for Fast VVC Intra Coding,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 32, no. 8, pp. 5638 - 5649, Aug. 2022.
    [13] A. Filippov and V. Rufitskiy, “Recent Advances in Intra Prediction for the Emerging H.266/VVC Video Coding Standard,” 2019 Int. Multi-Conf. on Engineering, Computer and Information Sciences (SIBIRCON), pp. 525 - 530, Oct. 2019.
    [14] J. Pfaff, A. Filippov, S. Liu, X. Zhao, J. Chen, S. D.-L.-Hernández, T. Wiegand, V. Rufitskiy, A. K. Ramasubramonian, and G. V. der Auwera, “Intra Prediction and Mode Coding in VVC,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 31, no. 10, pp. 3834 - 3847, Oct. 2021.
    [15] N. Tang, J. Cao, F. Liang, J. Wang, H. Liu, X. Wang, and X. Du, “Fast CTU Partition Decision Algorithm for VVC Intra and Inter Coding,” Proc. IEEE Asia Pacific Conf. on Circuits and Systems (APCCAS), pp. 361 - 364, Nov. 2019.
    [16] Y. Fan, J. Chen, H. Sun, J. Katto, and M. Jing, “A Fast QTMT Partition Decision Strategy for VVC Intra Prediction,” IEEE Access, vol. 8, pp. 107900 - 107911, Jun. 2020.
    [17] H. Liu, S. Zhu, R. Xiong, G. Liu, and B. Zeng, “Cross-Block Difference Guided Fast CU Partition for VVC Intra Coding,” Proc. Int. Conf. on Visual Communications and Image Process. (VCIP), pp. 1 - 5, Dec. 2021.
    [18] Z. Zhang, C. Fu, K. Xie, H. Hong, and G. Su, “Fast VVC Intra Coding by Skipping Redundant Coding Block Structures and Unnecessary Directional Partition,” Proc. IEEE 5th Int. Conf. on Multimedia Information Process. and Retrieval (MIPR), pp. 84 - 89, Aug. 2022.
    [19] C. Ni, S. Lin, P. Chen, and Y. Chu, “High Efficiency Intra CU Partition and Mode Decision Method for VVC,” IEEE Access, vol. 10, pp. 77759 - 77771, Jul. 2022.
    [20] J. Zhao, P. Li, and Q. Zhang, “A Fast Decision Algorithm for VVC Intra-Coding Based on Texture Feature and Machine Learning,” Computational Intelligence and Neuroscience, Hindawi, Sep. 2022.
    [21] Q. He, W. Wu, L. Luo, C. Zhu, and H. Guo, “Random Forest Based Fast CU Partition for VVC Intra Coding,” Proc. IEEE Int. Symposium on Broadband Multimedia Systems and Broadcast. (BMSB), pp. 1 - 4, Aug. 2021.
    [22] M. Saldanha, G. Sanchez, C. Marcon, and L. Agostini, “Configurable Fast Block Partitioning for VVC Intra Coding Using Light Gradient Boosting Machine,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 32, no. 6, pp. 3947 - 3960, Aug. 2021.
    [23] A. Tissier, W. Hamidouche, S. B. D. Mdalsi, J. Vanne, F. Galpin, and D. Menard, “Machine Learning based Efficient QT-MTT Partitioning Scheme for VVC Intra Encoders,” IEEE Transactions on Circuits and Systems for Video Technology, Early Access.
    [24] G. Wu, Y. Huang, C. Zhu, L. Song, and W. Zhang, “SVM Based Fast CU Partitioning Algorithm for VVC Intra Coding,” 2021 IEEE Int. Symposium on Circuits and Systems (ISCAS), pp. 1 - 5, May 2021.
    [25] Y. Wang, Y. Liu, J. Zhao, and Q. Zhang, “Fast CU partitioning algorithm for VVC based on multi-stage framework and binary subnets,” IEEE Access, May 2023.
    [26] S. Park and J. Kang, “Fast Multi-Type Tree Partitioning for Versatile Video Coding Using a Lightweight Neural Network,” IEEE Transactions on Multimedia, vol. 23, pp. 4388 - 4399, 2021.
    [27] J. Zhang, M. Wang, C. Jia, Q. Wang, S. Wang, S. Ma, and W. Gao, “Fast Partition Mode Decision via a Plug-in Fully Connected Network for Video Coding,” Proc. Data Compression Conference (DCC), IEEE, pp. 222 - 231, Mar. 2022.
    [28] G. Tech, J. Pfaff, H. Schwarz, P. Helle, A. Wieckowski, D. Marpe, and T. Wiegand, “Fast Partitioning for VVC Intra-Picture Encoding With a CNN Minimizing the Rate-Distortion-Time Cost,” Proc. Data Compression Conference (DCC), IEEE, pp. 3 - 12, Mar. 2021.
    [29] G. Tech, J. Pfaff, H. Schwarz, P. Helle, A. Wieckowski, D. Marpe, and T. Wiegand, “Rate-Distortion-Time Cost Aware CNN Training for Fast VVC Intra-Picture Partitioning Decisions,” Proc. Picture Coding Symposium (PCS), IEEE, pp. 1 - 5, Jul. 2021.
    [30] G. Tang, M. Jing, X. Zeng, and Y. Fan, “Adaptive CU Split Decision with Pooling-variable CNN for VVC Intra Encoding,” Proc. IEEE Visual Communications and Image Process. (VCIP), pp. 1 - 4, Dec. 2019.
    [31] A. Tissier, W. Hamidouche, J. Vanne, F. Galpin, and D. Menard, “CNN Oriented Complexity Reduction Of VVC Intra Encoder,” Proc. IEEE Int. Conf. on Image Processing (ICIP), pp. 3139 - 3143, Oct. 2020.
    [32] B. Abdallah, S. B. Jdidia, F. Belghith, M. A. B. Ayed, and N. Masmoudi, “A CNN-based QTMT partitioning decision for the VVC standard,” Proc. IEEE Int. Conf. on Design & Test of Integrated Micro & Nano-Systems (DTS), pp. 1 - 5, Jun. 2022.
    [33] X. HoangVan, S. NguyenQuang, M. DinhBao, M. DoNgoc, and D. T. Duong, “Fast QTMT for H.266/VVC Intra Prediction using Early-Terminated Hierarchical CNN model,” Proc. Int. Conf. on Advanced Technologies for Communications (ATC), IEEE, pp. 195 - 200, Oct. 2021.
    [34] J. Zhao, A. Wu, B. Jiang, and Q. Zhang, “ResNet-Based Fast CU Partition Decision Algorithm for VVC,” IEEE Access, vol. 10, pp. 100337 - 100347, Sep. 2022.
    [35] Z. Liu, T. Li, Y. Chen, K. Wei, M. Xu, and H. Qi, “Deep Multi-task Learning based Fast Intra-mode Decision for Versatile Video Coding,” IEEE Transactions on Circuits and Systems for Video Technology, Early Access, Mar. 2023.
    [36] H. Li, P. Zhang, B. Jin, and Q. Zhang, “Fast CU Decision Algorithm Based on Texture Complexity and CNN for VVC,” IEEE Access, vol. 11, pp. 35808 - 35817, Apr. 2023.
    [37] CPH Database, [Online]. Available: https://github.com/HEVC-Projects/CPH.git
    [38] G. Bjontegaard, “Calculation of average PSNR differences between RD-curves,” VCEG-M33, 2001.
    [39] ETRO's Bjontegaard Metric implementation for Excel, [Online]. Available: https://github.com/tbr/bjontegaard_etro.git
    [40] CPIV Database, [Online]. Available: https://github.com/tianyili2017/CPIV.git
    [41] C.-C. Jay Kuo and A. M. Madni, “Green learning: Introduction, examples and outlook,” Journal of Visual Communication and Image Representation, vol. 90, Feb. 2023.

    Full-text availability: on campus from 2024-08-31; off campus from 2024-08-31