Graduate student: | 歐陽仲威 Ouyang, Chung-Wei |
---|---|
Thesis title: | A fast HEVC Coding Unit Prediction Method based on Deep Learning |
Advisor: | 戴顯權 Ti, Shen-Chuan |
Degree: | Master |
Department: | College of Electrical Engineering and Computer Science, Institute of Computer & Communication Engineering |
Year of publication: | 2021 |
Academic year of graduation: | 109 (ROC calendar, 2020–2021) |
Language: | English |
Number of pages: | 62 |
Keywords: | HEVC, Intra prediction, fast CU partition, batch normalization |
High Efficiency Video Coding (HEVC/H.265) offers more diverse coding unit sizes and finer prediction directions than its predecessor, Advanced Video Coding (AVC/H.264), and can therefore further reduce the number of coded bits. However, its deeper recursive depth search makes the computational complexity considerable, which severely limits real-time applications. Much of the existing research proposes methods that decide the coding unit (CU) size in advance, avoiding a large number of Rate-Distortion Optimization (RDO) calculations while balancing the distortion of the decoded, reconstructed picture against the encoding bit rate.
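To make the cost of the recursive depth search concrete, the following Python sketch (not the HM16.5 code) walks a 64×64 CTU quadtree down to 8×8 CUs and, at each node, keeps the cheaper of coding the CU whole or splitting it into four, using the Lagrangian cost J = D + λR. The callbacks are hypothetical: `evaluate` stands in for one intra RDO pass and uses a made-up toy cost model, and `predicted_depth` stands in for a depth predictor such as a CNN, whose output map is assumed to be a valid quadtree. A full search evaluates 1 + 4 + 16 + 64 = 85 CUs per CTU; a predicted depth lets the encoder skip most of them.

```python
from typing import Callable, Dict, Optional

CTU_SIZE = 64   # HEVC coding tree unit: 64x64 luma samples
MAX_DEPTH = 3   # depths 0..3 correspond to CU sizes 64, 32, 16, 8


def rd_cost(distortion: float, rate: float, lam: float) -> float:
    """Lagrangian rate-distortion cost J = D + lambda * R."""
    return distortion + lam * rate


def toy_evaluate(x: int, y: int, size: int, lam: float = 10.0) -> float:
    """Hypothetical stand-in for one intra RDO pass on a CU (toy cost model only)."""
    busy = 1.0 + ((x // 16 + y // 16) % 2)  # alternate "flat" and "busy" 16x16 regions
    distortion = busy * size * size / 8.0   # grows with CU size, doubled in busy regions
    rate = 4096.0 / size                    # smaller CUs cost more bits each
    return rd_cost(distortion, rate, lam)


def search_cu(
    x: int,
    y: int,
    depth: int,
    evaluate: Callable[[int, int, int], float],
    predicted_depth: Optional[Callable[[int, int], int]] = None,
    stats: Optional[Dict[str, int]] = None,
) -> float:
    """Return the best RD cost for the CU at (x, y, depth).

    predicted_depth(x, y) is a hypothetical predictor; None means full RDO search.
    stats["evals"] counts how many CUs were actually RD-evaluated.
    """
    if stats is None:
        stats = {"evals": 0}
    size = CTU_SIZE >> depth
    target = predicted_depth(x, y) if predicted_depth is not None else None

    # Option 1: code this CU as a leaf (one RDO evaluation).
    cost_leaf = float("inf")
    if target is None or target <= depth or depth == MAX_DEPTH:
        cost_leaf = evaluate(x, y, size)
        stats["evals"] += 1

    # Option 2: split into four sub-CUs and recurse.
    cost_split = float("inf")
    if depth < MAX_DEPTH and (target is None or target > depth):
        half = size // 2
        cost_split = sum(
            search_cu(x + dx, y + dy, depth + 1, evaluate, predicted_depth, stats)
            for dy in (0, half)
            for dx in (0, half)
        )
    return min(cost_leaf, cost_split)


if __name__ == "__main__":
    full = {"evals": 0}
    best_full = search_cu(0, 0, 0, toy_evaluate, None, full)
    fast = {"evals": 0}
    best_fast = search_cu(0, 0, 0, toy_evaluate, lambda x, y: 2, fast)
    print(full["evals"], fast["evals"])  # 85 16
    print(best_full <= best_fast)        # True
```

Because the full search minimizes over every quadtree partition, its cost can never exceed that of a depth-constrained search; the encoding-time saving is paid for with a possible RD loss whenever the predicted depth is wrong.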
To address HEVC CU depth prediction, this thesis proposes a Top-Down-Prediction-based Fully Convolutional Network (TDP-FCN) and embeds it in HM16.5 to reduce encoding time. The network uses a fully convolutional architecture, which retains positional information during prediction and also reduces the number of trainable parameters. TDP-FCN differs from other CU prediction networks in two respects: first, it uses interlaced convolution with a stride of 1 to capture as many pixel features around each coding unit as possible; second, it inserts Batch Normalization (BN) layers between the convolutional layers to improve training efficiency. The modified encoder was evaluated against the JCT-VC reference software HM16.5 in HEVC intra mode on the HEVC test sequences: the TDP-FCN-based HM16.5 encoder saves 61.72% of encoding time on average, with an average coding performance loss of only 1.84% relative to the original HM16.5.
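The two building blocks named above can be illustrated with a minimal pure-Python sketch: a stride-1 "valid" convolution, in which adjacent windows overlap so that every output position sees the pixels around it, followed by batch normalization in its standard form y = γ(x − μ)/√(σ² + ε) + β, where μ and σ² are computed over the mini-batch and all spatial positions of a channel. This is a generic single-channel illustration, not the actual TDP-FCN layer configuration; the kernel, γ, β and ε values are assumptions.

```python
import math
from typing import List

Matrix = List[List[float]]


def conv2d_stride1(image: Matrix, kernel: Matrix) -> Matrix:
    """'Valid' 2-D cross-correlation with stride 1: adjacent windows overlap."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [
            sum(image[i + u][j + v] * kernel[u][v] for u in range(kh) for v in range(kw))
            for j in range(ow)
        ]
        for i in range(oh)
    ]


def batch_norm(batch: List[Matrix], gamma: float = 1.0, beta: float = 0.0,
               eps: float = 1e-5) -> List[Matrix]:
    """Training-mode BN for one channel: statistics over the mini-batch and all positions."""
    values = [v for fmap in batch for row in fmap for v in row]
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)  # biased variance, as in training
    scale = gamma / math.sqrt(var + eps)
    return [[[scale * (v - mean) + beta for v in row] for row in fmap] for fmap in batch]


def relu(batch: List[Matrix]) -> List[Matrix]:
    return [[[max(0.0, v) for v in row] for row in fmap] for fmap in batch]


def conv_bn_relu(images: List[Matrix], kernel: Matrix,
                 gamma: float = 1.0, beta: float = 0.0) -> List[Matrix]:
    """Conv (stride 1) -> BN -> ReLU over a mini-batch of single-channel blocks."""
    features = [conv2d_stride1(img, kernel) for img in images]
    return relu(batch_norm(features, gamma, beta))


if __name__ == "__main__":
    block = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
    ones = [[1.0, 1.0], [1.0, 1.0]]
    print(conv2d_stride1(block, ones))  # [[12.0, 16.0], [24.0, 28.0]]
    print(conv_bn_relu([block], ones))
```

At inference time, BN normally replaces the batch statistics with running averages collected during training, so a prediction for one CU does not depend on the other blocks processed alongside it.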