| Graduate Student: | 曾浩軒 Tseng, Hao-Hsuan |
|---|---|
| Thesis Title: | 邊緣引導之影片超解析網路 (Edge-Guided Video Super-Resolution Network) |
| Advisor: | 郭致宏 Kuo, Chih-Hung |
| Degree: | Master (碩士) |
| Department: | Department of Electrical Engineering, College of Electrical Engineering and Computer Science (電機資訊學院 電機工程學系) |
| Year of Publication: | 2021 |
| Academic Year of Graduation: | 110 (ROC calendar) |
| Language: | Chinese |
| Number of Pages: | 98 |
| Keywords (Chinese): | 超解析度、卷積神經網路、圖像先驗知識 |
| Keywords (English): | super resolution, convolutional neural network, image prior knowledge |
This thesis proposes an Edge-Guided Video Super-Resolution (EGVSR) network that exploits the edge information of images to effectively recover the high-frequency details of high-resolution frames. The reconstruction process consists of two stages. In the first stage, the input frames are processed by a Coarse Frame Reconstruction Network (CFRN) and an Edge-Prediction Network (EPN) to generate coarse super-resolved frames and super-resolved edges, respectively. In the second stage, the proposed Frame Refinement Network (FRN) combines the coarse super-resolved frames with the super-resolved edges to refine further high-frequency details. Previous super-resolution techniques tend to stack deeper networks or adopt attention mechanisms to boost performance, which is unfavorable for reconstructing small objects. We therefore propose the Attentional Fusion Residual Block (AFRB) to reconstruct objects of different sizes. The AFRB, which performs fusion through a multi-scale channel attention mechanism, can be regarded as an enhanced version of the conventional residual block, and we adopt it as the basic operation unit of the CFRN and the EPN. Compared with state-of-the-art super-resolution methods on the VID4 dataset, our model improves the Peak Signal-to-Noise Ratio (PSNR) by about 0.5% and the Structural Similarity (SSIM) by about 1.8% while reducing the number of parameters by 54%.
In this paper, we propose an edge-guided video super-resolution (EGVSR) network that utilizes the edge information of the image to effectively recover high-frequency details for high-resolution frames. The reconstruction process consists of two stages. In the first stage, coarse SR frames and SR edges are generated by the Coarse Frame Reconstruction Network (CFRN) and the Edge-Prediction Network (EPN), respectively. In the second stage, the proposed Frame Refinement Network (FRN) fuses the coarse SR frames with the SR edges to refine more high-frequency details. Unlike some prior SR works, which tend to increase network depth or adopt attention mechanisms that reconstruct large objects well but neglect small ones, we propose the Attentional Fusion Residual Block (AFRB) to process objects of different sizes. The AFRB, an enhanced version of the conventional residual block, performs fusion through a multi-scale channel attention mechanism and serves as the basic operation unit in the CFRN and the EPN. Compared with the state-of-the-art method, our SR model improves PSNR by approximately 0.5% and SSIM by 1.8% on the benchmark VID4 dataset while reducing the number of parameters by 54%.
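To make the AFRB concrete, below is a minimal PyTorch sketch of how a residual block fused by multi-scale channel attention could look. The block and mechanism names (AFRB, multi-scale channel attention) come from the abstract above; all implementation details (the layer choices, the reduction ratio `r`, and the exact rule that blends the residual branch with the identity path) are illustrative assumptions, not the thesis's verified design.

```python
# Hedged sketch of an Attentional Fusion Residual Block (AFRB).
# The names come from the abstract; kernel sizes, the reduction
# ratio r, and the fusion rule are assumptions for illustration.
import torch
import torch.nn as nn


class MultiScaleChannelAttention(nn.Module):
    """Produces per-channel fusion weights from global and local context."""

    def __init__(self, channels: int, r: int = 4):
        super().__init__()
        mid = max(channels // r, 1)
        # Global branch: squeeze spatial dims, then a bottleneck MLP.
        self.global_att = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, mid, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(mid, channels, 1),
        )
        # Local branch: the same bottleneck applied pointwise (no pooling),
        # so responses from small structures also shape the attention map.
        self.local_att = nn.Sequential(
            nn.Conv2d(channels, mid, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(mid, channels, 1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Broadcast the (B, C, 1, 1) global context over the local map.
        return torch.sigmoid(self.global_att(x) + self.local_att(x))


class AFRB(nn.Module):
    """Residual block whose skip connection is fused by attention weights."""

    def __init__(self, channels: int = 64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )
        self.attn = MultiScaleChannelAttention(channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        res = self.body(x)
        w = self.attn(x + res)           # fusion weights in [0, 1]
        return w * res + (1.0 - w) * x   # attention-weighted residual fusion


if __name__ == "__main__":
    feats = torch.randn(1, 64, 32, 32)   # dummy feature map
    print(AFRB(64)(feats).shape)         # -> torch.Size([1, 64, 32, 32])
```

The intent of such a design is that the pooled global branch captures context for large objects while the pointwise local branch preserves responses from small ones, so the learned weights decide per channel how much of the residual branch versus the identity path to keep.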
Full text available on campus from 2027-01-07.