| Author: | 張珈偉 Zhang, Jia-Wei |
|---|---|
| Thesis Title: | 基於速度注意力之雙分支影片超解析度演算法 Dual-Branch Video Super-Resolution Network With Velocity Attention |
| Advisor: | 郭致宏 Kuo, Chih-Hung |
| Degree: | Master's |
| Department: | College of Electrical Engineering and Computer Science - Department of Electrical Engineering |
| Year of Publication: | 2022 |
| Academic Year: | 110 (ROC calendar) |
| Language: | Chinese |
| Number of Pages: | 60 |
| Chinese Keywords: | 影片超解析度 (video super-resolution), 卷積神經網路 (convolutional neural network), 生成對抗網路 (generative adversarial network) |
| English Keywords: | super resolution, convolutional neural network, generative adversarial networks |
The video super-resolution task consists of two steps: motion compensation and image reconstruction. The motion-compensated frames can be misaligned in some regions, especially regions that are occluded or undergo lighting changes, which may degrade the reconstruction. This thesis uses a Dual-Branch Mapping Module (DBMM) to keep image reconstruction from depending too heavily on the accuracy of motion compensation. One branch of the DBMM takes multiple frames as input, so it can gather more similar texture information from neighbouring frames and reconstruct sharper images; the other branch takes a single frame as input, which prevents misaligned regions in neighbouring frames from disturbing the network as it reconstructs the correct textures, making the super-resolution network more robust. To combine the strengths of the two branches, we propose a Velocity Attention Module (VAM) that fuses the feature maps from both branches, so the reconstructed images have sharp textures and fewer artifacts. In addition, the dual-branch framework can be flexibly integrated with any image super-resolution architecture. Compared with other state-of-the-art super-resolution methods, our method improves the peak signal-to-noise ratio (PSNR) by 0.35 dB and the Learned Perceptual Image Patch Similarity (LPIPS) by 0.033 on the Vid4 test dataset.
The process of video super-resolution usually consists of two steps: motion compensation and image reconstruction. However, some areas in the compensated frame may be misaligned, especially around occluded objects or when the lighting changes, and these areas may lead to poor reconstructions. In this thesis, we use a dual-branch mapping module to avoid over-reliance on motion compensation during image reconstruction. One branch takes a single frame as input while the other takes multiple frames. The multi-frame branch helps reconstruct a clearer image because it captures more inter-frame information, while the single-frame branch avoids propagating misaligned content and hence generates more stable frames. To merge the two branches, we propose a velocity attention module that yields clear and stable reconstructions. This framework can be integrated with any image super-resolution architecture as a branch. Compared with state-of-the-art methods, the proposed approach improves visual quality by 0.35 dB in PSNR and 0.033 in LPIPS in experiments on the Vid4 test dataset.
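As the abstracts describe, the Dual-Branch Mapping Module extracts features from a multi-frame input and from a single-frame input, and the Velocity Attention Module fuses the two feature maps so that well-aligned regions benefit from neighbouring frames while occluded or lighting-changed regions fall back on the single frame. The following is a minimal PyTorch sketch of that fusion pattern; the class names, layer widths, and the sigmoid-gated blend are illustrative assumptions, not the thesis implementation, which defines DBMM and VAM in the main text.

```python
# Illustrative sketch only: the real DBMM/VAM architecture is defined in the thesis.
import torch
import torch.nn as nn


class VelocityAttention(nn.Module):
    """Predicts a per-pixel gate that blends multi-frame and single-frame features."""

    def __init__(self, channels: int):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.Sigmoid(),
        )

    def forward(self, multi_feat: torch.Tensor, single_feat: torch.Tensor) -> torch.Tensor:
        # Gate near 1: trust the multi-frame branch (well-aligned regions).
        # Gate near 0: fall back to the single-frame branch (occlusion, lighting change).
        w = self.gate(torch.cat([multi_feat, single_feat], dim=1))
        return w * multi_feat + (1.0 - w) * single_feat


class DualBranchMapping(nn.Module):
    """Two feature-extraction branches fused by velocity attention."""

    def __init__(self, num_frames: int = 3, channels: int = 64):
        super().__init__()
        # Multi-frame branch: sees the motion-compensated neighbours as extra channels.
        self.multi_branch = nn.Conv2d(3 * num_frames, channels, kernel_size=3, padding=1)
        # Single-frame branch: sees only the centre frame, so misalignment cannot leak in.
        self.single_branch = nn.Conv2d(3, channels, kernel_size=3, padding=1)
        self.fuse = VelocityAttention(channels)

    def forward(self, aligned_frames: torch.Tensor, center_frame: torch.Tensor) -> torch.Tensor:
        # aligned_frames: (B, 3 * num_frames, H, W); center_frame: (B, 3, H, W)
        multi_feat = self.multi_branch(aligned_frames)
        single_feat = self.single_branch(center_frame)
        return self.fuse(multi_feat, single_feat)
```

In the thesis, the fused features would then feed an image super-resolution backbone, consistent with the abstract's note that the dual-branch framework can be integrated with any image super-resolution architecture as a branch.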