Student: Zhang, Jia-Wei (張珈偉)
Thesis Title: Dual-Branch Video Super-Resolution Network With Velocity Attention (基於速度注意力之雙分支影片超解析度演算法)
Advisor: Kuo, Chih-Hung (郭致宏)
Degree: Master
Department: Department of Electrical Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2022
Graduation Academic Year: 110 (2021-2022)
Language: Chinese
Pages: 60
Keywords (Chinese): video super-resolution, convolutional neural network, generative adversarial network
Keywords (English): super-resolution, convolutional neural network, generative adversarial networks
Abstract: Video super-resolution typically consists of two steps: motion compensation and image reconstruction. Some regions of a motion-compensated frame may be misaligned, however, especially around occluded objects or where the lighting changes, and such misalignment can lead to poor reconstructions. This thesis proposes a Dual-Branch Mapping Module (DBMM) that keeps image reconstruction from relying too heavily on the accuracy of motion compensation. One branch of the DBMM takes multiple frames as input, capturing similar texture information across neighboring frames to reconstruct sharper images; the other branch takes a single frame as input, preventing misaligned regions in neighboring frames from corrupting the reconstructed textures and making the super-resolution network more robust. To combine the strengths of both branches, we propose a Velocity Attention Module (VAM) that fuses the feature maps from the two branches, producing reconstructions with sharp textures and fewer artifacts. The dual-branch framework can also be flexibly integrated with any image super-resolution architecture as a branch. Compared with state-of-the-art methods, the proposed approach improves the Peak Signal-to-Noise Ratio (PSNR) by 0.35 dB and the Learned Perceptual Image Patch Similarity (LPIPS) by 0.033 on the Vid4 test dataset.
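To make the dual-branch design concrete, below is a minimal PyTorch sketch of the fusion idea described in the abstract: one branch maps several motion-compensated frames, one branch maps only the center frame, and a learned attention map blends the two feature maps per pixel. The module names, layer counts, and channel sizes here are illustrative assumptions, not the thesis implementation.

```python
# Minimal sketch of dual-branch mapping with attention-based fusion.
# All names and hyperparameters are illustrative, not from the thesis.
import torch
import torch.nn as nn


class VelocityAttention(nn.Module):
    """Predicts a per-pixel weight that blends the two branches' features."""

    def __init__(self, channels: int):
        super().__init__()
        self.attn = nn.Sequential(
            nn.Conv2d(2 * channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, 1, 3, padding=1),
            nn.Sigmoid(),  # weight in [0, 1] at every spatial location
        )

    def forward(self, feat_multi: torch.Tensor, feat_single: torch.Tensor) -> torch.Tensor:
        w = self.attn(torch.cat([feat_multi, feat_single], dim=1))
        # Well-aligned regions can lean on the multi-frame branch (richer
        # texture); misaligned regions fall back to the single-frame branch.
        return w * feat_multi + (1 - w) * feat_single


class DualBranchMapping(nn.Module):
    """Multi-frame branch + single-frame branch fused by velocity attention."""

    def __init__(self, num_frames: int = 3, channels: int = 64):
        super().__init__()

        def stem(in_ch: int) -> nn.Sequential:
            return nn.Sequential(
                nn.Conv2d(in_ch, channels, 3, padding=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(channels, channels, 3, padding=1),
            )

        self.multi = stem(3 * num_frames)  # sees the compensated neighbors
        self.single = stem(3)              # sees only the center frame
        self.fuse = VelocityAttention(channels)

    def forward(self, compensated: torch.Tensor, center: torch.Tensor) -> torch.Tensor:
        # compensated: (B, 3 * num_frames, H, W); center: (B, 3, H, W)
        return self.fuse(self.multi(compensated), self.single(center))


# Usage with random tensors standing in for real frames:
module = DualBranchMapping()
frames = torch.rand(1, 9, 64, 64)    # three RGB frames, channel-stacked
center = torch.rand(1, 3, 64, 64)
print(module(frames, center).shape)  # torch.Size([1, 64, 64, 64])
```

The sigmoid gate is one simple way to realize the fusion: it lets the network choose, per pixel, how much to trust the motion-compensated evidence versus the single-frame fallback.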

Table of Contents:
Abstract (Chinese) / Table of Contents / List of Figures / List of Tables
Chapter 1: Introduction
  1-1 Preface
  1-2 Research Motivation
  1-3 Research Contributions
  1-4 Thesis Organization
Chapter 2: Background
  2-1 Super-Resolution
    2-1-1 Single-Image Super-Resolution (SISR)
    2-1-2 Video Super-Resolution (VSR)
  2-2 Deep Learning
    2-2-1 Artificial Neural Networks
    2-2-2 Deep Neural Networks
    2-2-3 Back-Propagation
    2-2-4 Convolutional Neural Networks
    2-2-5 Deconvolutional Neural Networks
    2-2-6 Generative Adversarial Networks (GANs)
      2-2-6-1 Traditional GANs
      2-2-6-2 Conditional GANs (CGANs)
Chapter 3: Review of Deep-Learning-Based Super-Resolution
  3-1 Super-Resolution Algorithms for Image Reconstruction
    3-1-1 CNN-Based Image Super-Resolution
    3-1-2 Image Super-Resolution Based on Enhanced Deep Residual Networks
    3-1-3 GAN-Based Image Super-Resolution
  3-2 Super-Resolution Algorithms for Video Reconstruction
    3-2-1 CNN-Based Video Super-Resolution
    3-2-2 Recurrent Video Super-Resolution
    3-2-3 GAN-Based Temporally Coherent Video Super-Resolution
  3-3 Comparison of Related Super-Resolution Methods
Chapter 4: Dual-Branch Network With Velocity Attention
  4-1 Architecture of the Dual-Branch Video Super-Resolution Network (DBVSR)
  4-2 Motion Compensation Module (MCM)
  4-3 Dual-Branch Mapping Module (DBMM)
  4-4 Reconstruction Module (RM)
  4-5 Loss Functions
Chapter 5: Experimental Setup and Analysis
  5-1 Datasets
  5-2 Image Quality Assessment
  5-3 Implementation Details
  5-4 Architecture Analysis
    5-4-1 Network Depth vs. Super-Resolution Performance
    5-4-2 Effectiveness of Each Branch in the DBMM
    5-4-3 Effectiveness of the Velocity Attention Module (VAM)
    5-4-4 Effect of Adversarial Learning on Reconstructed Images
  5-5 Reconstruction Results and Comparisons
    5-5-1 Quantitative and Qualitative Evaluation
    5-5-2 Temporal Consistency
    5-5-3 Network Complexity
Chapter 6: Conclusions and Future Work
  6-1 Conclusions
  6-2 Future Work
References
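Section 5-2 of the outline covers image quality assessment, and the abstract quotes gains in PSNR (higher is better) and LPIPS (lower is better). As a reference point, here is a minimal sketch of how these two metrics are commonly computed: PSNR directly from the mean squared error, and LPIPS via the `lpips` package released with Zhang et al.'s perceptual-metric paper. The tensors below are random placeholders, not real frames.

```python
# Minimal sketch of the two metrics quoted in the abstract.
# PSNR is computed directly; LPIPS uses the `lpips` package (pip install lpips).
import torch
import lpips


def psnr(pred: torch.Tensor, target: torch.Tensor, max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB for images scaled to [0, max_val]."""
    mse = torch.mean((pred - target) ** 2)
    return float(10 * torch.log10(max_val ** 2 / mse))


# LPIPS compares deep features of the two images; inputs are expected in [-1, 1].
lpips_fn = lpips.LPIPS(net="alex")

sr = torch.rand(1, 3, 128, 128)  # super-resolved frame (placeholder data)
hr = torch.rand(1, 3, 128, 128)  # ground-truth high-resolution frame

print(f"PSNR:  {psnr(sr, hr):.2f} dB")                          # higher is better
print(f"LPIPS: {float(lpips_fn(sr * 2 - 1, hr * 2 - 1)):.3f}")  # lower is better
```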


Full text available: on campus 2024-03-31; off campus 2024-03-31