| Author: | 林建維 (Lin, Chien-Wei) |
|---|---|
| Thesis title: | 基於卷積神經網路之影片超解析度技術 (Video Super-Resolution Based on Convolutional Neural Network) |
| Advisor: | 郭致宏 (Kuo, Chih-Hung) |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science, Department of Electrical Engineering |
| Year of publication: | 2017 |
| Academic year of graduation: | 106 |
| Language: | Chinese |
| Number of pages: | 82 |
| Chinese keywords: | Video super-resolution, Deep learning, Convolutional neural network, Motion compensation |
| English keywords: | Multi-frame, Super-resolution, Convolutional neural network |
The convolutional neural network (CNN), a branch of deep neural networks, is now widely applied in multimedia fields such as image recognition and image reconstruction, with remarkable results. In this thesis, we propose a fast CNN-based video super-resolution algorithm that adapts the Fast Super-Resolution Convolutional Neural Network (FSRCNN), originally designed for single images, into an architecture suitable for video super-resolution. Conventional super-resolution techniques reconstruct and upscale from a single image only; when applied to video, regions with motion blur are reconstructed with errors. We therefore feed the frames preceding and following the target frame into the CNN as inputs, so that the network can learn the motion information between frames. Moreover, many studies have shown that applying motion compensation to the neighboring frames as a pre-processing step, before the network learns from them, effectively improves super-resolution performance. However, motion-compensation pre-processing introduces extra computation, and this thesis proposes the concept of a correlational layer to replace it. Unlike typical CNN-based motion-compensation algorithms, which rely on relatively deep networks, the correlational layer accomplishes this task with a single convolutional layer, using intermediate feature maps as the network input instead of motion-compensated frames, so that the network preserves motion information more effectively. Experimental results show that all three proposed architectures achieve better super-resolution quality than competing algorithms, and that the correlational layer outperforms motion-compensation pre-processing in both super-resolution quality and running time.
The convolutional neural network (CNN) has been widely applied to super-resolution tasks. However, existing CNN-based super-resolution algorithms need pre-processing steps that require massive computation. In this thesis, we propose an architecture that learns both the spatial and temporal information of low-resolution video. Consecutive frames are used as input to our video super-resolution CNN. Unlike existing approaches, only a single convolutional layer, named the correlational layer, is used to replace the motion compensation step. Experimental results show that the proposed correlational layer is able to learn the motion information and outperforms the network with motion-compensated frames by 0.19 dB in PSNR. Our method outperforms VSRnet by 0.39 dB in PSNR on average for videos upscaled by a factor of 3.
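The abstract describes feeding consecutive frames into the network so that a single convolutional layer can pick up inter-frame motion information. The numpy sketch below illustrates only this early-fusion idea: three frames are stacked along the channel axis and passed through one convolutional layer. The kernel size, filter count, and random weights are illustrative assumptions, not the thesis's actual correlational-layer architecture or trained parameters.

```python
import numpy as np

def conv2d(x, w):
    """Valid 2D convolution of a multi-channel input x (H, W, C_in)
    with a filter bank w (k, k, C_in, C_out)."""
    k = w.shape[0]
    H, W, _ = x.shape
    out = np.zeros((H - k + 1, W - k + 1, w.shape[3]))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = x[i:i + k, j:j + k, :]  # (k, k, C_in)
            # Contract spatial and input-channel axes against the filters.
            out[i, j] = np.tensordot(patch, w, axes=([0, 1, 2], [0, 1, 2]))
    return out

# Three consecutive low-resolution frames (previous, current, next),
# stacked along the channel axis so one layer sees all of them at once.
frames = [np.random.rand(32, 32) for _ in range(3)]
stacked = np.stack(frames, axis=-1)  # (32, 32, 3)

# A single 3x3 convolutional layer with 16 output feature maps
# (hypothetical sizes chosen for the sketch).
w = np.random.randn(3, 3, 3, 16) * 0.1
features = np.maximum(conv2d(stacked, w), 0)  # ReLU activation

print(features.shape)  # (30, 30, 16)
```

The point of the sketch is that temporal information enters through the channel dimension, so a single layer can, in principle, form responses that depend on differences between neighboring frames, which is the role the thesis assigns to its correlational layer.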
[1] J. Yang, J. Wright, T. S. Huang, and Y. Ma, “Image super-resolution via
sparse representation,” IEEE Transactions on Image Processing, vol. 19, no. 11,
pp. 2861–2873, 2010.
[2] M.-H. Cheng, N.-W. Lin, K.-S. Hwang, and J.-H. Jeng, “Fast video super-resolution
using artificial neural networks,” in 2012 8th International Symposium
on Communication Systems, Networks & Digital Signal Processing
(CSNDSP), pp. 1–4, July 2012.
[3] C. Dong, C. C. Loy, K. He, and X. Tang, “Learning a deep convolutional network
for image super-resolution,” in European Conference on Computer Vision,
pp. 184–199, Springer, 2014.
[4] C. Dong, C. C. Loy, and X. Tang, “Accelerating the super-resolution convolutional
neural network,” in European Conference on Computer Vision, pp. 391–407,
Springer, September 2016.
[5] A. Kappeler, S. Yoo, Q. Dai, and A. K. Katsaggelos, “Video super-resolution
with convolutional neural networks,” IEEE Transactions on Computational Imaging,
vol. 2, pp. 109–122, June 2016.
[6] K. He, X. Zhang, S. Ren, and J. Sun, “Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification,” in Proceedings of
the IEEE international conference on computer vision, pp. 1026–1034, 2015.
[7] A. Dosovitskiy, P. Fischer, E. Ilg, P. Hausser, C. Hazirbas, V. Golkov, P. van der
Smagt, D. Cremers, and T. Brox, “Flownet: Learning optical flow with convolutional
networks,” in Proceedings of the IEEE International Conference on
Computer Vision, pp. 2758–2766, 2015.
[8] Xiph.Org, http://media.xiph.org/.
[9] C. Liu and D. Sun, “On Bayesian adaptive video super resolution,” IEEE Transactions
on Pattern Analysis and Machine Intelligence, vol. 36, pp. 346–360,
Feb 2014.
[10] X. Glorot and Y. Bengio, “Understanding the difficulty of training deep feedforward
neural networks,” in AISTATS, vol. 9, pp. 249–256, 2010.
[11] H. Zhang, Z. Yang, L. Zhang, and H. Shen, “Super-resolution reconstruction for
multi-angle remote sensing images considering resolution differences,” Remote
Sensing, vol. 6, no. 1, pp. 637–657, 2014.
[12] J. S. Isaac and R. Kulkarni, “Super resolution techniques for medical image
processing,” in Technologies for Sustainable Development (ICTSD), 2015 International
Conference on, pp. 1–6, IEEE, 2015.
[13] D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. Van Den Driessche,
J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot, et al., “Mastering
the game of go with deep neural networks and tree search,” Nature, vol. 529,
no. 7587, pp. 484–489, 2016.
[14] C. Dong, C. C. Loy, K. He, and X. Tang, “Image super-resolution using deep
convolutional networks,” IEEE Transactions on Pattern Analysis and Machine
Intelligence, vol. 38, pp. 295–307, Feb 2016.
[15] R. Timofte, V. De Smet, and L. Van Gool, “Anchored neighborhood regression
for fast example-based super-resolution,” in Proceedings of the IEEE International
Conference on Computer Vision, pp. 1920–1927, 2013.
[16] M. Bevilacqua, A. Roumy, C. Guillemot, and M. L. Alberi-Morel, “Low-complexity
single-image super-resolution based on nonnegative neighbor embedding,”
BMVA press, 2012.
[17] K. Fukushima, “Neocognitron: A self-organizing neural network model for a
mechanism of pattern recognition unaffected by shift in position,” Biological
Cybernetics, vol. 36, no. 4, pp. 193–202, 1980.
[18] Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, and
L. D. Jackel, “Backpropagation applied to handwritten zip code recognition,”
Neural Computation, vol. 1, pp. 541–551, Dec 1989.
[19] F. Rosenblatt, “The perceptron: A probabilistic model for information storage
and organization in the brain,” Psychological Review, pp. 65–386, 1958.
[20] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, “Learning representations
by back-propagating errors,” Cognitive modeling, vol. 5, no. 3, p. 1, 1988.
[21] G. E. Hinton and R. R. Salakhutdinov, “Reducing the dimensionality of data
with neural networks,” science, vol. 313, no. 5786, pp. 504–507, 2006.
[22] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,” in Advances in Neural Information Processing
Systems 25, pp. 1097–1105, Curran Associates, Inc., 2012.
[23] S. Reed, Z. Akata, X. Yan, L. Logeswaran, B. Schiele, and H. Lee, “Generative
adversarial text to image synthesis,” in Proceedings of The 33rd International
Conference on Machine Learning, vol. 3, 2016.
[24] H. Zhang, T. Xu, H. Li, S. Zhang, X. Huang, X. Wang, and D. Metaxas, “StackGAN:
Text to photo-realistic image synthesis with stacked generative adversarial
networks,” arXiv preprint arXiv:1612.03242, 2016.
[25] D. Teney and M. Hebert, Learning to Extract Motion from Videos in Convolutional
Neural Networks, pp. 412–428. Springer International Publishing, 2017.
[26] D. Eigen, D. Krishnan, and R. Fergus, “Restoring an image taken through a
window covered with dirt or rain,” in Proceedings of the IEEE International
Conference on Computer Vision, pp. 633–640, 2013.
[27] S. Su, M. Delbracio, J. Wang, G. Sapiro, W. Heidrich, and O. Wang, “Deep
video deblurring,” arXiv preprint arXiv:1611.08387, 2016.
[28] G. E. Hinton, S. Osindero, and Y.-W. Teh, “A fast learning algorithm for deep
belief nets,” Neural computation, vol. 18, no. 7, pp. 1527–1554, 2006.
[29] P. Smolensky, “Parallel distributed processing: Explorations in the microstructure
of cognition, vol. 1,” ch. Information Processing in Dynamical Systems:
Foundations of Harmony Theory, pp. 194–281, Cambridge, MA, USA: MIT
Press, 1986.
[30] J. Gao, Y. Guo, and M. Yin, “Restricted boltzmann machine approach to couple dictionary training for image super-resolution,” in Image Processing (ICIP),
2013 20th IEEE International Conference on, pp. 499–503, IEEE, 2013.
[31] Y. Zhou, Y. Qu, Y. Xie, and W. Zhang, “Image super-resolution using deep belief
networks,” in Proceedings of International Conference on Internet Multimedia
Computing and Service, p. 28, ACM, 2014.
[32] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair,
A. Courville, and Y. Bengio, “Generative adversarial nets,” in Advances in Neural
Information Processing Systems 27, pp. 2672–2680, Curran Associates, Inc.,
2014.
[33] C. Ledig, L. Theis, F. Huszár, J. Caballero, A. Cunningham, A. Acosta,
A. Aitken, A. Tejani, J. Totz, Z. Wang, et al., “Photo-realistic single
image super-resolution using a generative adversarial network,” arXiv preprint
arXiv:1609.04802, 2016.
[34] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied
to document recognition,” Proceedings of the IEEE, vol. 86, pp. 2278–2324,
Nov 1998.
[35] D. H. Hubel and T. N. Wiesel, “Receptive fields, binocular interaction and
functional architecture in the cat’s visual cortex,” The Journal of physiology,
vol. 160, no. 1, pp. 106–154, 1962.
[36] M. D. Zeiler, D. Krishnan, G. W. Taylor, and R. Fergus, “Deconvolutional networks,”
Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference
on, pp. 2528–2535, 2010.
[37] M. Lin, Q. Chen, and S. Yan, “Network in network,” International Conference
on Learning Representations (ICLR), March 2014.
[38] R. Liao, X. Tao, R. Li, Z. Ma, and J. Jia, “Video super-resolution via deep draft-ensemble
learning,” in The IEEE International Conference on Computer Vision
(ICCV), December 2015.
[39] L. Xu, J. Jia, and Y. Matsushita, “Motion detail preserving optical flow estimation,”
IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34,
pp. 1744–1757, Sept 2012.
[40] T. Brox, A. Bruhn, N. Papenberg, and J. Weickert, “High accuracy optical
flow estimation based on a theory for warping,” Computer Vision-ECCV 2004,
pp. 25–36, 2004.
[41] M. Drulea and S. Nedevschi, “Total variation regularization of local-global optical
flow,” in 2011 14th International IEEE Conference on Intelligent Transportation
Systems (ITSC), pp. 318–323, Oct 2011.
[42] Y. Huang, W. Wang, and L. Wang, “Video super-resolution via bidirectional
recurrent convolutional networks,” IEEE Transactions on Pattern Analysis and
Machine Intelligence, 2017.
[43] A. Ahmadi and I. Patras, “Unsupervised convolutional neural networks for motion
estimation,” in Image Processing (ICIP), 2016 IEEE International Conference
on, pp. 1629–1633, IEEE, 2016.
[44] D. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv
preprint arXiv:1412.6980, 2014.
[45] M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado,
A. Davis, J. Dean, M. Devin, et al., “Tensorflow: Large-scale machine learning
on heterogeneous distributed systems,” arXiv preprint arXiv:1603.04467, 2016.
[46] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama,
and T. Darrell, “Caffe: Convolutional architecture for fast feature embedding,”
in Proceedings of the 22nd ACM international conference on Multimedia,
pp. 675–678, ACM, 2014.
[47] A. Vedaldi and K. Lenc, “Matconvnet: Convolutional neural networks for matlab,”
in Proceedings of the 23rd ACM international conference on Multimedia, pp. 689–692, ACM, 2015.
Available on campus from 2019-11-03.