
Author: Lin, Chien-Wei (林建維)
Title: Video Super-Resolution Based on Convolutional Neural Network (基於卷積神經網路之影片超解析度技術)
Advisor: Kuo, Chih-Hung (郭致宏)
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Electrical Engineering
Year of Publication: 2017
Academic Year of Graduation: 106 (ROC calendar)
Language: Chinese
Pages: 82
Keywords (Chinese): video super-resolution, deep learning, convolutional neural network, motion compensation
Keywords (English): Multi-frame, Super-resolution, Convolutional neural network
The convolutional neural network (CNN) is a branch of deep neural networks that is now widely applied to multimedia tasks such as image recognition and image reconstruction, with remarkable results. In this thesis, we propose a fast CNN-based video super-resolution algorithm, adapting the Fast Super-Resolution Convolutional Neural Network (FSRCNN), originally designed for single images, into an architecture suited to video super-resolution. Conventional super-resolution techniques reconstruct and upscale from a single image; when applied to video, they produce reconstruction errors in motion-blurred regions. We therefore feed the frames before and after the target frame into the CNN as additional inputs, so that the network can learn the motion information between frames. Moreover, many studies have shown that applying motion compensation to the neighboring frames as a pre-processing step, before the network learns from them, effectively improves super-resolution performance. However, motion-compensation pre-processing incurs extra computation, so this thesis proposes the concept of a correlational layer to replace it. Unlike typical CNN-based motion-compensation algorithms, which rely on relatively deep networks, the correlational layer accomplishes this task with a single convolutional layer; by taking intermediate-layer feature maps rather than motion-compensated frames as its input, the network preserves motion information more effectively. Experimental results show that all three proposed architectures achieve better super-resolution quality than competing algorithms, and that the correlational layer outperforms motion-compensation pre-processing in both super-resolution quality and runtime.
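The correlational-layer idea described above can be sketched in code. The following is a minimal NumPy illustration, not the thesis's actual network: the feature-map sizes, kernel shapes, and the names `conv2d` and `correlational_layer` are all illustrative assumptions. The point it demonstrates is the mechanism the abstract describes: intermediate feature maps from three consecutive frames are stacked along the channel axis and mixed by a single convolution, with no explicit motion-compensation step.

```python
import numpy as np

def conv2d(x, kernels, bias):
    """Naive 'valid' 2-D convolution: x is (C, H, W), kernels is (F, C, k, k)."""
    f, c, k, _ = kernels.shape
    h, w = x.shape[1] - k + 1, x.shape[2] - k + 1
    out = np.zeros((f, h, w))
    for fi in range(f):
        for i in range(h):
            for j in range(w):
                # Each output value mixes all input channels in a k x k window.
                out[fi, i, j] = np.sum(x[:, i:i + k, j:j + k] * kernels[fi]) + bias[fi]
    return out

def correlational_layer(prev_feat, curr_feat, next_feat, kernels, bias):
    """Stack feature maps of three consecutive frames along the channel axis
    and apply a single convolution, so one layer can relate information across
    time instead of running a separate motion-compensation pre-processing step."""
    stacked = np.concatenate([prev_feat, curr_feat, next_feat], axis=0)
    return np.maximum(conv2d(stacked, kernels, bias), 0.0)  # ReLU activation

# Toy usage: 4 feature maps per frame, 8x8 spatial size, 6 output maps.
rng = np.random.default_rng(0)
p, c, n = (rng.standard_normal((4, 8, 8)) for _ in range(3))
K = rng.standard_normal((6, 12, 3, 3))  # 12 = 3 frames x 4 channels
out = correlational_layer(p, c, n, K, np.zeros(6))
print(out.shape)  # (6, 6, 6): 'valid' 3x3 convolution shrinks 8x8 to 6x6
```

Because the stacked channels come from different time steps, the learned kernels can weight spatially offset positions across frames, which is how a single layer can capture coarse motion cues.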

    The convolutional neural network (CNN) has been widely applied to super-resolution tasks. However, existing CNN-based super-resolution algorithms need pre-processing steps that require massive computation. In this thesis, we propose an architecture that learns both the spatial and temporal information of low-resolution video. Consecutive frames are used as input to our video super-resolution CNN. Unlike existing approaches, only a single convolutional layer, named the correlational layer, is used to replace the motion-compensation step. Experimental results show that the proposed correlational layer is able to learn the motion information and outperforms the network trained on motion-compensated frames by 0.19 dB in PSNR. On average, our method outperforms VSRnet by 0.39 dB in PSNR for videos upscaled by a factor of 3.
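The gains quoted above (0.19 dB and 0.39 dB) are in PSNR, which for 8-bit images follows the standard definition 10·log10(peak²/MSE). A small sketch of that metric is below; the function name `psnr` and the default peak value are illustrative assumptions, not taken from the thesis.

```python
import numpy as np

def psnr(reference, reconstructed, peak=255.0):
    """Peak signal-to-noise ratio in dB between a reference image and a
    reconstruction, assuming pixel values span [0, peak]."""
    diff = reference.astype(np.float64) - reconstructed.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0.0:
        return float("inf")  # identical images
    return 10.0 * np.log10(peak ** 2 / mse)

# Example: an off-by-one reconstruction of an 8-bit image gives MSE = 1,
# i.e. 10 * log10(255^2) ≈ 48.13 dB.
print(round(psnr(np.zeros((4, 4)), np.ones((4, 4))), 2))  # 48.13
```

A 0.39 dB gain corresponds to roughly a 9% reduction in mean squared error, since PSNR is logarithmic in MSE.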

    Table of contents (translated; page numbers omitted):

    Abstract (Chinese)
    Abstract (English)
    Acknowledgements
    Table of Contents
    List of Figures
    List of Tables
    1 Introduction
      1-1 Preface
      1-2 Motivation
      1-3 Contributions
      1-4 Thesis Organization
    2 Background
      2-1 Super-Resolution
        2-1-1 Image Super-Resolution
        2-1-2 Video Super-Resolution
      2-2 Deep Learning
        2-2-1 Artificial Neural Networks
        2-2-2 Deep Neural Networks
        2-2-3 Activation Functions
        2-2-4 Back-Propagation
        2-2-5 Convolutional Neural Networks
        2-2-6 Deconvolutional Neural Networks
    3 Related Work
      3-1 Fast image super-resolution based on artificial neural networks
      3-2 Image super-resolution based on convolutional neural networks
      3-3 Fast image super-resolution based on convolutional neural networks
      3-4 Video super-resolution via convolutional neural networks trained on motion samples
      3-5 Video super-resolution based on convolutional neural networks
      3-6 Video super-resolution based on bidirectional recurrent convolutional networks
      3-7 Summary of super-resolution algorithms
    4 Fast Video Super-Resolution Based on Convolutional Neural Networks
      4-1 Fast Video Super-Resolution CNN (FVSRnet)
        4-1-1 Network architecture
        4-1-2 Network training
      4-2 Correlational Layer
        4-2-1 Network architecture
        4-2-2 Network training
    5 Experiments and Results
      5-1 Dataset
      5-2 Training settings
        5-2-1 Fast Video Super-Resolution CNN (FVSRnet)
        5-2-2 Correlational Layer
        5-2-3 Pre-training
      5-3 Analysis of the super-resolution network architecture
        5-3-1 Number of feature maps in the correlational layer
        5-3-2 Kernel size of the correlational layer
        5-3-3 Effect of different activation functions on the correlational layer
      5-4 Analysis of the fast super-resolution CNN architecture
      5-5 Reconstruction results
        5-5-1 Super-resolution performance
        5-5-2 Runtime
    6 Conclusion and Future Work
      6-1 Conclusion
      6-2 Future Work
    References


    Availability: on campus, open access from 2019-11-03; off campus, not available.
    The electronic thesis has not been authorized for public release; for the print copy, please consult the library catalog.