
Author: Hsu, Jui-Ting (徐睿廷)
Title: Image Super-Resolution Using Capsule Neural Networks (利用膠囊神經網路之圖片超解析度技術)
Advisor: Kuo, Chih-Hung (郭致宏)
Degree: Master
Department: Department of Electrical Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2019
Graduation Academic Year: 107 (2018-2019)
Language: Chinese
Pages: 90
Keywords (Chinese): image super-resolution, deep learning, convolutional neural network, capsule neural network
Keywords (English): Super-resolution, Convolutional Neural Network, Capsule Neural Network
Abstract (translated from the Chinese):
Convolutional neural networks (CNNs) are widely applied to image recognition, image reconstruction, and other multimedia-related fields, with remarkable results. The capsule neural network, proposed in recent years, represents a neuron with a group of elements, called a capsule neuron. New capsule neurons are produced through a routing mechanism from candidate capsules with similar properties. In this thesis, we propose the Capsule Image Restoration Neural Network (CIRNN) and the Capsule Attention and Reconstruction Neural Network (CARNN), which incorporate capsule neural networks into CNN-based image super-resolution frameworks. CIRNN exploits the rich information encoded in capsule neurons to reconstruct more accurate high-resolution images. CARNN, on the other hand, is a CNN-based super-resolution method whose attention network is built on the strong segmentation ability of capsule neural networks. Experimental results show that the reconstruction quality of CIRNN and CARNN surpasses that of many state-of-the-art methods.
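The routing mechanism mentioned above follows the "routing by agreement" scheme introduced with capsule networks by Sabour et al. (2017): each lower-level capsule casts a vote for every higher-level capsule, and votes that agree with the emerging consensus are weighted up over a few iterations. The following is a minimal, unbatched NumPy sketch of that published algorithm, not the thesis's implementation; the shapes, names, and iteration count are illustrative choices.

```python
import numpy as np

def squash(s, axis=-1, eps=1e-9):
    # Squash non-linearity: preserves a vector's direction while
    # mapping its length into [0, 1).
    sq = np.sum(s ** 2, axis=axis, keepdims=True)
    return (sq / (1.0 + sq)) * s / np.sqrt(sq + eps)

def dynamic_routing(u_hat, num_iters=3):
    """Routing by agreement between two capsule layers.

    u_hat: (num_in, num_out, dim_out) prediction ("vote") vectors,
           i.e. each input capsule's prediction for every output capsule.
    Returns the output capsules, shape (num_out, dim_out).
    """
    num_in, num_out, _ = u_hat.shape
    b = np.zeros((num_in, num_out))                   # routing logits
    for _ in range(num_iters):
        e = np.exp(b - b.max(axis=1, keepdims=True))  # numerically stable softmax
        c = e / e.sum(axis=1, keepdims=True)          # coupling coefficients
        s = (c[..., None] * u_hat).sum(axis=0)        # weighted sum of votes
        v = squash(s)                                 # candidate output capsules
        b += np.einsum('iod,od->io', u_hat, v)        # reward votes that agree
    return v
```

Votes that agree with the consensus receive larger coupling coefficients, which is how a new capsule is, as the abstract puts it, produced from candidates with similar properties.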

Abstract (English):
In this paper, we present the Capsule Image Restoration Neural Network (CIRNN) and the Capsule Attention and Reconstruction Neural Network (CARNN), which incorporate capsule neural networks into convolutional neural network (CNN)-based image super-resolution (SR) frameworks. The proposed CIRNN takes advantage of the rich information encoded in the capsules to reconstruct more accurate high-resolution images. On the other hand, CARNN is a CNN-based SR method with capsule attention networks designed by utilizing the robust segmentation ability of capsule neural networks. Our experimental results show PSNR performance comparable to that of other state-of-the-art methods.
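This record does not describe CARNN's capsule attention design in detail. Purely as a hypothetical illustration of the general idea: because the squashed length of a capsule vector encodes the probability that the entity it represents is present, capsule activations over a feature map can be reduced to a spatial attention map that gates CNN features. Everything in the sketch below (the channel grouping, `caps_dim`, the mean pooling) is an assumption for illustration, not the thesis's design.

```python
import numpy as np

def capsule_attention(features, caps_dim=8):
    """Hypothetical capsule-style spatial attention over a CNN feature map.

    features: (H, W, C) array with C divisible by caps_dim. Channels are
    grouped into caps_dim-dimensional "primary capsules"; the squash
    magnitude of each capsule acts as an entity-presence score, and the
    per-pixel average of those scores forms the attention map.
    """
    H, W, C = features.shape
    caps = features.reshape(H, W, C // caps_dim, caps_dim)
    sq = np.sum(caps ** 2, axis=-1)            # squared capsule lengths
    presence = sq / (1.0 + sq)                 # squash magnitude, in [0, 1)
    attn = presence.mean(axis=-1)[..., None]   # (H, W, 1) attention map
    return features * attn                     # gate the original features

# Example: gate a random 32x32 feature map with 64 channels.
out = capsule_attention(np.random.rand(32, 32, 64))
```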

Table of Contents:
Abstract (Chinese)
Abstract (English)
Acknowledgments
Table of Contents
List of Figures
List of Tables
1 Introduction
  1-1 Preface
  1-2 Super-Resolution
    1-2-1 Image Super-Resolution
  1-3 Deep Learning
    1-3-1 Artificial Neural Networks
    1-3-2 Deep Neural Networks
    1-3-3 Back-Propagation
    1-3-4 Convolutional Neural Networks
  1-4 Research Motivation
  1-5 Research Contributions
  1-6 Thesis Organization
2 Background and Related Work
  2-1 Deconvolutional Neural Networks
  2-2 Sub-Pixel Convolutional Neural Networks
  2-3 Advanced Deep Neural Network Architectures
  2-4 Capsule Neural Networks
    2-4-1 Capsule Neural Network Layers
    2-4-2 Capsule Neural Networks Based on Dynamic Routing
    2-4-3 Capsule Neural Networks Based on Expectation-Maximization Routing
3 Deep-Learning-Based Super-Resolution
  3-1 Super-Resolution with Pre-Upsampling
    3-1-1 Super-Resolution Based on Convolutional Neural Networks
    3-1-2 Accurate Image Super-Resolution Based on Deep Convolutional Neural Networks
  3-2 Super-Resolution with Single-Stage Upsampling
    3-2-1 Fast Super-Resolution Based on Convolutional Neural Networks
    3-2-2 Single-Image Super-Resolution Based on Enhanced Deep Residual Networks
  3-3 Super-Resolution with Progressive Upsampling
    3-3-1 Fast and Accurate Single-Image Super-Resolution Based on Deep Laplacian Pyramid Networks
    3-3-2 Single-Image Super-Resolution Based on a Fully Progressive Approach
  3-4 Super-Resolution with Iterative Up- and Down-Sampling
    3-4-1 Single-Image Super-Resolution Based on Deep Back-Projection Networks
  3-5 Attention Networks for Image Super-Resolution
  3-6 Comparison of Super-Resolution Algorithms
4 Super-Resolution with Capsule Neural Networks
  4-1 Input Connection Schemes for Capsule Layers
  4-2 Capsule Image Restoration Neural Network (CIRNN)
    4-2-1 Applying the Routing Mechanism to Image Super-Resolution
    4-2-2 Network Architecture
  4-3 Capsule Attention and Reconstruction Network (CARNN)
  4-4 Network Training
5 Experiments and Results
  5-1 Datasets
  5-2 Image Quality Assessment
  5-3 Training Tools and Hardware
  5-4 Training Settings
  5-5 Super-Resolution Performance of Capsule Layers
  5-6 Architecture Analysis of CIRNN
    5-6-1 Capsule Dimension Perturbation Experiments
  5-7 Architecture Analysis of CARNN
  5-8 Reconstruction Results
    5-8-1 Reconstructed Images and Scene Analysis
    5-8-2 Complexity and Performance Evaluation
6 Conclusion and Future Work
  6-1 Conclusion
  6-2 Future Work
References


Full-text availability:
On campus: open access from 2022-09-01.
Off campus: not open to the public.
The electronic thesis has not yet been authorized for public release; for the print copy, please consult the library catalog.