| Graduate student: | 徐睿廷 Hsu, Jui-Ting |
|---|---|
| Thesis title: | 利用膠囊神經網路之圖片超解析度技術 (Image Super-Resolution Using Capsule Neural Networks) |
| Advisor: | 郭致宏 Kuo, Chih-Hung |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Department of Electrical Engineering |
| Year of publication: | 2019 |
| Academic year of graduation: | 107 |
| Language: | Chinese |
| Pages: | 90 |
| Keywords (Chinese): | image super-resolution, deep learning, convolutional neural network, capsule neural network |
| Keywords (English): | Super-resolution, Convolutional Neural Network, Capsule Neural Network |
Convolutional neural networks (CNNs) are widely applied in multimedia fields such as image recognition and image reconstruction, where they achieve excellent results. The capsule neural network, proposed in recent years, represents a neuron with a group of elements, called a capsule neuron.
New capsule neurons are produced through a routing mechanism from candidates with similar properties. In this thesis, we propose the Capsule Image Restoration Neural Network (CIRNN) and the Capsule Attention and Reconstruction Neural Network (CARNN), which incorporate capsule neural networks into CNN-based image super-resolution frameworks. CIRNN exploits the rich information encoded in capsule neurons to reconstruct more accurate high-resolution images. CARNN, on the other hand, is a CNN-based super-resolution method whose attention network is designed around the strong segmentation ability of capsule neural networks. Experimental results show that CIRNN and CARNN reconstruct images better than many state-of-the-art methods.
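As a concrete illustration of the routing mechanism mentioned above, the sketch below implements the standard routing-by-agreement procedure from the capsule-network literature in NumPy. It is a minimal sketch, not code from the thesis: the capsule counts, dimensions, and the three routing iterations are illustrative assumptions.

```python
import numpy as np

def squash(s, axis=-1, eps=1e-8):
    """Squash non-linearity: shrinks a capsule vector's length into [0, 1)
    while preserving its direction, so length can act as a probability."""
    sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

def dynamic_routing(u_hat, num_iters=3):
    """Routing-by-agreement over prediction vectors.

    u_hat: (num_in, num_out, dim_out) predictions from lower-level capsules.
    Returns (num_out, dim_out) higher-level output capsules.
    """
    num_in, num_out, _ = u_hat.shape
    b = np.zeros((num_in, num_out))                 # routing logits
    for _ in range(num_iters):
        e = np.exp(b - b.max(axis=1, keepdims=True))
        c = e / e.sum(axis=1, keepdims=True)        # coupling coefficients
        s = (c[..., None] * u_hat).sum(axis=0)      # weighted sum of predictions
        v = squash(s)                               # candidate output capsules
        b += (u_hat * v[None]).sum(axis=-1)         # agreement updates the logits
    return v

# Toy usage: route 8 input capsules to 4 output capsules of dimension 16.
v = dynamic_routing(np.random.randn(8, 4, 16))
print(v.shape)  # (4, 16)
```

In this way, an output capsule is reinforced by the lower-level candidates whose predictions agree with it, which is the "similar properties" grouping the abstract refers to.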
In this paper, we present the Capsule Image Restoration Neural Network (CIRNN) and the Capsule Attention and Reconstruction Neural Network (CARNN), which incorporate capsule neural networks into convolutional neural network (CNN)-based image super-resolution (SR) frameworks. The proposed CIRNN takes advantage of the rich information encoded in the capsules to reconstruct more accurate high-resolution images. On the other hand, CARNN is a CNN-based SR method with capsule attention networks designed by utilizing the robust segmentation ability of capsule neural networks. Our experimental results show performance comparable in PSNR with other state-of-the-art methods.
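The abstracts report reconstruction quality in PSNR. For reference, here is a minimal sketch of that metric (assuming 8-bit images); it is not the thesis's evaluation code.

```python
import numpy as np

def psnr(reference, reconstruction, max_val=255.0):
    """Peak signal-to-noise ratio (dB) between two images of equal shape."""
    mse = np.mean((reference.astype(np.float64)
                   - reconstruction.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)
```

Higher PSNR indicates a reconstruction closer to the ground-truth high-resolution image.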
Full text available on campus from 2022-09-01.