
Author: Wang, Yi-Shun (王奕舜)
Thesis title: An Effective CNN-based Attention Network for Single Image Motion Deblurring (一個基於CNN的有效注意力網路用於單影像去動態模糊)
Advisor: Tai, Shen-Chuan (戴顯權)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Electrical Engineering
Year of publication: 2024
Academic year of graduation: 112 (ROC calendar, 2023-2024)
Language: English
Number of pages: 75
Keywords: motion blur, image deblur, feature extraction, self-attention
  • Motion blur is a common phenomenon in images: objects or scenes appear blurry because of motion during capture. It not only distorts the visual appearance of an image but also degrades the performance of computer vision algorithms for tasks such as object recognition and detection, lowering their accuracy.
    Most current deep learning-based image deblurring algorithms are built on convolutional neural networks. However, these methods cannot fully exploit the global information in an image, nor can they adjust their weights dynamically for different input images. Recently, Transformer models with self-attention have increasingly been adopted for image restoration, but the high computational cost of self-attention has constrained their development. This thesis proposes an encoder-decoder network for single image deblurring. The encoder first applies improved convolutional layers to reduce redundant features and strengthen feature extraction; a more efficient self-attention mechanism is then applied, and the decoder finally outputs the clean image. In experiments on several representative benchmark datasets, the proposed method achieves the best results among the compared state-of-the-art deblurring methods in both visual quality and quantitative evaluation.
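    The computational cost that the abstract attributes to self-attention can be illustrated with a minimal sketch. It is not the thesis's attention mechanism: it shows the standard scaled dot-product form with identity projections (Q = K = V, an assumption made for brevity), and it compares the number of pairwise scores for global attention with that of non-overlapping windows, one common way to reduce the cost. The function names and the 64×64 feature map are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Scaled dot-product self-attention with Q = K = V = tokens.

    Identity projections are an assumption for this sketch; real models
    learn separate query, key, and value projections.
    """
    d = len(tokens[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for q in tokens:
        # One score per (query, key) pair: this is the quadratic part.
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, tokens))
                    for j in range(d)])
    return out


def attention_matrix_entries(height, width, window=None):
    """Count pairwise attention scores for a height x width token grid.

    window=None means global attention over all tokens; otherwise the grid
    is split into non-overlapping window x window blocks (height and width
    are assumed to be divisible by window).
    """
    n = height * width
    if window is None:
        return n * n
    num_windows = (height // window) * (width // window)
    return num_windows * (window * window) ** 2


if __name__ == "__main__":
    print(attention_matrix_entries(64, 64))            # 16777216
    print(attention_matrix_entries(64, 64, window=8))  # 262144
```

    For a 64×64 grid of tokens, global attention needs 64 times as many pairwise scores as 8×8 windowed attention, which is the kind of gap that motivates more efficient attention designs.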

    Abstract (Chinese)
    Abstract
    Acknowledgments
    Contents
    List of Tables
    List of Figures
    Chapter 1 Introduction
    Chapter 2 Background and Related Works
      2.1 Overview of Single Image Deblurring
        2.1.1 Conventional Method
        2.1.2 Deep learning-based Method
      2.2 Related Work
        2.2.1 Vision Transformer
        2.2.2 Swin Transformer
        2.2.3 Uformer
        2.2.4 Stripformer
        2.2.5 Sharpformer
    Chapter 3 The Proposed Method
      3.1 Network Architecture
        3.1.1 Residual Block
        3.1.2 Local Transformer Block (LT Block)
        3.1.3 Global Transformer Block (GT Block)
        3.1.4 Multi Scale Feature Fusion
      3.2 Loss Function
        3.2.1 Contrastive Loss
        3.2.2 Charbonnier Loss
        3.2.3 Edge Loss
        3.2.4 Total Loss
      3.3 Algorithm Flow
        3.3.1 Training Stage
        3.3.2 Testing Stage
    Chapter 4 Experimental Results
      4.1 Experimental Datasets
        4.1.1 GoPro Dataset
        4.1.2 HIDE Dataset
        4.1.3 RealBlur Dataset
      4.2 Implementation Detail
        4.2.1 Experimental Environment
        4.2.2 Training Strategy
      4.3 Evaluation Metrics
        4.3.1 PSNR
        4.3.2 SSIM
      4.4 Experimental Results
        4.4.1 Quantitative Results on GoPro Dataset
        4.4.2 Visual Results on GoPro Dataset
        4.4.3 Quantitative Results on HIDE Dataset
        4.4.4 Visual Results on HIDE Dataset
        4.4.5 Quantitative Results on RealBlur Dataset
        4.4.6 Visual Results on RealBlur Dataset
      4.5 Ablation Study
    Chapter 5 Conclusions and Future Works
      5.1 Conclusion
      5.2 Future Work
    References
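    The outline names the Charbonnier loss (3.2.2) and PSNR (4.3.1) without defining them in this record. The sketch below gives their standard textbook definitions on flat lists of pixel values; it is not taken from the thesis, and the smoothing constant `eps`, the [0, 1] pixel range, and the function names are assumptions.

```python
import math


def charbonnier_loss(pred, target, eps=1e-3):
    """Mean of sqrt((p - t)^2 + eps^2), a smooth variant of the L1 loss.

    eps is an assumed smoothing constant; the value used in the thesis
    is not stated in this record.
    """
    return sum(math.sqrt((p - t) ** 2 + eps ** 2)
               for p, t in zip(pred, target)) / len(pred)


def psnr(pred, target, max_value=1.0):
    """Peak signal-to-noise ratio in dB for pixels scaled to [0, max_value]."""
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    restored = [0.0, 0.0]
    sharp = [0.1, 0.1]
    print(psnr(restored, sharp))              # about 20 dB
    print(charbonnier_loss(restored, sharp))  # slightly above 0.1
```

    Higher PSNR means the restored image is closer to the sharp reference, while the Charbonnier loss is a quantity to be minimized during training.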


    Full text availability: on campus from 2025-08-01; off campus from 2025-08-01.