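Graduate student: 蘇瑜婕
Su, Yu-Chieh
Thesis title: 一個用於單張陰影去除的增強型注意力網路
An Enhanced Attention Network for Single-Image Shadow Removal
Advisor: 戴顯權
Tai, Shen-Chuan
Degree: Master
College and department: College of Electrical Engineering and Computer Science - Department of Electrical Engineering
Year of publication: 2024
Academic year of graduation: 112
Language: English
Number of pages: 63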
Keywords (Chinese): 深度學習 (Deep Learning), 陰影去除 (Shadow Removal), Transformer, Attention module
Keywords (English): Transformer, Attention module, Deep Learning, Shadow Removal
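  • Deep learning methods have achieved good results on the shadow removal problem. When handling shadows, conventional convolutional neural networks are limited in their ability to capture context by the fixed receptive fields of their convolution kernels, which in turn degrades shadow removal. In addition, existing shadow removal methods mainly perform local optimization within shadow or non-shadow regions. This causes color inconsistency between shadow and non-shadow regions and tends to produce visible artifacts at shadow boundaries, whereas the key to shadow removal is a uniform global color tone across the whole image.
    To address this problem, this study proposes a global optimization algorithm. The method adopts a Transformer-based architecture and captures global information through a novel attention module, strengthening the model's handling of long-range spatial dependencies. This helps the model understand the contextual information in an image more deeply, further improves shadow removal, and outperforms existing methods on multiple test datasets.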

    Deep learning approaches have outperformed traditional methods in shadow removal. However, conventional approaches based on convolutional neural networks (CNNs) capture only limited contextual information because their convolutional kernels have fixed receptive fields. Furthermore, existing shadow removal techniques mainly optimize shadow or non-shadow regions locally, which often produces artifacts along shadow boundaries and color inconsistencies between shadow and non-shadow regions, so the image fails to keep a uniform color tone.
    In this thesis, a global optimization strategy is proposed to address these problems and provide a more effective shadow removal network. The proposed method is a Transformer-based architecture with a novel multi-scale attention module that captures global information. This module strengthens the model's ability to handle long-range spatial dependencies, leading to a better understanding of the contextual information in an image and to improved shadow removal. The method also outperforms previous approaches on several test datasets without increasing the number of parameters.

    Chinese Abstract
    Abstract
    Acknowledgements
    Contents
    List of Tables
    List of Figures
    1 Introduction
    2 Background and Related Works
      2.1 Vision Transformer (ViT)
        2.1.1 Multi-Head Attention
      2.2 Swin Transformer
        2.2.1 W-MSA (Window Multi-Head Self Attention)
        2.2.2 SW-MSA (Shifted Windows Multi-Head Self-Attention)
      2.3 Uformer
      2.4 ShadowFormer
    3 Proposed Algorithm
      3.1 Proposed network architecture
      3.2 Encoder and decoder architecture
      3.3 Multi-Scale Attention Block (MAB)
        3.3.1 Multi-Scale Large Kernel Attention (MLKA)
        3.3.2 Gate Spatial Attention Unit (GSAU)
        3.3.3 Shuffle Attention (SA)
      3.4 Agent Attention Module
        3.4.1 Agent Attention (AG)
      3.5 Loss Function
      3.6 Algorithm flow
        3.6.1 Training stage
        3.6.2 Testing stage
    4 Experimental Results
      4.1 Experimental Dataset
      4.2 Experimental Setting
      4.3 Experimental Results
        4.3.1 Evaluation metrics
        4.3.2 Quantitative Results
        4.3.3 Visual results comparison
      4.4 Ablation Experimental Results
    5 Conclusions and Future works
      5.1 Conclusions
      5.2 Future works
    References

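    On campus: open to the public from 2029-08-07
    Off campus: open to the public from 2029-08-07
    The electronic thesis has not yet been authorized for public release; for the print copy, please consult the library catalog.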