
Graduate Student: Chung, Hao-Yu (鍾皓宇)
Thesis Title: A Two-branch Hierarchical Channel-wise Attention Network for Single Image Deraining
Advisor: Tai, Shen-Chuan (戴顯權)
Degree: Master
Department: Department of Electrical Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2023
Graduation Academic Year: 111 (2022-2023)
Language: English
Number of Pages: 62
Keywords: single image deraining, rain streak removal, self-attention, convolutional neural network

Abstract:
Images captured in rainy weather are often degraded by rain streaks, which impair the performance of computer vision applications such as object detection and image segmentation. It is therefore important to develop an image deraining algorithm as a pre-processing step for high-level vision tasks.
Most existing methods are based on convolutional neural networks, which struggle to aggregate global information and cannot adapt their weights to the input content. Recently, various methods based on self-attention have been developed, but their heavy computational cost has limited their adoption in image restoration, prompting a number of alternative architectures designed to make attention more efficient. This thesis proposes a single image deraining network that combines convolution layers with self-attention. The proposed method is evaluated on several representative benchmark datasets and, compared with state-of-the-art deraining methods, achieves the best performance in both quantitative metrics and visual quality.
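The title refers to channel-wise attention, a family of efficient self-attention designs (popularized by models such as Restormer) in which attention is computed across feature channels rather than spatial positions, so the cost grows linearly with image resolution. The following PyTorch sketch is a minimal, hypothetical illustration of that general idea, not the thesis's actual module; the class name and hyperparameters are assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ChannelWiseAttention(nn.Module):
    """Illustrative channel-wise ("transposed") self-attention.

    The attention map is C x C instead of (HW) x (HW), so memory and
    compute scale linearly with the number of pixels.
    """

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.num_heads = num_heads
        # Learnable temperature replaces the fixed 1/sqrt(d) scaling.
        self.temperature = nn.Parameter(torch.ones(num_heads, 1, 1))
        self.qkv = nn.Conv2d(dim, dim * 3, kernel_size=1)
        self.project_out = nn.Conv2d(dim, dim, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=1)
        # Reshape to (batch, heads, channels per head, pixels).
        q, k, v = (t.reshape(b, self.num_heads, c // self.num_heads, h * w)
                   for t in (q, k, v))
        # Normalize along the pixel axis so the C x C dot products are bounded.
        q = F.normalize(q, dim=-1)
        k = F.normalize(k, dim=-1)
        attn = (q @ k.transpose(-2, -1)) * self.temperature  # C x C attention map
        attn = attn.softmax(dim=-1)
        out = (attn @ v).reshape(b, c, h, w)
        return self.project_out(out)

# Usage: a rainy image embedded into 32 feature channels.
x = torch.randn(1, 32, 64, 64)
print(ChannelWiseAttention(dim=32)(x).shape)  # torch.Size([1, 32, 64, 64])

Because the attention map has shape C x C rather than (HW) x (HW), the memory footprint depends on the channel count instead of the square of the pixel count, which is what makes attention of this kind practical for full-resolution image restoration.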

Table of Contents:
Chinese Abstract i
Abstract ii
Acknowledgments iii
Contents iv
List of Tables vi
List of Figures vii
Chapter 1 Introduction 1
Chapter 2 Background and Related Works 4
  2.1 Single Image Deraining 4
    2.1.1 Rain Model 4
    2.1.2 Model-based Methods 5
    2.1.3 Learning-based Methods 5
  2.2 Vision Transformer 7
  2.3 Efficient Transformer 10
  2.4 Swin Transformer 12
  2.5 Pyramid Vision Transformer 14
  2.6 Selective Kernel Network 15
  2.7 MobileNet 16
Chapter 3 The Proposed Algorithm 18
  3.1 Proposed Network Architecture 18
    3.1.1 Transformer Block 22
    3.1.2 Selective Kernel Fusion 25
    3.1.3 Fusion Transformer Block 26
  3.2 Loss Function 28
    3.2.1 L1 Loss 28
    3.2.2 L2 Loss 28
  3.3 Algorithm Flow 30
    3.3.1 Training Stage 30
    3.3.2 Testing Stage 31
Chapter 4 Experimental Results 32
  4.1 Experimental Dataset 32
  4.2 Experimental Settings 35
    4.2.1 Experimental Environment 35
    4.2.2 Training Strategy 35
  4.3 Evaluation Metrics 36
    4.3.1 PSNR 36
    4.3.2 SSIM 36
  4.4 Experimental Results 37
    4.4.1 Quantitative Results 37
    4.4.2 Visual Comparisons 39
  4.5 Ablation Experimental Results 50
  4.6 Application 52
Chapter 5 Discussion, Conclusion and Future Work 54
  5.1 Discussion 54
  5.2 Conclusion 55
  5.3 Future Work 55
References 56
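For reference, the loss functions and evaluation metrics named in the outline have standard definitions; the LaTeX below states them in their usual form (the thesis may use minor variants), with \hat{y} the derained output, y the clean ground truth, N the number of pixels, and MAX the peak pixel value:

\mathcal{L}_1 = \frac{1}{N} \sum_{i=1}^{N} \left| \hat{y}_i - y_i \right|,
\qquad
\mathcal{L}_2 = \frac{1}{N} \sum_{i=1}^{N} \left( \hat{y}_i - y_i \right)^2

\mathrm{PSNR} = 10 \log_{10} \frac{\mathrm{MAX}^2}{\mathcal{L}_2},
\qquad
\mathrm{SSIM}(x, y) = \frac{(2\mu_x \mu_y + c_1)(2\sigma_{xy} + c_2)}{(\mu_x^2 + \mu_y^2 + c_1)(\sigma_x^2 + \sigma_y^2 + c_2)}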


Full-text Release: On campus: 2024-09-01; Off campus: 2024-09-01