
Graduate Student: 柯念佐 (Ke, Nian-Zuo)
Thesis Title: 一個應用於共同顯著物件偵測的改良型注意力網路 (An Improved Group Attention Network for Co-Salient Object Detection)
Advisor: 戴顯權 (Tai, Shen-Chuan)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Electrical Engineering
Year of Publication: 2022
Graduating Academic Year: 110 (2021-2022)
Language: English
Number of Pages: 49
Keywords (Chinese): 共同顯著對象檢測、顯著對象檢測、注意力機制、深度學習
Keywords (English): Co-salient Object Detection, CoSOD, Salient Object Detection, SOD, attention mechanism, deep learning
Access Statistics: Views: 155; Downloads: 6
Co-Salient Object Detection (CoSOD) aims to discover the salient objects that recur across multiple related images by exploiting object similarity. Because it explores both global similarity and local saliency within a set of images, it is widely used as a pre-processing step for various visual tasks, such as weakly supervised semantic segmentation, image surveillance, and video analysis. Compared with Salient Object Detection (SOD), CoSOD must account for both saliency and co-occurrence at the same time, which makes it a challenging task given the complex variations between images.
    To suppress the redundancy introduced by complex backgrounds, and thereby improve the accuracy of locating co-salient objects, this thesis proposes an improved model architecture that uses saliency maps as prior guidance. A cross-attention mechanism combined with a salient object detection module effectively mitigates the influence of complex backgrounds and thus better localizes co-salient objects. The proposed method is trained on the DUTS dataset and evaluated on the CoCA, CoSOD3k, and Cosal2015 datasets. The experimental results show that the proposed method is helpful for co-salient object detection.
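As a rough illustration of the mechanism the abstract describes, the PyTorch sketch below shows one plausible way a cross-attention block could fuse features from two images in a group while using a precomputed saliency map to suppress background in the reference image's features. The module name, tensor shapes, and the multiplicative gating are assumptions for illustration only, not the thesis's actual implementation (which is specified in Chapter 3).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossAttention(nn.Module):
    """Illustrative cross-attention block: queries come from one image's
    features, keys/values from another image in the same group. A generic
    sketch, not the thesis's exact module."""

    def __init__(self, channels: int):
        super().__init__()
        self.q = nn.Conv2d(channels, channels // 8, kernel_size=1)
        self.k = nn.Conv2d(channels, channels // 8, kernel_size=1)
        self.v = nn.Conv2d(channels, channels, kernel_size=1)
        self.gamma = nn.Parameter(torch.zeros(1))  # learnable residual weight

    def forward(self, feat_a, feat_b, sal_b=None):
        b, c, h, w = feat_a.shape
        q = self.q(feat_a).flatten(2).transpose(1, 2)   # (B, HW, C')
        k = self.k(feat_b).flatten(2)                   # (B, C', HW)
        v = self.v(feat_b)                              # (B, C, H, W)
        if sal_b is not None:
            # One plausible use of an SOD prior (an assumption here):
            # damp background activations in the reference features.
            v = v * F.interpolate(sal_b, size=(h, w), mode="bilinear",
                                  align_corners=False)
        v = v.flatten(2).transpose(1, 2)                # (B, HW, C)
        attn = torch.softmax(q @ k / (k.shape[1] ** 0.5), dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(b, c, h, w)
        return feat_a + self.gamma * out                # residual fusion

# Toy usage: two 64-channel feature maps, e.g. from a shared VGG16 backbone.
if __name__ == "__main__":
    block = CrossAttention(64)
    fa = torch.randn(1, 64, 28, 28)
    fb = torch.randn(1, 64, 28, 28)
    sal = torch.rand(1, 1, 28, 28)   # SOD prediction for image b, in [0, 1]
    print(block(fa, fb, sal).shape)  # torch.Size([1, 64, 28, 28])
```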

Abstract
Acknowledgments
Contents
List of Tables
List of Figures
Chapter 1 Introduction
Chapter 2 Background and Related Works
  2.1 Background
  2.1 Recent Deep-Learning-Based Models
    2.1.1 GICD
    2.1.2 CoEGNet
    2.1.3 GCoNet
    2.1.4 RCAN
    2.1.5 CSMG
    2.1.6 GCAGC-CSD
  2.2 VGG16
  2.3 Attention
    2.3.1 Self-Attention
    2.3.2 Cross-Attention
    2.3.3 Channel and Spatial Attention
Chapter 3 The Proposed Algorithm
  3.1 Proposed Network Architecture
  3.2 Proposed Background Redundancy Reduction Method
    3.2.1 Cross-Attention Module
    3.2.2 Saliency Map
  3.4 Loss Function
    3.4.1 IoU Loss
    3.4.2 BCE Loss
    3.4.3 SSIM Loss
    3.4.4 Dice Loss
    3.4.5 Proposed Total Loss
Chapter 4 Experimental Results
  4.1 Experimental Dataset
  4.2 Parameter and Experimental Setting
  4.3 Evaluation Metric
  4.4 Experimental Results of Simulated Images
  4.5 Ablation Experimental Result
Chapter 5 Conclusion and Future Work
  5.1 Conclusion
  5.2 Future Work
References
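Section 3.4 of the outline lists IoU, BCE, SSIM, and Dice losses feeding a proposed total loss. The exact combination is not given on this page; a weighted sum of the four named terms would take the generic form below, where the weights λ1..λ4 are placeholders rather than values from the thesis.

```latex
\mathcal{L}_{\mathrm{total}}
  = \lambda_{1}\,\mathcal{L}_{\mathrm{BCE}}
  + \lambda_{2}\,\mathcal{L}_{\mathrm{IoU}}
  + \lambda_{3}\,\mathcal{L}_{\mathrm{SSIM}}
  + \lambda_{4}\,\mathcal{L}_{\mathrm{Dice}}
```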
