| Graduate Student: | 陳奕叡 Chen, Yi-Jui |
|---|---|
| Thesis Title: | 一個應用於偽裝物件偵測的精煉多注意力網路 A Refined Multi-Attention Network for Camouflaged Object Detection |
| Advisor: | 戴顯權 Tai, Shen-Chuan |
| Degree: | 碩士 Master |
| Department: | 電機資訊學院 - 電腦與通信工程研究所 Institute of Computer & Communication Engineering |
| Year of Publication: | 2023 |
| Academic Year of Graduation: | 112 |
| Language: | English |
| Pages: | 76 |
| Keywords (Chinese): | 偽裝物件偵測、深度學習、注意力機制、特徵融合 |
| Keywords (English): | Camouflaged Object Detection, Deep Learning, Attention Mechanisms, Feature Fusion |
Camouflaged Object Detection (COD) is a challenging task that aims to identify and segment camouflaged objects from their surrounding environment. Because camouflaged objects are often highly similar to their backgrounds in color, texture, or shape, distinguishing them from the background is a complex visual process.
This thesis proposes a refined model architecture for camouflaged object segmentation. The architecture uses a Triple Attention Guided Fusion (TAGF) module, which integrates global attention, local attention, and reverse attention mechanisms to improve the model's ability to recognize camouflaged objects. In addition, a Neighbor Connection Decoder (NCD) and a Cross-Scale Feature Fusion (CSFF) module are introduced to handle multi-scale features and to refine the feature fusion process, respectively. Experimental results show that the proposed method achieves performance comparable to other methods while using fewer parameters and less computation.
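To make the fusion mechanism concrete, here is a minimal PyTorch sketch of a triple-attention fusion block in the spirit of the TAGF module described above. Only the module name and the three attention types come from the abstract; the internal formulations (squeeze-and-excitation-style global attention, a convolutional local attention map, and reverse attention computed as 1 - sigmoid of a coarse prediction, following common practice in the COD literature) are assumptions for illustration, not the thesis's actual design.

```python
import torch
import torch.nn as nn


class TripleAttentionGuidedFusion(nn.Module):
    """Illustrative triple-attention fusion block (assumed design).

    Combines three feature-gating branches:
      * global attention: channel gating from pooled global context
      * local attention: spatial gating from local convolutional context
      * reverse attention: gating by the complement of a coarse prediction
    """

    def __init__(self, channels: int):
        super().__init__()
        # Global branch: squeeze-and-excitation style channel attention.
        self.global_att = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // 4, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // 4, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        # Local branch: a single-channel spatial attention map.
        self.local_att = nn.Sequential(
            nn.Conv2d(channels, 1, kernel_size=7, padding=3),
            nn.Sigmoid(),
        )
        # 1x1 conv to fuse the three gated branches back to `channels`.
        self.fuse = nn.Conv2d(channels * 3, channels, kernel_size=1)

    def forward(self, feat: torch.Tensor, coarse_map: torch.Tensor) -> torch.Tensor:
        # feat: (B, C, H, W) backbone features; coarse_map: (B, 1, H, W) logits.
        g = feat * self.global_att(feat)              # global channel gating
        l = feat * self.local_att(feat)               # local spatial gating
        r = feat * (1.0 - torch.sigmoid(coarse_map))  # reverse attention
        return self.fuse(torch.cat([g, l, r], dim=1))


# Usage sketch with hypothetical shapes:
tagf = TripleAttentionGuidedFusion(channels=64)
feat = torch.randn(2, 64, 44, 44)   # feature map from one backbone stage
coarse = torch.randn(2, 1, 44, 44)  # coarse segmentation logits
out = tagf(feat, coarse)            # -> torch.Size([2, 64, 44, 44])
```

The reverse-attention branch weights features by the complement of the coarse prediction, steering refinement toward regions the initial estimate missed; this is a common tactic for camouflaged objects, whose boundaries are easily overlooked.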