| Author: | Chang, Cheng-Hong (張承浤) |
|---|---|
| Thesis Title: | Self-Attention with State-Object Weighted Combination for Compositional Zero Shot Learning (應用於組合零樣本學習的注意力機制及權重調整) |
| Advisor: | Tsai, Pei-Hsuan (蔡佩璇) |
| Degree: | Master |
| Department: | Institute of Manufacturing Information and Systems, College of Electrical Engineering and Computer Science |
| Year of Publication: | 2023 |
| Graduation Academic Year: | 111 (ROC calendar, i.e., AY 2022-2023) |
| Language: | Chinese |
| Pages: | 28 |
| Keywords (Chinese): | 組合零樣本學習、狀態-物件辨識、注意力機制 |
| Keywords (English): | Compositional Zero Shot Learning, state-object recognition, self-attention |
Object recognition is now widely applied in industry, agriculture, and other fields. However, most existing applications identify objects alone, without considering their associated states; the ability to recognize the state and the object (as a composition) simultaneously remains less common. One way to address this is to treat each "state and object" pair as a single class during training. This approach, however, poses serious difficulties for data collection and training, since it requires comprehensive data for every possible combination: with 100 states and 100 objects, for example, up to 10,000 composite classes would each need training examples.
Compositional Zero-Shot Learning (CZSL) addresses this by treating states and objects as separate classes during training. Even without data for every combination, CZSL can recognize compositions that never appear in the training set. The current state-of-the-art (SOTA) method, KG-SP, trains two classifiers for states and objects separately and uses a semantic model to assess the plausibility of each composed pair. However, KG-SP's recognition accuracy still has room for improvement, and when composing pairs it ignores the distinction between states and objects, weighting both equally.
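The record itself contains no code, but the two-classifier design described above can be sketched concretely. Below is a minimal, hypothetical PyTorch sketch (names such as `score_pairs`, `state_head`, `object_head`, and `feasibility` are illustrative, not from the thesis) of how independent primitive classifiers plus an external feasibility prior can score every state-object pair:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def score_pairs(image_feat, state_head, object_head, feasibility):
    """Score every (state, object) pair from independent primitive classifiers.

    image_feat:  (B, D) image features from a visual backbone
    state_head:  linear layer mapping D -> number of states
    object_head: linear layer mapping D -> number of objects
    feasibility: (S, O) prior plausibility of each state-object pair,
                 e.g. derived from an external knowledge base
    """
    p_state = F.softmax(state_head(image_feat), dim=-1)    # (B, S)
    p_object = F.softmax(object_head(image_feat), dim=-1)  # (B, O)
    # Primitives are predicted independently; the pair score is the product
    # of the two marginals, scaled by the prior feasibility of the pair.
    pair_scores = p_state.unsqueeze(2) * p_object.unsqueeze(1)  # (B, S, O)
    return pair_scores * feasibility.unsqueeze(0)

# Toy usage: 8 images, 512-d features, 10 states, 20 objects.
feats = torch.randn(8, 512)
scores = score_pairs(feats, nn.Linear(512, 10), nn.Linear(512, 20),
                     torch.ones(10, 20))  # scores has shape (8, 10, 20)
```

Because the two marginals are predicted independently, the feasibility prior is what suppresses implausible pairs that the classifiers alone would happily score.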
This study proposes SASOW, an enhancement of KG-SP that takes the weighting of states and objects into account while improving composition-recognition accuracy. First, self-attention mechanisms are introduced into the state and object classifiers, improving the recognition accuracy of both. Second, the weights of the state and the object are considered when composing pairs, producing more plausible and accurate compositions.
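To make the two modifications concrete, here is a minimal, hypothetical PyTorch sketch (the class name `WeightedCompositionHead`, the single scalar weight `alpha`, and mean pooling are illustrative assumptions, not the thesis's actual architecture): self-attention refines the visual tokens before classification, and a learnable weight rebalances the state and object branches when composing.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WeightedCompositionHead(nn.Module):
    """Hypothetical sketch: self-attention-refined primitive classifiers
    whose pair scores weight the state and object branches unequally."""

    def __init__(self, dim, num_states, num_objects, num_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.state_head = nn.Linear(dim, num_states)
        self.object_head = nn.Linear(dim, num_objects)
        # Learnable scalar controlling the state/object balance (assumption:
        # the thesis may use a richer weighting scheme than one scalar).
        self.alpha = nn.Parameter(torch.tensor(0.0))

    def forward(self, tokens, feasibility):
        # tokens: (B, N, D) patch/region features from the visual backbone.
        refined, _ = self.attn(tokens, tokens, tokens)  # self-attention
        pooled = refined.mean(dim=1)                    # (B, D)
        log_s = F.log_softmax(self.state_head(pooled), dim=-1)   # (B, S)
        log_o = F.log_softmax(self.object_head(pooled), dim=-1)  # (B, O)
        a = torch.sigmoid(self.alpha)  # keep the weight in (0, 1)
        # Weighted combination in log space: states contribute with weight a,
        # objects with weight (1 - a), instead of the equal weights in KG-SP.
        pair = a * log_s.unsqueeze(2) + (1 - a) * log_o.unsqueeze(1)  # (B, S, O)
        return pair + torch.log(feasibility.clamp_min(1e-8)).unsqueeze(0)
```

Initializing `alpha` at zero makes sigmoid(alpha) = 0.5, so the combination starts out equal-weighted and only departs from KG-SP's behavior when an unequal weighting actually helps during training.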
The effectiveness of SASOW is validated on three benchmark datasets, and the experimental results show that it achieves competitive performance. Compared with the state-of-the-art OW-CZSL method KG-SP, SASOW improves unseen-composition accuracy by 2.1%, 1.7%, and 0.4% on the MIT-States, UT Zappos, and C-GQA datasets, respectively.
Full text available on campus from 2028-08-21.