
Graduate Student: 彭思安 (Ponce, Andres)
Thesis Title: 基於重加權的長尾識別自監督學習方法 (A Reweighting Based Self-supervised Learning Method for Long-tailed Recognition)
Advisor: 郭耀煌 (Kuo, Yau-Hwang)
Degree: Master
Department: Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2023
Academic Year of Graduation: 111
Language: English
Number of Pages: 66
Chinese Keywords: 人工智慧、深度學習、電腦視覺、長尾學習、自監督學習
English Keywords: Artificial Intelligence, Deep Learning, Computer Vision, Self-Supervised Learning, Long-Tailed Learning
In recent years, deep learning has become a successful representation learning approach in many fields. Deep learning models benefit from larger datasets because they can learn more general relationships among the data. However, the cost of obtaining labeled data quickly becomes prohibitive, which poses a serious problem when scaling up datasets or model performance. Recently, Self-Supervised Learning (SSL) has made great progress in learning representations from unlabeled training data. Instead of being trained to predict a predetermined label, a self-supervised model is trained to predict a part or property of its input. These new "labels" derived directly from the input data reduce the need for labeled data, and this approach has driven substantial progress on tasks across many fields. Another problem frequently encountered when training image recognition models is a long-tailed training distribution, in which some classes in the dataset have many times more samples than others. Training a deep learning model on a long-tailed dataset yields a biased model whose classification accuracy on the most common classes is much higher than on the classes with few samples. This thesis proposes Long-Tailed Reweighting (LoTaR) to improve the accuracy of self-supervised models on long-tailed data. By assigning larger weights to augmented versions of a sample that are mapped farther apart in the embedding space, LoTaR adds a valuable signal to the loss formulation. LoTaR not only improves accuracy on the rarest classes over a standard self-supervised model, but also outperforms the recently proposed SDCLR and BCL architectures. On long-tailed CIFAR10 and CIFAR100, LoTaR improves accuracy on both the rarest and the most common classes by 2–3%. Moreover, the additional computation and memory cost introduced by LoTaR is up to 10 times smaller than that of SDCLR and BCL. Theoretical analysis and experiments demonstrate the effectiveness of LoTaR and show that handling long-tailed unlabeled data does not require an overly complex architecture.
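
The record describes LoTaR only at this high level, but the core idea stated above, a SimCLR-style contrastive (NT-Xent) loss in which a sample's term is up-weighted when its two augmented views land far apart in the embedding space, can be sketched as follows. The function name and the specific weighting (view distance normalized by the batch mean and treated as a constant) are illustrative assumptions, not the thesis's exact formulation.

```python
import torch
import torch.nn.functional as F

def reweighted_nt_xent(z1, z2, temperature=0.5):
    """Sketch of a distance-reweighted NT-Xent (SimCLR) contrastive loss.

    z1, z2: projections of two augmented views of the same batch, shape (N, D).
    Samples whose two views are mapped farther apart in the embedding space
    receive a larger weight, adding the extra training signal described in the
    abstract. The weighting scheme here is an assumption, not LoTaR's exact formula.
    """
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    n = z1.size(0)

    # Per-sample weight: distance between the two views of the same image,
    # normalized to average 1 over the batch and treated as a constant.
    dist = (z1 - z2).norm(dim=1)
    weights = (dist / dist.mean().clamp_min(1e-8)).detach()

    # Standard NT-Xent over the 2N views: the positive of view i is the
    # other view of the same image; all remaining views act as negatives.
    z = torch.cat([z1, z2], dim=0)                      # (2N, D)
    sim = z @ z.t() / temperature                       # (2N, 2N)
    sim.fill_diagonal_(float("-inf"))                   # drop self-similarity
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)]).to(z.device)
    per_view = F.cross_entropy(sim, targets, reduction="none")  # (2N,)

    # Apply the same weight to both views of each sample.
    return (per_view * torch.cat([weights, weights])).mean()
```

In a SimCLR-style training loop, z1 and z2 would come from applying the augmentation pipeline twice to each image and passing both views through the encoder and projection head.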

Deep Learning (DL) has seen remarkable success in a variety of tasks and drastically changed many industries. DL models benefit from larger training datasets, as they can learn more complicated relationships. However, obtaining labeled training data can quickly become too costly, which poses a severe problem when scaling models and datasets to improve performance. In recent years, Self-Supervised Learning (SSL) has made great progress in learning representations from unlabeled training datasets. Instead of being trained to predict a predetermined label, self-supervised models are trained to predict a part or property of the input data. These new "labels", obtained directly from the input data, alleviate the need for manually labeled data, and this approach has led to competitive results across domains and tasks. Another common problem encountered when training deep learning models for recognition is long-tailed data, where some classes in the training dataset are far more heavily represented than others. Directly training a classifier on such data produces a biased classifier, with much higher accuracy on common classes than on rare classes. This work proposes Long-Tailed Reweighting (LoTaR), a simple and competitive new method for self-supervised recognition on long-tailed data. By placing a greater weight on augmented samples that get mapped farther away in the embedding space, LoTaR adds a valuable additional training signal to the contrastive loss. Compared to a self-supervised baseline and to the recently proposed SDCLR and BCL architectures, LoTaR increases both rare-class accuracy and overall accuracy by around 2–3% on long-tailed versions of CIFAR10 and CIFAR100. LoTaR also scales much more efficiently in computation and memory than SDCLR and BCL, reducing the additional memory overhead by as much as 10x compared to BCL and around 15x compared to SDCLR.
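
The long-tailed versions of CIFAR10 and CIFAR100 mentioned in both abstracts are not specified further in this record. A minimal sketch of the exponential-profile subsampling commonly used to build such splits, where class c keeps n_max * rho^(-c/(C-1)) samples so that the largest class has rho (the imbalance ratio of Section 4.4.4) times more samples than the smallest, might look like the following; the helper name and the use of torchvision are assumptions for illustration, and the thesis's exact recipe may differ.

```python
import numpy as np
from torch.utils.data import Subset
from torchvision.datasets import CIFAR10

def long_tailed_indices(targets, imbalance_ratio=100.0, num_classes=10):
    """Pick indices following an exponential long-tailed class profile.

    Class c keeps n_max * imbalance_ratio ** (-c / (num_classes - 1)) samples,
    so the head class has imbalance_ratio times more samples than the tail class.
    """
    targets = np.asarray(targets)
    n_max = int((targets == 0).sum())  # CIFAR classes are balanced originally
    keep = []
    for c in range(num_classes):
        n_c = int(n_max * imbalance_ratio ** (-c / (num_classes - 1)))
        class_idx = np.where(targets == c)[0]
        keep.extend(class_idx[:max(n_c, 1)].tolist())
    return keep

# Usage: a CIFAR10-LT training split with imbalance ratio rho = 100.
train_set = CIFAR10(root="./data", train=True, download=True)
lt_train = Subset(train_set, long_tailed_indices(train_set.targets))
```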

摘要 i
Abstract iii
Acknowledgements v
Table of Contents vii
List of Tables ix
List of Figures x
Chapter 1. Introduction 1
1.1. Self-Supervised Learning 2
1.1.1. Computer Vision 4
1.1.2. Natural Language Processing 6
1.2. Long-tailed Learning 7
1.2.1. Relation Between SSL and Long-Tailed Learning 9
1.3. Contributions 10
1.4. Structure of the Work 11
Chapter 2. Related Work 12
2.1. Long-Tailed Reweighting 12
2.1.1. Other Approaches 16
2.2. Self-Supervised Long-Tailed Learning 17
2.3. Self-Damaging Contrastive Learning 21
2.4. Boosted Contrastive Learning 23
Chapter 3. LoTaR 26
3.1. Contrastive Learning 26
3.2. LoTaR 30
3.3. Complexity Analysis 33
3.3.1. Self-damaging Contrastive Learning 34
3.3.2. Boosted Contrastive Learning 36
3.3.3. LoTaR 37
Chapter 4. Experiments 41
4.1. Evaluation Objective 41
4.2. Evaluation Procedure 42
4.3. Datasets 43
4.4. Results 46
4.4.1. Experiment Settings 46
4.4.2. CIFAR10-LT 47
4.4.3. CIFAR100-LT 48
4.4.4. Effect of Imbalance Ratio ρ 49
4.4.5. Time and Memory Requirements 50
4.4.6. LoTaR as a General Reweighting Method 54
4.4.7. Natural Imbalance 55
4.5. Effect of Supervision on Object Recognition 57
Chapter 5. Conclusion & Future Work 59
5.1. Conclusions 59
5.2. Future Work 60
References 62

[AZLL19] Zeyuan Allen-Zhu, Yuanzhi Li, and Yingyu Liang. Learning and generalization in overparameterized neural networks, going beyond two layers. Advances in Neural Information Processing Systems, 32, 2019.
[CBHK02] Nitesh V Chawla, Kevin W Bowyer, Lawrence O Hall, and W Philip Kegelmeyer. SMOTE: Synthetic minority over-sampling technique. Journal of Artificial Intelligence Research, 16:321–357, 2002.
[CCP+20] Hsin-Ping Chou, Shih-Chieh Chang, Jia-Yu Pan, Wei Wei, and Da-Cheng Juan. Remix: Rebalanced mixup. In Computer Vision – ECCV 2020 Workshops: Glasgow, UK, August 23–28, 2020, Proceedings, Part VI, pages 95–110. Springer, 2020.
[CH21] Xinlei Chen and Kaiming He. Exploring simple siamese representation learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 15750–15758, June 2021.
[CJL+19] Yin Cui, Menglin Jia, Tsung-Yi Lin, Yang Song, and Serge Belongie. Class-balanced loss based on effective number of samples. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9268–9277, 2019.
[CKNH20] Ting Chen, Simon Kornblith, Mohammad Norouzi, and Geoffrey Hinton. A simple framework for contrastive learning of visual representations, 2020.
[CMM+20] Mathilde Caron, Ishan Misra, Julien Mairal, Priya Goyal, Piotr Bojanowski, and Armand Joulin. Unsupervised learning of visual features by contrasting cluster assignments. Advances in Neural Information Processing Systems, 33:9912–9924, 2020.
[CWG+19] Kaidi Cao, Colin Wei, Adrien Gaidon, Nikos Arechiga, and Tengyu Ma. Learning imbalanced datasets with label-distribution-aware margin loss. Advances in Neural Information Processing Systems, 32, 2019.
[CZSL20] Ekin D Cubuk, Barret Zoph, Jonathon Shlens, and Quoc V Le. RandAugment: Practical automated data augmentation with a reduced search space. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pages 702–703, 2020.
[DCLT18] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805, 2018.
[DDS+09] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pages 248–255. IEEE, 2009.
[EGLH22] Linus Ericsson, Henry Gouk, Chen Change Loy, and Timothy M Hospedales. Self-supervised representation learning: Introduction, advances, and challenges. IEEE Signal Processing Magazine, 39(3):42–62, 2022.
[FBJL23] Emanuele Francazi, Marco Baity-Jesi, and Aurelien Lucchi. A theoretical analysis of the learning dynamics under class imbalance. 2023.
[GCB+23] Quentin Garrido, Yubei Chen, Adrien Bardes, Laurent Najman, and Yann LeCun. On the duality between contrastive and non-contrastive self-supervised learning. In The Eleventh International Conference on Learning Representations, 2023.
[GSA+20] Jean-Bastien Grill, Florian Strub, Florent Altché, Corentin Tallec, Pierre Richemond, Elena Buchatskaya, Carl Doersch, Bernardo Avila Pires, Zhaohan Guo, Mohammad Gheshlaghi Azar, Bilal Piot, Koray Kavukcuoglu, Remi Munos, and Michal Valko. Bootstrap your own latent: A new approach to self-supervised learning. In H. Larochelle, M. Ranzato, R. Hadsell, M. F. Balcan, and H. Lin, editors, Advances in Neural Information Processing Systems, volume 33, pages 21271–21284. Curran Associates, Inc., 2020.
[GSK18] Spyros Gidaris, Praveer Singh, and Nikos Komodakis. Unsupervised representation learning by predicting image rotations. In International Conference on Learning Representations, 2018.
[GZLW20] Long Gao, Lei Zhang, Chang Liu, and Shandong Wu. Handling imbalanced medical image data: A deep-learning-based one-class classification approach. Artificial Intelligence in Medicine, 108:101935, 2020.
[HCC+19] Sara Hooker, Aaron Courville, Gregory Clark, Yann Dauphin, and Andrea Frome. What do compressed deep neural networks forget? arXiv preprint arXiv:1911.05248, 2019.
[HCX+22] Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, and Ross Girshick. Masked autoencoders are scalable vision learners. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 16000–16009, 2022.
[HFW+20] Kaiming He, Haoqi Fan, Yuxin Wu, Saining Xie, and Ross Girshick. Momentum contrast for unsupervised visual representation learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9729–9738, 2020.
[HZRS16] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016.
[JCMW21] Ziyu Jiang, Tianlong Chen, Bobak J Mortazavi, and Zhangyang Wang. Self-damaging contrastive learning. In International Conference on Machine Learning, pages 4927–4939. PMLR, 2021.
[JVLT22] Li Jing, Pascal Vincent, Yann LeCun, and Yuandong Tian. Understanding dimensional collapse in contrastive self-supervised learning, 2022.
[KH+09] Alex Krizhevsky, Geoffrey Hinton, et al. Learning multiple layers of features from tiny images. 2009.
[KK20] Byungju Kim and Junmo Kim. Adjusting decision boundary for class imbalanced learning. IEEE Access, 8:81674–81685, 2020.
[KLX+21] Bingyi Kang, Yu Li, Sa Xie, Zehuan Yuan, and Jiashi Feng. Exploring balanced feature spaces for representation learning. In International Conference on Learning Representations, 2021.
[KSH17] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. ImageNet classification with deep convolutional neural networks. Communications of the ACM, 60(6):84–90, 2017.
[KXR+19] Bingyi Kang, Saining Xie, Marcus Rohrbach, Zhicheng Yan, Albert Gordo, Jiashi Feng, and Yannis Kalantidis. Decoupling representation and classifier for long-tailed recognition. In International Conference on Learning Representations, 2019.
[LHGM22] Hong Liu, Jeff Z. HaoChen, Adrien Gaidon, and Tengyu Ma. Self-supervised learning is more robust to dataset imbalance. In International Conference on Learning Representations, 2022.
[LMZ+19] Ziwei Liu, Zhongqi Miao, Xiaohang Zhan, Jiayun Wang, Boqing Gong, and Stella X Yu. Large-scale long-tailed recognition in an open world. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 2537–2546, 2019.
[MJR+] Aditya Krishna Menon, Sadeep Jayasumana, Ankit Singh Rawat, Himanshu Jain, Andreas Veit, and Sanjiv Kumar. Long-tail learning via logit adjustment. In International Conference on Learning Representations.
[Ope] OpenAI. Introducing ChatGPT. https://openai.com/blog/chatgpt. Accessed July 12, 2023.
[OWJ+22] Long Ouyang, Jeffrey Wu, Xu Jiang, Diogo Almeida, Carroll Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, et al. Training language models to follow instructions with human feedback. Advances in Neural Information Processing Systems, 35:27730–27744, 2022.
[PG20] Senthil Purushwalkam and Abhinav Gupta. Demystifying contrastive self-supervised learning: Invariances, augmentations and dataset biases. Advances in Neural Information Processing Systems, 33:3407–3418, 2020.
[RNS+18] Alec Radford, Karthik Narasimhan, Tim Salimans, Ilya Sutskever, et al. Improving language understanding by generative pre-training. 2018.
[SRKP+21] Chaehwan Song, Ali Ramezani-Kebrya, Thomas Pethick, Armin Eftekhari, and Volkan Cevher. Subquadratic overparameterization for shallow neural networks. Advances in Neural Information Processing Systems, 34:11247–11259, 2021.
[TCG21] Yuandong Tian, Xinlei Chen, and Surya Ganguli. Understanding self-supervised learning dynamics without contrastive pairs. In International Conference on Machine Learning, pages 10268–10278. PMLR, 2021.
[THvdO21] Yonglong Tian, Olivier J Henaff, and Aäron van den Oord. Divide and contrast: Self-supervised learning from uncurated data. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 10063–10074, 2021.
[TSP+20] Yonglong Tian, Chen Sun, Ben Poole, Dilip Krishnan, Cordelia Schmid, and Phillip Isola. What makes for good views for contrastive learning? Advances in Neural Information Processing Systems, 33:6827–6839, 2020.
[vdOLV18] Aäron van den Oord, Yazhe Li, and Oriol Vinyals. Representation learning with contrastive predictive coding. arXiv, abs/1807.03748, 2018.
[VHMAS+18] Grant Van Horn, Oisin Mac Aodha, Yang Song, Yin Cui, Chen Sun, Alex Shepard, Hartwig Adam, Pietro Perona, and Serge Belongie. The iNaturalist species classification and detection dataset. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 8769–8778, 2018.
[WI20a] Tongzhou Wang and Phillip Isola. Understanding contrastive representation learning through alignment and uniformity on the hypersphere. In International Conference on Machine Learning, pages 9929–9939. PMLR, 2020.
[WI20b] Tongzhou Wang and Phillip Isola. Understanding contrastive representation learning through alignment and uniformity on the hypersphere. In International Conference on Machine Learning, pages 9929–9939. PMLR, 2020.
[WL22] Zixin Wen and Yuanzhi Li. The mechanism of prediction head in non-contrastive self-supervised learning. Advances in Neural Information Processing Systems, 35:24794–24809, 2022.
[WXYL18] Zhirong Wu, Yuanjun Xiong, Stella X Yu, and Dahua Lin. Unsupervised feature learning via non-parametric instance discrimination. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3733–3742, 2018.
[YHO+19] Sangdoo Yun, Dongyoon Han, Seong Joon Oh, Sanghyuk Chun, Junsuk Choe, and Youngjoon Yoo. CutMix: Regularization strategy to train strong classifiers with localizable features. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 6023–6032, 2019.
[YX20] Yuzhe Yang and Zhi Xu. Rethinking the value of labels for improving class-imbalanced learning. Advances in Neural Information Processing Systems, 33:19290–19301, 2020.
[YZC21] Han-Jia Ye, De-Chuan Zhan, and Wei-Lun Chao. Procrustean training for imbalanced deep learning. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 92–102, 2021.
[ZCDLP] Hongyi Zhang, Moustapha Cisse, Yann N Dauphin, and David Lopez-Paz. mixup: Beyond empirical risk minimization. In International Conference on Learning Representations.
[ZKH+21] Yifan Zhang, Bingyi Kang, Bryan Hooi, Shuicheng Yan, and Jiashi Feng. Deep long-tailed learning: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021.
[ZLY+21] Songyang Zhang, Zeming Li, Shipeng Yan, Xuming He, and Jian Sun. Distribution alignment: A unified framework for long-tail visual recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 2361–2370, 2021.
[ZYW+22] Zhihan Zhou, Jiangchao Yao, Yan-Feng Wang, Bo Han, and Ya Zhang. Contrastive learning with boosted memorization. In International Conference on Machine Learning, pages 27367–27377. PMLR, 2022.

Full-text availability: on campus from 2028-08-01; off campus from 2028-08-01.
The electronic thesis has not yet been authorized for public release; for the print copy, please consult the library catalog.