
Graduate Student: Chen, Wei-Chi (陳韋志)
Thesis Title: Self-Supervised Self-Distillation (基於自監督式學習的自蒸餾學習)
Advisor: Chu, Wei-Ta (朱威達)
Degree: Master
Department: Graduate Program of Artificial Intelligence, College of Electrical Engineering and Computer Science
Year of Publication: 2022
Graduation Academic Year: 110
Language: English
Number of Pages: 28
Keywords (Chinese): 自監督學習、自蒸餾學習、無監督學習
Keywords (English): Self-Supervised Learning, Self Distillation, Unsupervised Learning
    With labeled data, self distillation (SD) has been proposed to develop compact but effective models without a complex teacher model available in advance. Such approaches need labeled data to guide the self-distillation process. Inspired by self-supervised learning (SSL), we propose a self-supervised self distillation (SSSD) approach in this work. Based on an unlabeled image dataset, a model is constructed to learn visual representations in a self-supervised manner. This pre-trained model is then adopted to extract visual representations of the target dataset and to generate pseudo labels via clustering. The pseudo labels guide the SD process and thus enable SD to proceed in an unsupervised way: no data labels are required at any stage. We verify this idea with evaluations on the CIFAR-10, CIFAR-100, and ImageNet-1K datasets and demonstrate the effectiveness of this unsupervised SD approach, showing performance superior to similar frameworks.
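
    As a rough illustration of the pipeline the abstract describes, the sketch below shows how pseudo labels might be generated by clustering SSL features and then used to drive a label-free distillation objective. It is a minimal sketch under assumptions: the encoder, the number of clusters, the temperature, the loss weighting `alpha`, and the helper names are illustrative choices, not the thesis's actual implementation.

    ```python
    # Minimal sketch of the SSSD idea described above; hyper-parameters and
    # helper names are assumptions for illustration, not the thesis code.
    import torch
    import torch.nn.functional as F
    from sklearn.cluster import KMeans


    def generate_pseudo_labels(ssl_encoder, images, num_clusters=100):
        """Step 1 (assumed): extract visual representations with a frozen
        SSL-pre-trained encoder and cluster them into pseudo classes."""
        ssl_encoder.eval()
        with torch.no_grad():
            feats = ssl_encoder(images)                  # (N, D) representations
        feats = F.normalize(feats, dim=1).cpu().numpy()  # L2-normalize before k-means
        labels = KMeans(n_clusters=num_clusters, n_init=10).fit_predict(feats)
        return torch.as_tensor(labels, dtype=torch.long)  # (N,) pseudo labels


    def sssd_loss(student_logits, teacher_logits, pseudo_labels,
                  temperature=4.0, alpha=0.5):
        """Step 2 (assumed): self-distillation guided by pseudo labels instead
        of ground truth; hard cross-entropy on the pseudo labels plus a
        softened KL term distilling the deeper classifier into a shallower one."""
        ce = F.cross_entropy(student_logits, pseudo_labels)
        kd = F.kl_div(
            F.log_softmax(student_logits / temperature, dim=1),
            F.softmax(teacher_logits.detach() / temperature, dim=1),
            reduction="batchmean",
        ) * (temperature ** 2)
        return (1 - alpha) * ce + alpha * kd
    ```

    In this sketch the pseudo labels are computed once before distillation begins; whether and how often they are refreshed during training is not stated in the abstract.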

    摘要 (Chinese Abstract) i
    Abstract ii
    Table of Contents iii
    List of Tables v
    List of Figures vi
    Chapter 1. Introduction 1
      1.1. Motivation 1
      1.2. Overview 2
      1.3. Contributions 3
      1.4. Thesis Organization 3
    Chapter 2. Related Works 5
      2.1. Self-Supervised Learning 5
      2.2. Knowledge Distillation 6
      2.3. Self Distillation 7
      2.4. Summary 8
    Chapter 3. Self-Supervised Self Distillation 9
      3.1. Pseudo Label Generation 9
        3.1.1. Self-Supervised Learning 9
        3.1.2. Feature Extraction and Clustering 9
      3.2. Self Distillation 10
        3.2.1. Network Architecture 10
        3.2.2. Loss Functions 11
    Chapter 4. Experimental Results 13
      4.1. Evaluation Methods 13
      4.2. Experimental Settings 13
        4.2.1. Self-Supervised Training 13
        4.2.2. Feature Extraction and Clustering 13
        4.2.3. SSSD 14
      4.3. Performance Evaluation 14
      4.4. Performance Comparison 16
        4.4.1. Distillation Schemes 17
        4.4.2. Pseudo Labels 17
      4.5. Semi-Supervised Evaluation 18
      4.6. Object Detection and Segmentation 19
      4.7. Ablation Study on K 20
      4.8. Study on SSL Pre-trained Dataset D_pre and Target Dataset D_tar 21
      4.9. Different Self-Supervised Pre-training Methods 22
    Chapter 5. Conclusion 23
      5.1. Conclusion 23
      5.2. Future Work 23
        5.2.1. Distillation Loss 23
        5.2.2. Different Classifiers with Different K 24
    References 25


    Full text (on campus): available from 2023-08-31
    Full text (off campus): available from 2023-08-31