| Graduate Student: | 翁琇甄 (Weng, Hsiu-Chen) |
|---|---|
| Thesis Title: | Fully Used Reliable Data and Attention Consistency for Semi-Supervised Learning |
| Advisor: | 張瑞紘 (Chang, Jui-Hung) |
| Degree: | Master |
| Department: | Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science |
| Publication Year: | 2021 |
| Academic Year: | 109 (2020–2021) |
| Language: | English |
| Pages: | 40 |
| Keywords: | deep learning, semi-supervised learning, attention consistency, reliable data |
Sufficient labeled data is crucial to the performance of deep learning, but large labeled datasets are costly in human labor. Semi-supervised learning therefore aims to leverage a large amount of unlabeled data to improve training results when labels are limited. In recent studies, many semi-supervised methods apply diverse data augmentations so that the model learns classification rules from these variations, which forces the model to spend considerable time adapting to the changes. In addition, how to reduce the noise in the unlabeled data used for training is a recurring problem in semi-supervised learning. Defining reliable data avoids interference from erroneous information: for example, predictions whose probability exceeds a threshold are treated as confident. The goal is to identify, among the pseudo-labels assigned to unlabeled data, those most likely to be correct, so that training is not misled by badly deviated predictions; however, this also leaves a large portion of the data unused. This thesis therefore proposes a semi-supervised learning framework consisting of the Attention Consistency (AC) and One Supervised (OS) algorithms. By strengthening the model's learning of important features, and by using a condition to detect when the model can no longer learn effectively from the existing reliable data and should instead train on more of the unlabeled data, the framework improves both the efficiency and the results of classification learning. Experimental comparisons show that it reaches results comparable to other methods within a shorter training process. This study also analyzes the distribution of feature results and proposes a measurement to help understand the information in that distribution.
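To make the two mechanisms the abstract names concrete, the following is a minimal PyTorch-style sketch of (1) threshold-based selection of reliable pseudo-labels and (2) an attention-consistency penalty between an image and a horizontally flipped copy of it. The 0.95 threshold, the channel-mean attention map, and the flip transform are illustrative assumptions, not the thesis's actual AC or OS implementation.

```python
# Hypothetical sketch of the ideas described above, not the thesis's code:
# (1) keep only unlabeled predictions whose confidence clears a threshold,
# (2) penalize attention maps that change under a horizontal flip.
import torch
import torch.nn.functional as F

CONFIDENCE_THRESHOLD = 0.95  # assumed value, in the style of FixMatch

def reliable_pseudo_labels(logits_weak):
    """Return pseudo-labels and a mask selecting confident ("reliable") predictions."""
    probs = torch.softmax(logits_weak, dim=1)
    confidence, pseudo_labels = probs.max(dim=1)
    mask = confidence.ge(CONFIDENCE_THRESHOLD)  # True where the prediction is reliable
    return pseudo_labels, mask

def unlabeled_loss(logits_strong, pseudo_labels, mask):
    """Cross-entropy on strongly augmented views, restricted to the reliable subset."""
    per_sample = F.cross_entropy(logits_strong, pseudo_labels, reduction="none")
    return (per_sample * mask.float()).mean()

def attention_consistency_loss(feat_maps, feat_maps_flipped):
    """Penalize disagreement between attention maps of an image and its flipped copy.

    `feat_maps` are the last convolutional feature maps (B, C, H, W);
    a simple channel mean serves as the attention map in this sketch.
    """
    attn = feat_maps.mean(dim=1)                       # (B, H, W)
    attn_flipped = feat_maps_flipped.mean(dim=1)
    attn_flipped = torch.flip(attn_flipped, dims=[2])  # undo the flip to align maps
    return F.mse_loss(attn, attn_flipped)
```

In a FixMatch-style training loop, `unlabeled_loss` would be added to the supervised cross-entropy term, with `attention_consistency_loss` computed from the backbone's feature maps as an extra regularizer; the OS condition for when to expand beyond the reliable subset is not sketched here, since the abstract does not specify it.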