
Graduate Student: Chen, Min-Han (陳敏涵)
Thesis Title: Positive-label Bilinear Model with Contrastive Learning for Multi-label Classification (基於對比學習之正樣本雙線性模型用於多標籤分類任務)
Advisor: Kuo, Yau-Hwang (郭耀煌)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Institute of Medical Informatics
Publication Year: 2020
Graduation Academic Year: 108 (ROC calendar, 2019-2020)
Language: English
Pages: 81
Keywords: Deep Learning, Image Recognition, Multi-label Classification, Mutual Information, Contrastive Learning, Heterogeneous Data Analysis
Views: 140; Downloads: 0

    With the rise of social media, sharing casually taken photos and videos has become part of everyday life. This makes image data easy to collect and pairs it with increasingly diverse accompanying data, such as short text descriptions, hashtags, and emoticons, each offering information from a different perspective. Techniques that integrate computer vision with these other data types have therefore become an important part of data analytics. Among them, the combination of multi-label classification and semantic data is arguably the most important problem in visual recognition. Unlike conventional multi-class classification, the categories in multi-label classification are not mutually exclusive but coexist: each sample can carry more than one label. For example, recommendation systems on social networks rely on tag semantics to make recommendations, and in disease diagnosis from medical images, a patient may have more than one disease, so each disease must be treated as a separate label. However, today's artificial intelligence methods depend on training with large amounts of clean, accurately labeled data. This does not match real application scenarios: collecting data is easy, but collecting correctly and completely labeled data requires substantial manpower and resources. For example, we can directly harvest images with text tags from social networks, but we cannot directly obtain data annotated with every relevant class, as found in today's public benchmark datasets. Although some researchers have recently begun to study the noisy-label problem, its definition still does not fully match practical application scenarios.
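    The distinction from multi-class classification can be made concrete by looking at the label encodings involved. A minimal illustration in plain Python (the label names are toy examples, not taken from the thesis's datasets):

```python
# Multi-class classification: classes are mutually exclusive,
# so each sample has exactly one class index.
multi_class_target = 2  # one class out of {0, 1, 2, 3}

# Multi-label classification: classes coexist, so each sample
# carries a binary indicator vector over all labels.
labels = ["dog", "outdoor", "person", "night"]
multi_label_target = [1, 1, 0, 0]  # this sample is both "dog" and "outdoor"

# The positive labels are simply the indices set to 1.
positives = [name for name, y in zip(labels, multi_label_target) if y == 1]
```

    In the positive-label setting discussed below, only `positives` would be observed at training time; the zeros in the indicator vector are never explicitly annotated.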
    In this thesis, we investigate a new problem in multi-label classification, called the positive-label problem. Standard multi-label datasets contain both positive and negative labels in the training and testing sets, but this format is tailored for machines rather than reflecting real-world data: in practice, humans annotate only positive labels and do not deliberately mark negative ones. Training with only positive labels therefore better matches the actual situation. To address this, the thesis proposes a novel positive-label bilinear model with contrastive learning (PLBCL) for multi-label classification. PLBCL both solves the positive-label problem and improves multi-label classification performance on heterogeneous data. It applies the idea of contrastive learning and combines mutual information estimation with neural networks. First, image and text encoders extract feature vectors for each modality. In the text encoder, we sample one positive sample and multiple marginal samples to learn the probability distribution of label samples. Next, we fuse the heterogeneous data with a bilinear model over global and regional image features and map the result into a common vector space. For the loss function, we maximize the mutual information between the two sides so that the image representation learns to predict label features rather than individual data points.
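    The pipeline above (bilinear fusion of image and text features, then a contrastive mutual-information objective over one positive and several marginal samples) can be sketched in plain Python. This is an illustration only, not the thesis's exact formulation: `bilinear_fuse` and `info_nce_loss` are hypothetical helper names, and the embeddings are toy vectors.

```python
import math

def bilinear_fuse(x, y):
    """Toy bilinear feature fusion: the flattened outer product of an
    image feature x and a text feature y (a stand-in for the bilinear
    model over global and regional image features)."""
    return [a * b for a in x for b in y]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def info_nce_loss(image_emb, positive_emb, marginal_embs):
    """InfoNCE-style contrastive objective: score the one positive
    label embedding against the marginal (noise) samples; minimizing
    this loss maximizes a lower bound on the mutual information
    between image and label representations."""
    scores = [dot(image_emb, positive_emb)]
    scores += [dot(image_emb, m) for m in marginal_embs]
    # numerically stable log-softmax with the positive at index 0
    peak = max(scores)
    log_z = peak + math.log(sum(math.exp(s - peak) for s in scores))
    return -(scores[0] - log_z)

# Toy example: the positive label aligns with the image embedding,
# while the two marginal samples do not.
img = [1.0, 0.0]
pos = [1.0, 0.0]
marginals = [[0.0, 1.0], [-1.0, 0.0]]
loss = info_nce_loss(img, pos, marginals)
```

    With one positive and K marginal samples, a loss below log(K + 1) indicates the model scores the positive above chance level; training pushes the fused image representation toward its positive label embeddings.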
    We conduct extensive experiments to evaluate PLBCL. On the standard multi-label classification problem, PLBCL achieves better results under various evaluation metrics. On the positive-label problem, PLBCL also outperforms the baseline model built with the existing binary cross-entropy loss. In the future, the proposed model can be applied to other heterogeneous data analysis problems, such as multi-label classification over various data types and similarity evaluation.

    CHAPTER 1 INTRODUCTION 1
    1.1 BACKGROUND 1
    1.2 PROBLEM DESCRIPTION 7
    1.3 MOTIVATION 9
    1.4 CONTRIBUTION 12
    1.5 ORGANIZATION 13
    CHAPTER 2 RELATED WORK 14
    2.1 BINARY CROSS ENTROPY LOSS 14
    2.2 FASTTAG 16
    2.3 GRAPH-BASED METHOD 17
    2.4 F-MSML 22
    2.5 CONTRASTIVE LEARNING METHODS 23
    CHAPTER 3 POSITIVE-LABEL BILINEAR MODEL WITH CONTRASTIVE LEARNING 25
    3.1 WORD REPRESENTATION 26
    3.2 NOISE-CONTRASTIVE ESTIMATION 27
    3.3 NEURAL NETWORK ENCODER 29
    3.4 MUTUAL INFORMATION ESTIMATION NETWORK 33
    3.5 OBJECTIVE FUNCTION 43
    CHAPTER 4 EXPERIMENTS AND DISCUSSION 44
    4.1 DATASET 44
    4.2 IMPLEMENTATION DETAILS 49
    4.3 EVALUATION METRICS 51
    4.4 EXPERIMENTAL RESULTS 54
    CHAPTER 5 CONCLUSION 66
    CHAPTER 6 FUTURE WORK 67
    REFERENCES 68
    APPENDIX 73


    Full-text availability: on campus, public from 2025-08-10; off campus, not public.
    The electronic thesis has not been authorized for public release; for the print copy, please consult the library catalog.