| Field | Value |
|---|---|
| Graduate Student | 廖傑恩 Liao, Jay Chiehen |
| Thesis Title | 表格資料之圖結構學習及其下游分類任務應用 (Learning Graph Structures from Tabular Data for Downstream Classification Tasks) |
| Advisor | 李政德 Li, Cheng-Te |
| Degree | Master |
| Department | Institute of Data Science, College of Management |
| Year of Publication | 2022 |
| Academic Year of Graduation | 110 (2021–2022) |
| Language | English |
| Pages | 64 |
| Keywords | Tabular Data, Graph Neural Network, Graph Structure Learning, Contrastive Learning, Transformer, Ensemble Learning, Model Pre-training, Model Fine-tuning, End-to-end Training |
Tabular data is one of the most common data types in the real world, and many deep learning methods have been proposed to model it. Most of these methods consider feature interactions but ignore associations among instances. Such associations can, however, help downstream classification tasks, because instances in tabular data may share similar patterns of correlation between the features and the target labels. This information can be captured by graph neural networks, which in turn require credible graph structures. Several works have focused on graph structure learning, i.e., obtaining a better graph structure from a noisy one, but most do not consider learning graph structures from tabular data, where no graph is available in the first place. It is also unclear how to integrate deep learning components with graph structure learning and how to design an appropriate training setting. This study therefore integrates a graph structure learning method with two other deep learning components, namely a Transformer-based feature extractor and ensemble learning over multiple data views, to learn graph structures from tabular data for downstream classification tasks. Experiments on 30 tabular datasets show that learning graph structures helps encode tabular data and yields competitive downstream classification performance. We find that the Transformer-based extractor aids graph structure learning by extracting contextual information about feature interactions. Combined with the Transformer-based extractor and trained end-to-end, the graph structure learning method significantly outperforms both graph structure learning methods under other training designs and the baselines, including gradient boosting decision trees (GBDTs), which have long led the field of tabular data modeling. Finally, this study discusses the design of training settings for learning graph structures from tabular data and offers suggestions for future research.
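To make the described pipeline concrete, below is a minimal PyTorch sketch of one way such an architecture could fit together: a Transformer encoder extracts instance embeddings from feature tokens, a differentiable similarity-based graph learner builds an adjacency matrix over instances, and a single GCN-style message-passing step feeds a classifier, with all components trained end-to-end under one loss. The class name `TabularGSL`, the layer sizes, and the top-k cosine graph learner are illustrative assumptions, not the thesis's exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TabularGSL(nn.Module):
    """Sketch: Transformer extractor -> learned graph -> GCN step -> classifier.

    A hypothetical simplification of the pipeline in the abstract; the cosine
    top-k graph learner and all hyperparameters are assumptions.
    """

    def __init__(self, num_features, d_model=32, num_classes=2, k=10):
        super().__init__()
        self.k = k
        # Embed each scalar feature as a token, plus a learned per-feature
        # embedding, so the Transformer can model feature interactions
        # (in the spirit of TabTransformer / FT-Transformer).
        self.feature_embed = nn.Linear(1, d_model)
        self.feature_pos = nn.Parameter(torch.randn(num_features, d_model) * 0.02)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=4, batch_first=True)
        self.extractor = nn.TransformerEncoder(encoder_layer, num_layers=2)
        self.gcn_weight = nn.Linear(d_model, d_model)
        self.classifier = nn.Linear(d_model, num_classes)

    def learn_graph(self, z):
        # Dense cosine-similarity graph over instances, sparsified by keeping
        # each node's top-k neighbours (one common graph structure learning
        # recipe); self-loops are kept since self-similarity is maximal.
        z_norm = F.normalize(z, dim=-1)
        sim = z_norm @ z_norm.t()                     # (n, n) similarities
        topk = torch.topk(sim, self.k, dim=-1)
        mask = torch.zeros_like(sim).scatter_(-1, topk.indices, 1.0)
        adj = sim * mask                              # sparse, differentiable weights
        adj = (adj + adj.t()) / 2                     # symmetrise
        deg = adj.sum(-1, keepdim=True).clamp(min=1e-6)
        return adj / deg                              # row-normalised adjacency

    def forward(self, x):
        # x: (n_instances, num_features) numeric table
        tokens = self.feature_embed(x.unsqueeze(-1)) + self.feature_pos
        z = self.extractor(tokens).mean(dim=1)        # (n, d) instance embeddings
        adj = self.learn_graph(z)
        h = F.relu(adj @ self.gcn_weight(z))          # one message-passing step
        return self.classifier(h)

# Toy end-to-end run on random data (assumed shapes only): the graph learner,
# the extractor, and the GNN all receive gradients from a single loss.
model = TabularGSL(num_features=8, num_classes=3)
x, y = torch.randn(64, 8), torch.randint(0, 3, (64,))
loss = F.cross_entropy(model(x), y)
loss.backward()
```

In this end-to-end setting the adjacency weights stay differentiable (only the top-k selection is discrete), so the classification loss can reshape the learned graph; a pre-train/fine-tune setting would instead fit the extractor first and learn the graph afterwards.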