
Graduate Student: Hsu, Chiao-Ya (許喬雅)
Thesis Title: Toward Accurate and Realistic Predictions in Tabular Data: Self-Supervised Gradient Boosting with Graph Neural Networks (邁向兼顧準確和現實的表格資料預測:自監督梯度提升圖神經網路)
Advisor: Li, Cheng-Te (李政德)
Degree: Master
Department: Institute of Data Science, College of Management
Publication Year: 2023
Graduation Academic Year: 111 (ROC calendar; 2022-2023)
Language: English
Number of Pages: 71
Keywords: Tabular Data Prediction, Graph Neural Networks, Graph Representation Learning, Feature Incremental Learning, Transfer Learning
Tabular data is the most common data type in real-world applications. Numerous powerful machine learning methods and neural network models have been developed for tabular data prediction and have achieved strong performance, but most of them focus primarily on pairwise feature interactions, overlooking the interactions between instances and between instances and features. Consequently, many studies have transformed tabular data into graph structures and applied graph neural networks to this task.
In this study, we propose a novel framework, Self-Supervised Gradient Boosting Graph Neural Networks. The framework harnesses the power of graph representation learning and combines it with gradient boosting decision trees, using the gradients of the graph neural network to update the tree ensemble. In addition, we incorporate ensemble learning into the framework to increase model diversity and reduce the interference of noise and outliers: the tabular data is partitioned into multiple graphs, and predictions from the resulting multi-view representations are aggregated. We also design several training strategies for our model, using contrastive learning, clustering learning, and feature reconstruction as components of the pre-training or fine-tuning stage to optimize the embeddings produced by graph representation learning; a minimal sketch of the contrastive component appears after this abstract.
To better meet the demands of real-world tasks, we design variants of the GBGNN model that support feature incremental learning and transfer learning, broadening its practical applicability. We evaluate the model on 20 tabular datasets of different types, including multi-class, high-dimensional, and large-scale datasets. The results show that our model outperforms existing state-of-the-art methods in both accuracy and robustness, achieving the goal of developing a model that is both realistic and accurate and providing a reference point for future research.
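The following is a minimal, hypothetical sketch of the contrastive-learning component mentioned above, assuming a SimCLR-style symmetric InfoNCE objective over node embeddings from two augmented views of the instance graph; the actual augmentations, temperature, and loss weighting used in the thesis are not specified here.

```python
import torch
import torch.nn.functional as F

def info_nce_loss(z1: torch.Tensor, z2: torch.Tensor, tau: float = 0.5) -> torch.Tensor:
    """Symmetric InfoNCE over two views of the same n instances.

    z1, z2: (n, d) node embeddings produced by the GNN under two
    augmented views of the instance graph (augmentations not shown).
    """
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / tau                      # (n, n) cosine similarities
    targets = torch.arange(z1.size(0), device=z1.device)
    # Each row's positive is its own other-view embedding (the diagonal);
    # every other instance in the batch acts as a negative.
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))
```

Under this objective, each instance's embedding in one view must identify its counterpart in the other view among all other instances, which pushes embeddings of the same instance together and those of different instances apart.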

Tabular data is the most prevalent data type in real-world applications, and numerous machine learning methods have been devised for tabular data learning. Despite the ease of use of these traditional methods, models based on various neural networks have consistently demonstrated superior prediction performance. Most tabular data learning (TDL) models, however, primarily focus on pairwise feature interactions, overlooking the significance of instance-feature interactions in tabular data prediction. Consequently, several studies have transformed tabular data into graph-based structures, exploiting graph representation learning (GRL) to accomplish the task. In this study, we propose a novel framework, Gradient Boosting Graph Neural Networks (GBGNN), which harnesses the power of GRL via Graph Neural Networks (GNNs) and leverages the efficacy of Gradient Boosting Decision Trees (GBDTs) to create new trees optimized by the GNNs' gradients. Furthermore, we incorporate ensemble learning into our framework, partitioning the tabular data into multiple graphs and aggregating predictions from multi-view representations, to enhance model diversity and reduce the influence of noisy data and outliers. In addition, we design various training strategies for our models, incorporating Contrastive Learning, Clustering Learning, and Feature Reconstruction as components of either the pre-training or the fine-tuning stage to optimize the GRL output. To align more closely with the demands of real-world tasks, we also devise variants of GBGNN that address feature incremental learning and transfer learning, thereby extending the field of application. We conducted experiments on 20 tabular datasets of different types, including multi-class, high-dimensional, and large-scale data. The results demonstrate the superior performance of our model compared to existing state-of-the-art methods, show the robustness and efficacy of our approach, and achieve the goal of developing a model that is both realistic and accurate.
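To make the boosting-with-GNN-gradients idea concrete, here is a hedged, self-contained sketch: each round fits new regression trees to the negative gradient of the GNN loss with respect to the current boosted predictions, so the tree ensemble is effectively optimized by the GNN's gradients. Everything here (TinyGCN, the k-NN instance graph, a single decision tree per round instead of a full GBDT library, and all hyper-parameters) is an illustrative assumption rather than the thesis's actual implementation; X is assumed to be a float32 tensor of instance features and y a tensor of class indices.

```python
import numpy as np
import torch
import torch.nn.functional as F
from sklearn.neighbors import kneighbors_graph
from sklearn.tree import DecisionTreeRegressor
from torch_geometric.nn import GCNConv


class TinyGCN(torch.nn.Module):
    """Two-layer GCN mapping [raw features || boosted scores] to class logits."""

    def __init__(self, in_dim: int, hid_dim: int, n_classes: int):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hid_dim)
        self.conv2 = GCNConv(hid_dim, n_classes)

    def forward(self, x, edge_index):
        return self.conv2(F.relu(self.conv1(x, edge_index)), edge_index)


def knn_edge_index(X: torch.Tensor, k: int = 5) -> torch.Tensor:
    """Build an instance graph by linking each row to its k nearest neighbours."""
    adj = kneighbors_graph(X.numpy(), n_neighbors=k).tocoo()
    return torch.from_numpy(np.vstack([adj.row, adj.col])).long()


def train_boosted_gnn(X, y, n_classes, n_rounds=10, shrinkage=0.1, gnn_epochs=30):
    edge_index = knn_edge_index(X)
    boosted = torch.zeros(X.shape[0], n_classes)   # running tree-ensemble output
    gnn = TinyGCN(X.shape[1] + n_classes, 64, n_classes)
    opt = torch.optim.Adam(gnn.parameters(), lr=1e-2)
    trees = []

    for _ in range(n_rounds):
        # (1) Train the GNN on raw features concatenated with the boosted scores.
        feats = torch.cat([X, boosted], dim=1)
        for _ in range(gnn_epochs):
            opt.zero_grad()
            F.cross_entropy(gnn(feats, edge_index), y).backward()
            opt.step()

        # (2) Differentiate the GNN loss w.r.t. the boosted scores; the negative
        #     gradient serves as the pseudo-residual target for the next trees.
        scores = boosted.clone().requires_grad_(True)
        loss = F.cross_entropy(gnn(torch.cat([X, scores], dim=1), edge_index), y)
        grad, = torch.autograd.grad(loss, scores)

        # (3) Classic gradient-boosting step: fit multi-output trees to -grad.
        tree = DecisionTreeRegressor(max_depth=3).fit(X.numpy(), (-grad).numpy())
        trees.append(tree)
        boosted = boosted + shrinkage * torch.as_tensor(
            tree.predict(X.numpy()), dtype=torch.float32)

    return gnn, trees, boosted
```

In a fuller version, a GBDT library such as XGBoost or LightGBM (both cited in the thesis) would replace the single DecisionTreeRegressor per round, and the ensemble variant (E-GBGNN) would repeat this loop over multiple graph views and aggregate their predictions.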

Chinese Abstract i
Abstract ii
Acknowledgements iii
Table of Contents iv
List of Tables vi
List of Figures vii
Chapter 1. Introduction 1
  1.1 Research Background 1
  1.2 Motivations 2
    1.2.1. Room to Improve Accuracy 2
    1.2.2. TDL Realistic Scenarios 2
  1.3 Research Goals 3
  1.4 Challenges 4
  1.5 Main Components of our Works 5
  1.6 Contributions 6
Chapter 2. Related Works 7
  2.1 Graph Representation Learning 7
  2.2 Tabular Data Learning on Neural Networks 7
  2.3 Tabular Data Learning on Graph Neural Networks 8
  2.4 Feature Incremental Learning 8
  2.5 Transfer Learning 9
  2.6 Summary of Related Works 10
Chapter 3. Problem Statement 12
  3.1 Tabular Data Learning 12
  3.2 Feature Incremental TDL 12
  3.3 Transferable TDL 13
  3.4 Summary of Problem Statement 14
Chapter 4. Methodology 15
  4.1 Approach Sketch 15
  4.2 Gradient Boosting Graph Neural Networks (GBGNN) 16
    4.2.1. Architecture Overview 16
    4.2.2. Gradient Boosting Training 17
    4.2.3. Graph Construction 17
    4.2.4. Gradient Boosting Updating 20
  4.3 GNNs Module 21
    4.3.1. Graph Representation Learning 23
    4.3.2. Contrastive Learning 26
    4.3.3. Clustering Learning 27
    4.3.4. MLP Predictor 28
  4.4 Ensemble GBGNN (E-GBGNN) 29
    4.4.1. Multi-graph Construction 29
    4.4.2. Weights Sharing Mechanism 30
    4.4.3. Information Aggregation 32
  4.5 Training Strategy 32
  4.6 Incremental GBGNN (I-GBGNN) 33
  4.7 Transferable GBGNN (T-GBGNN) 36
    4.7.1. Application Scenarios 39
Chapter 5. Experiments 41
  5.1 Experimental Settings 41
    5.1.1. Datasets 42
    5.1.2. Competing Methods 43
  5.2 Tabular Data Learning 45
    5.2.1. Experiment Results of Small-scale Datasets 45
    5.2.2. Experiment Results of Large-scale Datasets 45
  5.3 Feature Incremental TDL 48
  5.4 Transferable TDL 49
    5.4.1. Datasets 49
    5.4.2. Experiment Results 51
    5.4.3. Application Scenarios 52
  5.5 Noise Study 55
  5.6 Ablation Study 57
    5.6.1. Effect of Training Strategy 57
    5.6.2. Effect of Components in GBGNN 58
  5.7 Hyper-parameters Study 59
    5.7.1. Effect of GNN Layers 59
    5.7.2. Effect of Ensemble Configurations 60
Chapter 6. Conclusion 62
References 65

On-campus access: available from 2028-08-03
Off-campus access: available from 2028-08-03
The electronic thesis has not yet been authorized for public release; for the print copy, please consult the library catalog.