| Graduate Student: | 李旻昊 Li, Man-Ho |
|---|---|
| Thesis Title: | 融合資料補全、模型剪枝與知識蒸餾聯合學習之輕量穩健圖神經網路 iPaDGNN: Joint Learning with Imputation, Pruning, and Distillation for Robust Lightweight Graph Neural Networks |
| Advisor: | 李政德 Li, Cheng-Te |
| Degree: | Master |
| Department: | 敏求智慧運算學院 - 智慧運算碩士學位學程 (MS Degree in Intelligent Computing) |
| Year of Publication: | 2025 |
| Academic Year of Graduation: | 113 |
| Language: | English |
| Number of Pages: | 94 |
| Keywords (Chinese): | 圖神經網路、資料補全、模型剪枝、知識蒸餾、協同學習 |
| Keywords (English): | Graph Neural Network, Data Imputation, Model Pruning, Knowledge Distillation, Collaborative Learning |
| Views / Downloads: | 103 / 83 |
Graph Neural Networks (GNNs) deliver outstanding performance on a wide range of graph-structured data analysis tasks. In practice, however, their deployment still faces three tightly interconnected challenges: incomplete data, high training costs, and excessive inference overhead. Existing methods address each of these problems in isolation and therefore fail to provide a holistic solution.
To address this, we propose iPaD (Joint Learning with Imputation, Pruning, And Distillation for Robust Lightweight Graph Neural Networks), a unified framework built on the core principle of synergistic optimization, which jointly optimizes the entire GNN pipeline. The framework realizes this design philosophy through a coherent workflow: it first ensures data integrity with an efficient feature imputation technique; it then applies a novel proxy-based pruning strategy to obtain a lightweight yet robust GNN teacher model; finally, it distills the teacher's knowledge into an even more efficient MLP student model to enable fast inference.
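The abstract refers to an efficient feature imputation technique without detailing it here. As a rough illustration only, the sketch below shows feature-propagation-style imputation, in which missing entries are filled by repeatedly averaging over graph neighbours via the normalized adjacency while observed entries stay clamped to their known values. The function name, the dense-adjacency representation, and the `num_iters` parameter are illustrative assumptions, not the thesis's actual procedure.

```python
# Minimal sketch of feature-propagation-style imputation (illustrative assumption,
# not the thesis's exact method): missing node features are filled by repeated
# neighbourhood averaging while known features are clamped to observed values.
import numpy as np

def propagate_features(adj, x, known_mask, num_iters=40):
    """adj: (n, n) dense adjacency; x: (n, d) features with zeros where unknown;
    known_mask: (n, d) boolean, True where the feature value is observed."""
    deg = adj.sum(axis=1, keepdims=True).clip(min=1.0)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    adj_norm = d_inv_sqrt * adj * d_inv_sqrt.T          # D^-1/2 A D^-1/2
    x_known = np.where(known_mask, x, 0.0)
    x_hat = x_known.copy()
    for _ in range(num_iters):
        x_hat = adj_norm @ x_hat                        # diffuse along edges
        x_hat = np.where(known_mask, x_known, x_hat)    # re-clamp observed entries
    return x_hat

# Tiny usage example: a 4-node path graph with 1-D features, observed at the endpoints.
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
x = np.array([[1.0], [0.0], [0.0], [3.0]])
known = np.array([[True], [False], [False], [True]])
print(propagate_features(adj, x, known))
```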
Extensive experiments show that iPaD not only maintains high accuracy on datasets with up to 99.5% of features missing, but also delivers more than a 5x inference speedup over the baseline GNN teacher model. This work demonstrates that such a holistic design offers a practical and powerful path toward building robust, efficient, and deployable GNNs for real-world applications.
Graph Neural Networks (GNNs) have demonstrated remarkable performance in various graph-structured data analysis tasks. However, their practical deployment is severely hindered by a trio of interconnected challenges: incomplete data, high training costs, and prohibitive inference overhead. While existing methods tackle these issues in isolation, they fail to provide a holistic solution.
To address this gap, we introduce iPaD (Joint Learning with Imputation, Pruning, And Distillation for Robust Lightweight Graph Neural Networks), a unified framework built on the core principle of synergistic optimization across the entire GNN pipeline. Our framework actualizes this philosophy by first ensuring data integrity with an efficient feature imputation technique. It then employs a novel proxy-based pruning strategy to create a lightweight yet robust GNN teacher, whose knowledge is subsequently distilled into an even more efficient MLP student for fast inference.
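Since the final stage of the pipeline is the part that directly removes message passing from inference, the following minimal sketch illustrates generic GNN-to-MLP knowledge distillation in PyTorch: a frozen graph-aware teacher (standing in for the pruned GNN teacher; the proxy-based pruning step itself is omitted) produces soft labels, and an MLP student is trained on a weighted combination of cross-entropy and KL-divergence losses. The toy random graph, module names, and loss weighting `lam` are assumptions for illustration, not the thesis's actual objective.

```python
# Minimal sketch of GNN-to-MLP distillation (illustrative assumptions throughout):
# a frozen graph-aware teacher supplies soft labels; a structure-free MLP student
# is trained to match them, so inference needs no neighbourhood aggregation.
import torch
import torch.nn as nn
import torch.nn.functional as F

n, d, h, c = 100, 16, 32, 4                             # nodes, feat dim, hidden, classes
adj = (torch.rand(n, n) < 0.05).float()
adj = ((adj + adj.t() + torch.eye(n)) > 0).float()      # symmetric with self-loops
deg_inv_sqrt = adj.sum(1).clamp(min=1).pow(-0.5)
adj_norm = deg_inv_sqrt[:, None] * adj * deg_inv_sqrt[None, :]
x = torch.randn(n, d)
y = torch.randint(0, c, (n,))
train_mask = torch.rand(n) < 0.3

class TinyGCNTeacher(nn.Module):                        # stand-in for the pruned GNN teacher
    def __init__(self):
        super().__init__()
        self.w1, self.w2 = nn.Linear(d, h), nn.Linear(h, c)
    def forward(self, feats, a):
        return self.w2(a @ F.relu(self.w1(a @ feats)))

teacher = TinyGCNTeacher()
teacher.eval()                                          # assume already trained and pruned
with torch.no_grad():
    soft_targets = F.log_softmax(teacher(x, adj_norm), dim=-1).exp()

student = nn.Sequential(nn.Linear(d, h), nn.ReLU(), nn.Linear(h, c))
opt = torch.optim.Adam(student.parameters(), lr=1e-2)
lam = 0.5                                               # hard-label vs. soft-label weight
for _ in range(200):
    logits = student(x)                                 # no adjacency needed at inference
    loss_ce = F.cross_entropy(logits[train_mask], y[train_mask])
    loss_kd = F.kl_div(F.log_softmax(logits, -1), soft_targets, reduction="batchmean")
    loss = lam * loss_ce + (1 - lam) * loss_kd
    opt.zero_grad(); loss.backward(); opt.step()
```

Because the student consumes only node features, its inference cost is independent of graph connectivity, which is the design choice behind the reported speedup over the GNN teacher.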
Extensive experiments show that iPaD not only maintains high accuracy on datasets with up to 99.5% of features missing but also achieves over a 5x reduction in inference time compared to its baseline GNN teacher. Our work demonstrates that this holistic approach provides a practical and powerful pathway for building robust, efficient, and deployable GNNs for real-world applications.