
Graduate Student: Li, Man-Ho (李旻昊)
Thesis Title: iPaDGNN: Joint Learning with Imputation, Pruning, and Distillation for Robust Lightweight Graph Neural Networks
Thesis Title (Chinese): 融合資料補全、模型剪枝與知識蒸餾聯合學習之輕量穩健圖神經網路
Advisor: Li, Cheng-Te (李政德)
Degree: Master
Department: MS Degree Program in Intelligent Computing, Miin Wu School of Computing
Publication Year: 2025
Graduation Academic Year: 113 (ROC calendar; AY 2024–2025)
Language: English
Number of Pages: 94
Keywords: Graph Neural Network, Data Imputation, Model Pruning, Knowledge Distillation, Collaborative Learning

Abstract:
    Graph Neural Networks (GNNs) have demonstrated remarkable performance in various graph-structured data analysis tasks. However, their practical deployment is severely hindered by three interconnected challenges: incomplete data, high training costs, and prohibitive inference overhead. Existing methods tackle these issues in isolation and thus fail to provide a holistic solution.
    To address this gap, we introduce iPaD (Joint Learning with Imputation, Pruning, And Distillation for Robust Lightweight Graph Neural Networks), a unified framework built on the core principle of synergistic optimization across the entire GNN pipeline. The framework first ensures data integrity with an efficient feature imputation technique. It then employs a novel proxy-based pruning strategy to create a lightweight yet robust GNN teacher, whose knowledge is subsequently distilled into an even more efficient MLP student for fast inference.
    Extensive experiments show that iPaD not only maintains high accuracy on datasets with up to 99.5% of features missing but also reduces inference time by more than 5x compared to its baseline GNN teacher. Our work demonstrates that this holistic approach provides a practical and powerful pathway for building robust, efficient, and deployable GNNs for real-world applications.
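    For readers who want a concrete picture of the three-stage pipeline described above, the following is a minimal PyTorch sketch. It is illustrative only and makes simplifying assumptions: a dense normalized adjacency matrix, a feature-propagation-style imputer, global magnitude pruning, and plain soft-logit distillation stand in for the thesis's actual APCFI, MPP, and MP-KRD modules, and the names impute_features, GCNTeacher, magnitude_prune, and distill_step are hypothetical.

```python
# Minimal, self-contained sketch of the imputation -> pruned teacher -> MLP student
# pipeline. NOTE: generic stand-ins, not the thesis implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


def impute_features(x, observed_mask, adj_norm, num_iters=20):
    """Iteratively propagate observed features over a normalized adjacency,
    re-clamping observed entries after each step (feature-propagation style)."""
    out = x * observed_mask                              # unknown entries start at zero
    for _ in range(num_iters):
        out = adj_norm @ out                             # one propagation/smoothing step
        out = observed_mask * x + (1 - observed_mask) * out
    return out


class GCNTeacher(nn.Module):
    """Two-layer GCN-style teacher operating on a dense normalized adjacency."""
    def __init__(self, d_in, d_hid, d_out):
        super().__init__()
        self.lin1 = nn.Linear(d_in, d_hid)
        self.lin2 = nn.Linear(d_hid, d_out)

    def forward(self, x, adj_norm):
        h = F.relu(adj_norm @ self.lin1(x))
        return adj_norm @ self.lin2(h)                   # node logits


@torch.no_grad()
def magnitude_prune(model, sparsity=0.5):
    """Zero out the smallest-magnitude entries of each weight matrix
    (a generic stand-in for the thesis's proxy-based pruning)."""
    for p in model.parameters():
        if p.dim() > 1:
            k = max(1, int(p.numel() * sparsity))
            thresh = p.abs().flatten().kthvalue(k).values
            p.mul_((p.abs() > thresh).to(p.dtype))       # keep only large weights


def distill_step(student, teacher_logits, x, labels, train_mask, opt, T=2.0, lam=0.5):
    """One GNN-to-MLP distillation step: cross-entropy on labeled nodes plus
    temperature-scaled KL against the frozen teacher's soft predictions.
    The student is any feature-only MLP, e.g.
    nn.Sequential(nn.Linear(d_in, d_hid), nn.ReLU(), nn.Linear(d_hid, n_cls))."""
    opt.zero_grad()
    s_logits = student(x)
    ce = F.cross_entropy(s_logits[train_mask], labels[train_mask])
    kd = F.kl_div(F.log_softmax(s_logits / T, dim=-1),
                  F.softmax(teacher_logits / T, dim=-1),
                  reduction="batchmean") * (T * T)
    loss = ce + lam * kd
    loss.backward()
    opt.step()
    return loss.item()
```

    After distillation, only the MLP student is deployed, which is why inference no longer touches the graph structure and can run substantially faster than the GNN teacher.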

Table of Contents:
    摘要 (Chinese Abstract) i
    Abstract ii
    誌謝 (Acknowledgements) iii
    Table of Contents iv
    List of Tables vii
    List of Figures xi
    Chapter 1. Introduction 1
        1.1. Motivation 1
        1.2. Challenges 2
            1.2.1. Incomplete Data 2
            1.2.2. High Training Cost 2
            1.2.3. Inefficient Inference 2
            1.2.4. Lack of Unified Solutions 2
        1.3. Research Goal 3
        1.4. Contributions 3
        1.5. Thesis Organization 4
    Chapter 2. Related Work 5
        2.1. Feature Imputation in Graph Neural Networks 5
        2.2. Model Pruning in Graph Neural Networks 6
        2.3. Knowledge Distillation for Graph Models 7
        2.4. High-Level Comparison of SOTA GNNs 8
        2.5. Chapter Summary 9
    Chapter 3. Problem Statement 11
        3.1. Problem Background 11
        3.2. Notations 12
        3.3. Formal Statement 12
        3.4. Summary 14
    Chapter 4. Methodology 15
        4.1. Framework Overview 15
        4.2. APCFI: Approximate Pseudo-Confidence Feature Imputation 17
            4.2.1. Motivation and Core Idea 17
            4.2.2. Technical Workflow 18
            4.2.3. Implementation and Pseudocode 22
        4.3. TunedGNN: Architecture Optimization for Robustness 25
            4.3.1. Key Design Components 25
        4.4. MPP: Mirror Projection Pruning 27
            4.4.1. Mirror Projection Pruning (MPP): Theory and Workflow 27
        4.5. Teacher Training 31
        4.6. MP-KRD: Mirror Projection Knowledge-inspired Reliable Distillation 32
            4.6.1. Motivation: The Need for Reliable Distillation 33
            4.6.2. The MP-KRD Workflow 33
            4.6.3. Deployment 37
        4.7. Innovation Summary 38
    Chapter 5. Experiment 41
        5.1. Experimental Setup 41
            5.1.1. Datasets 41
            5.1.2. Compared Methods 42
            5.1.3. Training Details 42
            5.1.4. Missing Data Settings 42
            5.1.5. Dataset Split Strategy 42
        5.2. Validating APCFI: A Superior Balance of Accuracy and Efficiency 43
            5.2.1. Efficiency Comparison 43
            5.2.2. Accuracy under High Missing Rates 44
        5.3. Baseline Performance: Establishing Competitiveness with Complete Data 46
        5.4. Validating MPP: Quantifying Gains in Training and Inference Efficiency 48
            5.4.1. Impact on Computational Costs 48
            5.4.2. Accuracy-Efficiency Trade-off and the “Sweet Spot” 49
        5.5. Stress Test: Evaluating Robustness under Extreme Data Corruption 50
            5.5.1. Performance under Edge Missing 50
            5.5.2. Performance under Feature Missing 52
        5.6. Dissecting the Framework: An Ablation Study on Core Modules 56
            5.6.1. Ablation Results with GCN Backbone 56
            5.6.2. Ablation Results with SAGE Backbone 57
            5.6.3. Conclusion of Ablation Study 58
        5.7. Component-Level Validation: Internal Ablation Studies 59
            5.7.1. Validating the Robustness of the TunedGNN Backbone 59
            5.7.2. Validating the APCFI Design: A Component-wise Ablation Study 59
            5.7.3. Validating the MP-KRD Design 61
    Chapter 6. Discussion and Conclusion 65
        6.1. Summary of Findings and Contributions 65
        6.2. Limitations 66
        6.3. Future Work 66
        6.4. Concluding Remarks 67
    References 68
    Appendix A. Hyperparameters 73
        A.1. Analysis of APCFI Hyperparameters 73
        A.2. Analysis of MP-KRD Hyperparameters 74
    Appendix B. Generalization Ability 75
        B.1. Generalization of iPaD to Various GNN Architectures 75
    Appendix C. Effect of MPP Pruning on Teacher Models 77
        C.1. Rationale for Applying MPP Pruning to Teacher Models 77
    Appendix D. Joint Training Imputer 78
        D.1. Joint Training of FP-based Imputer with MPP and KRD 78
    Appendix E. Complex Missing Conditions 79
        E.1. Performance under Combined Feature and Edge Missing Conditions 79
