| Graduate Student: | 呂其蓁 Lu, Chi-Chen |
|---|---|
| Thesis Title: | GTR: Learning Better Graph and Tree Representations for Noise-Robust Unsupervised Anomaly Detection on Tabular Data (GTR: 從圖與樹結構學習穩健特徵表示:應用於表格資料的無監督抗噪異常偵測) |
| Advisor: | 李政德 Li, Cheng-Te |
| Degree: | Master |
| Department: | Graduate Program of Artificial Intelligence, College of Electrical Engineering and Computer Science |
| Year of Publication: | 2025 |
| Academic Year: | 113 (ROC era) |
| Language: | English |
| Pages: | 68 |
| Keywords: | Unsupervised Learning, Anomaly Detection, Graph Neural Network, Pseudo Labels, Unified Loss, Contaminated Data |
In unsupervised anomaly detection tasks, models are typically built on the assumption that the training data consists entirely of clean normal samples. However, in real-world applications, it is common for the training set to be moderately contaminated—that is, containing a non-negligible portion of unlabeled anomalies. This scenario is often seen in data collection processes such as surveys, phone interviews, or sensor monitoring, where anomalous patterns may exist without being explicitly identified. Such contamination misleads the model into learning distorted representations, degrading its ability to detect true anomalies.
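The moderately contaminated training setting described above can be reproduced with a small synthetic sketch. The cluster locations, dimensionality, and 10% contamination rate below are illustrative assumptions, not values taken from the thesis:

```python
import numpy as np

rng = np.random.default_rng(0)

# Normal samples: a single Gaussian cluster in a 5-D feature space.
normal = rng.normal(loc=0.0, scale=1.0, size=(900, 5))

# Unlabeled anomalies drawn from a shifted distribution.
anomalies = rng.normal(loc=5.0, scale=1.0, size=(100, 5))

# Moderately contaminated training set: ~10% anomalies, labels discarded,
# mimicking surveys or sensor logs where anomalies go unnoticed.
X_train = np.vstack([normal, anomalies])
rng.shuffle(X_train, axis=0)

print(f"training size: {len(X_train)}, contamination: {100 / len(X_train):.0%}")
```

Any model that treats all 1,000 rows as "normal" will absorb the shifted cluster into its learned distribution, which is exactly the distortion the proposed framework targets.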
To address this challenge, we propose a modular and unified anomaly detection framework that integrates structural context, statistical tree-based features, and pseudo-supervised learning. Our approach is specifically designed to improve robustness and generalization performance under moderately contaminated training data.
The framework includes five key components: a graph encoder based on graph neural networks to capture structural similarity among instances; a tree-based module inspired by Isolation Forest to extract path lengths and split thresholds; a pseudo-label generator that produces high-confidence labels based on anomaly scores for weak supervision; a classifier that fuses multi-source signals to make anomaly predictions; and a unified loss function that incorporates dynamic weighting and entropy regularization to balance multi-objective optimization.
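As one illustration of the pseudo-label generator among the components above, the sketch below thresholds Isolation Forest anomaly scores into high-confidence pseudo-anomalies and pseudo-normals. The quantile cutoffs (top 5%, bottom 50%) and the unlabeled marker `-1` are hypothetical choices for the sketch, not the thesis's actual thresholds:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
# Synthetic contaminated data: 900 normal points, 100 shifted anomalies.
X = np.vstack([rng.normal(0, 1, (900, 5)), rng.normal(5, 1, (100, 5))])

# Fit an Isolation Forest; score_samples is higher for more "normal" points,
# so its negation serves as an anomaly score.
forest = IsolationForest(n_estimators=100, random_state=0).fit(X)
scores = -forest.score_samples(X)

# Hypothetical confidence thresholds: top 5% of scores -> pseudo-anomaly (1),
# bottom 50% -> pseudo-normal (0), everything in between left unlabeled (-1).
hi, lo = np.quantile(scores, 0.95), np.quantile(scores, 0.50)
pseudo = np.full(len(X), -1)
pseudo[scores >= hi] = 1
pseudo[scores <= lo] = 0
```

Only the confident extremes receive labels, so the downstream classifier gets a weak-supervision signal while ambiguous mid-score samples are excluded from the pseudo-label loss.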
Experimental results on multiple moderately contaminated anomaly detection datasets demonstrate that the proposed method achieves robust and superior performance. Notably, it effectively distinguishes anomalies even without ground-truth labels, showing strong generalization ability and practical potential.
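Evaluation in this label-free setting typically trains on the contaminated, unlabeled data and scores a held-out labeled test set with ROC AUC. The sketch below uses a plain Isolation Forest as a stand-in baseline, not the proposed GTR model, and all dataset parameters are illustrative:

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)

# Contaminated, unlabeled training set (10% anomalies).
X_train = np.vstack([rng.normal(0, 1, (900, 5)), rng.normal(5, 1, (100, 5))])

# Labeled test set, used for evaluation only (1 = anomaly).
X_test = np.vstack([rng.normal(0, 1, (180, 5)), rng.normal(5, 1, (20, 5))])
y_test = np.r_[np.zeros(180), np.ones(20)]

# Negated score_samples gives higher values for more anomalous points.
model = IsolationForest(n_estimators=100, random_state=0).fit(X_train)
auc = roc_auc_score(y_test, -model.score_samples(X_test))
print(f"test ROC AUC: {auc:.3f}")
```

Ground-truth labels appear only on the test side, matching the paper's premise that no labels are available during training.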