| Graduate student: | 吳宥澄 Wu, You-Cheng |
|---|---|
| Thesis title: | 結合條件生成對抗網路與對比學習進行過採樣之異常檢測法 Generative Oversampling with Conditional Generative Adversarial Network and Contrastive Learning for Anomaly Detection |
| Advisor: | 李昇暾 Li, Sheng-Tun |
| Degree: | Master |
| Department: | College of Management - Department of Industrial and Information Management |
| Year of publication: | 2024 |
| Academic year of graduation: | 112 |
| Language: | Chinese |
| Number of pages: | 55 |
| Keywords (Chinese): | 異常偵測、不平衡分類、對比學習、生成對抗網路、過採樣 |
| Keywords (English): | Anomaly detection, imbalanced classification, contrastive learning, Generative Adversarial Network, oversampling |
Anomaly detection is an important research topic in data analysis. Its goal is to identify data points or patterns that deviate from normal behavior. Almost every industry needs anomaly detection technology, including finance, manufacturing, healthcare, and transportation.
Conventional anomaly detection usually relies on manual interpretation by experts, which is costly and inefficient as data volumes keep growing. Thanks to recent technological advances, machine learning is now widely applied to anomaly detection. In most cases, however, normal instances far outnumber anomalous ones, so a classifier trained on such imbalanced data may perform poorly and lose much of its ability to detect anomalies.
To address the imbalance that is common in anomaly detection datasets, this research uses contrastive learning and a conditional Generative Adversarial Network (cGAN) for representation learning and oversampling, and then applies a classifier to detect anomalies. The method transforms the data into features that are easier to separate and reduces the classifier's tendency during training to focus on normal-class samples while ignoring anomalous ones, which improves anomaly detection performance.
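The imbalance problem described above can be illustrated with a minimal sketch. The data below is made up for illustration and is not taken from the thesis. The helper names (`accuracy`, `recall`, `synthetic_needed`) are hypothetical, and the sketch does not implement the contrastive encoder or the cGAN. It only shows why accuracy is misleading on imbalanced data and how many synthetic anomalies an oversampler would have to generate to balance the classes.

```python
def accuracy(y_true, y_pred):
    """Fraction of labels predicted correctly."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of true `positive` samples that were predicted as `positive`."""
    actual = sum(1 for t in y_true if t == positive)
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    return hits / actual if actual else 0.0


def synthetic_needed(y_true, minority=1):
    """Minority samples an oversampler must add to match the majority count."""
    n_minority = sum(1 for t in y_true if t == minority)
    n_majority = len(y_true) - n_minority
    return max(n_majority - n_minority, 0)


# Made-up data: 990 normal samples (label 0) and 10 anomalies (label 1).
y_true = [0] * 990 + [1] * 10
# A degenerate classifier that labels every sample as normal.
y_pred = [0] * len(y_true)

acc = accuracy(y_true, y_pred)      # high, although no anomaly is found
rec = recall(y_true, y_pred)        # recall on the anomaly class
extra = synthetic_needed(y_true)    # synthetic anomalies needed for a 1:1 ratio
print(acc, rec, extra)
```

In this toy setting the degenerate classifier scores 99% accuracy while detecting none of the anomalies, which is the failure mode that oversampling the anomalous class is meant to counter.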
On campus: publicly available from 2029-01-18