| Author: | 羅紹瑄 (Lo, Shao-Hsuan) |
|---|---|
| Thesis Title: | A Neural Network-Based Self-Training Framework for Botnet Detection (一個應用於殭屍網路偵測之類神經網路自我學習架構) |
| Advisor: | 謝錫堃 (Shieh, Ce-Kuen) |
| Co-Advisor: | 張志標 (Chang, Jyh-Biau) |
| Degree: | Master |
| Department: | Department of Electrical Engineering, College of Electrical Engineering and Computer Science |
| Year of Publication: | 2022 |
| Academic Year of Graduation: | 110 |
| Language: | English |
| Number of Pages: | 40 |
| Keywords (Chinese): | 殭屍網路偵測、深度類神經網路、自我學習、網路流、會話 |
| Keywords (English): | Botnet Detection, Deep Neural Network, Self-Training, NetFlow, Session |
In botnet detection, threat actors constantly change their attack strategies and find new ways to compromise networks. Most botnet detection research relies on supervised learning, in which machine learning algorithms and neural-network-based models are widely used. To train a well-performing model, supervised learning requires a large amount of cleanly labeled data; however, collecting up-to-date samples of the latest botnets or their variants in a timely manner is difficult and often impractical. We therefore propose a self-training framework that exploits unlabeled data to mitigate this problem. In self-training, a classifier trained on the labeled data assigns pseudo-labels to the unlabeled data, and the model is then retrained iteratively on the augmented training set, allowing it to extract additional information from the unlabeled data. Within this framework, we also propose the pseudo-labeling method that self-training requires: it inspects the predicted per-class probabilities of each unlabeled sample and adds only high-confidence samples to the augmented training set, which makes it applicable to most neural network classifiers. Experimental results show that, with only a small amount of labeled data, our framework improves the F1-score by 8.9% and the accuracy by 5.5% over the base model without self-training; for variant and previously unseen botnets, it improves the F1-score by 18.7% and the accuracy by 9.1% even though no labels for them are available. Trained with only 30% of the labeled data, the final model achieves a 99.0% F1-score and 99.0% accuracy on the binary classification task, and a 95.8% F1-score and 97.0% accuracy on the multiclass task.
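The confidence-based pseudo-labeling loop described above can be summarized roughly as follows. This is a minimal sketch, assuming a scikit-learn-style classifier with `predict_proba()` and a fixed confidence threshold; the function and parameter names (`self_train`, `confidence_threshold`, `max_rounds`) are illustrative, and the thesis's actual model architecture and selection rule may differ.

```python
# Minimal sketch of confidence-based self-training, assuming a
# scikit-learn-style classifier; the thesis's actual architecture and
# pseudo-labeling rule may differ.
import numpy as np
from sklearn.neural_network import MLPClassifier

def self_train(X_labeled, y_labeled, X_unlabeled,
               confidence_threshold=0.95, max_rounds=5):
    """Iteratively pseudo-label high-confidence unlabeled samples and retrain."""
    X_aug, y_aug = X_labeled.copy(), y_labeled.copy()
    pool = X_unlabeled.copy()
    model = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500)
    for _ in range(max_rounds):
        model.fit(X_aug, y_aug)                    # train on the augmented set
        if len(pool) == 0:
            break
        proba = model.predict_proba(pool)          # per-class probability predictions
        confidence = proba.max(axis=1)             # confidence of the top class
        keep = confidence >= confidence_threshold  # keep only high-confidence samples
        if not keep.any():
            break                                  # nothing confident enough to add
        pseudo_labels = model.classes_[proba[keep].argmax(axis=1)]
        X_aug = np.vstack([X_aug, pool[keep]])     # augment the training data
        y_aug = np.concatenate([y_aug, pseudo_labels])
        pool = pool[~keep]                         # shrink the unlabeled pool
    return model
```

Holding back samples that fall below the threshold for later rounds reflects the idea that the model should only teach itself from predictions it is already confident about, which limits the amount of label noise introduced into the augmented training set.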