Author: Kao, Bai-Jun (高百君)
Thesis title: A Gradual Self-Training Approach for Botnet Detection in Real-world Network Traffic
Advisor: Shieh, Ce-Kuen (謝錫堃)
Co-advisor: Chang, Jyh-Biau (張志標)
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Electrical Engineering
Year of publication: 2023
Academic year of graduation: 111 (ROC calendar)
Language: English
Pages: 41
Keywords: Botnet Detection, Deep neural network, NetFlow, Gradual Self-training, Session
    Botnets pose a significant threat to network security: their compromised machines let cybercriminals launch distributed denial-of-service (DDoS) attacks, send spam, and carry out other malicious activity. Botnet detection models deployed in real-world environments suffer from model shift, a decline in model performance over time that arises because real-world data drifts over time. This study addresses model shift with a gradual self-training approach built on deep neural networks (DNNs), and develops a pseudo-labeling method tailored to real-world network traffic. Experiments use three months of campus network traffic from National Cheng Kung University, labeled by BotCluster, and compare traditional self-training with gradual self-training. Gradual self-training combined with the proposed pseudo-labeling method handles model shift better than the traditional method and improves detection performance on unseen data. In the experiment that trains on one week of data, it raises the F1-score by 4.5%, which is within 1% of the ideal upper bound and about twice the improvement obtained with traditional self-training. The study thus offers a solution to model shift in real-world traffic based on gradual self-training, along with insights for deploying detection models in real network environments. The scope and limitations of the study are also discussed.
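
The idea of gradual self-training named in the abstract can be illustrated with a small, self-contained sketch. It is not the thesis's implementation: a nearest-centroid classifier stands in for the DNN, a simple confidence threshold stands in for the thesis's pseudo-labeling method (which this record does not describe), and the data are synthetic 2-D points whose class means rotate over time to mimic drift. All function names and parameters (`make_window`, `fit`, `predict`, `gradual_self_train`, `threshold`) are hypothetical.

```python
import math
import random

def make_window(rng, angle_deg, n=200, radius=3.0):
    """Synthetic 2-D window; class means rotate with angle_deg to mimic drift."""
    a = math.radians(angle_deg)
    points, labels = [], []
    for i in range(n):
        y = i % 2                      # 0 = benign, 1 = bot (balanced toy labels)
        sign = 1.0 if y == 0 else -1.0
        cx, cy = sign * radius * math.cos(a), sign * radius * math.sin(a)
        points.append((rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)))
        labels.append(y)
    return points, labels

def fit(points, labels, previous=None):
    """Nearest-centroid model (assumed stand-in for a DNN): one mean per class."""
    model = {}
    for c in (0, 1):
        members = [p for p, y in zip(points, labels) if y == c]
        if members:
            model[c] = (sum(p[0] for p in members) / len(members),
                        sum(p[1] for p in members) / len(members))
        else:
            model[c] = previous[c]     # no confident samples: keep the old centroid
    return model

def predict(model, p):
    """Return (label, confidence); confidence is a distance margin in [0, 1]."""
    d0 = math.dist(p, model[0])
    d1 = math.dist(p, model[1])
    label = 0 if d0 <= d1 else 1
    confidence = abs(d0 - d1) / (d0 + d1 + 1e-12)
    return label, confidence

def accuracy(model, points, labels):
    hits = sum(predict(model, p)[0] == y for p, y in zip(points, labels))
    return hits / len(points)

def gradual_self_train(model, windows, threshold=0.3):
    """Adapt window by window, refitting on confident pseudo-labels only."""
    for points in windows:
        kept_points, kept_labels = [], []
        for p in points:
            label, confidence = predict(model, p)
            if confidence >= threshold:    # assumed filter, not the thesis's rule
                kept_points.append(p)
                kept_labels.append(label)
        model = fit(kept_points, kept_labels, previous=model)
    return model

rng = random.Random(0)
train_x, train_y = make_window(rng, 0)          # the only labeled window
initial = fit(train_x, train_y)
unlabeled = [make_window(rng, angle)[0] for angle in range(15, 151, 15)]
test_x, test_y = make_window(rng, 150)          # labels used for evaluation only

static_acc = accuracy(initial, test_x, test_y)
one_shot_acc = accuracy(gradual_self_train(initial, unlabeled[-1:]), test_x, test_y)
gradual_acc = accuracy(gradual_self_train(initial, unlabeled), test_x, test_y)
print(f"static: {static_acc:.2f}  one-shot: {one_shot_acc:.2f}  gradual: {gradual_acc:.2f}")
```

In this toy setting the model adapted window by window follows the drift, while both the static model and the model self-trained in a single step on the final window mislabel most of it. The reason is that pseudo-labels are only reliable when the gap between consecutive windows is small, which is the motivation for adapting gradually.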

    Abstract (Chinese)
    Abstract
    Acknowledgements
    Content
    Tables
    Figures
    Chapter 1 : Introduction
    Chapter 2 : Background and Related Works
        2.1 Real-world network traffic botnet detection
            2.1.1 BotCluster
            2.1.2 Input Format
            2.1.3 Machine Learning Method
        2.2 Self-Training
        2.3 Related Works
    Chapter 3 : Methodology
        3.1 Overview
        3.2 Data Shift Problem
        3.3 Gradual Self-Training Approach
            3.3.1 Gradual Self-Training Workflow
            3.3.2 Self-Training Algorithm
        3.4 Improved Pseudo-Labeling Method
    Chapter 4 : Implementation
        4.1 Label Real-World Traffic Dataset
        4.2 Data Preprocess
    Chapter 5 : Experiments
        5.1 Environment
        5.2 Evaluation Criteria
        5.3 Neural Network Architecture
        5.4 Dataset
        5.5 Results
            5.5.1 Experiment 1 (Week-based)
            5.5.2 Experiment 2 (Month-based)
    Chapter 6 : Conclusion and Future Work
    Chapter 7 : References

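    Full-text availability: on campus, open from 2028-08-15; off campus, open from 2028-08-15.
    The electronic thesis is not yet authorized for public release; for the print copy, consult the library catalog.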