
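Graduate student: 葉家宏
Yap, Jia-Hong
Thesis title: 透過資料清理提升 P2P 殭屍網路偵測之深度學習效能
Using Data Cleansing to Promote Performance of Deep Learning for P2P Botnet Detection
Advisor: 謝錫堃
Shieh, Ce-Kuen
Co-advisor: 張志標
Chang, Jyh-Biau
Degree: 碩士
Master
Department: College of Electrical Engineering and Computer Science - Institute of Computer & Communication Engineering
Institute of Computer & Communication Engineering
Year of publication: 2019
Academic year of graduation: 107
Language: English
Number of pages: 41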
Keywords (Chinese, translated): P2P botnet, neural network, network flow, big data, deep learning, data cleansing
Keywords (English): P2P botnet, Deep neural network, NetFlow, data cleansing, TensorFlow, MapReduce
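  • In our previous research, we developed BotCluster, a session-based P2P botnet detection system that uses unsupervised machine learning to detect P2P botnets on NetFlow. Because BotCluster works in batch mode, it needs more than a day to accumulate enough data before a thorough clustering analysis can begin. To detect P2P botnets in real time, we are trying to develop a deep-learning detector whose model is trained on a session-based training set generated from real traffic logs. Because the quality of that dataset is poor, the deep learning model cannot learn effectively and its performance suffers.
    In this thesis, we propose a data cleansing method that raises the quality of the dataset and thereby the performance of the deep learning model. On the National Cheng Kung University (NCKU) NetFlow dataset from March, April, and May 2017, the model improves as follows: the false positive rate (FPR) drops from 30% to 1.39%, the F1 score rises from 88% to 94%, and precision rises from 85% to 95%.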

    In our previous research, we developed BotCluster, a session-based P2P botnet detection system that discovers P2P botnets in NetFlow data using unsupervised machine learning. However, BotCluster is a batch-processing system: it takes one or several days to accumulate enough data to reveal stealthy malicious communication. To detect P2P botnets instantly, our ongoing research aims to develop a deep learning model trained on a session-based dataset built from real traffic logs. We first tried using the output of BotCluster directly as the training dataset for the deep learning model, but its quality was poor and the model performed poorly as a result. In this research, we propose a data cleansing approach that improves the quality of the dataset and thereby the performance of the model. On the 2017 NCKU NetFlow dataset (March, April, and May), data cleansing improves the deep learning model as follows: the false positive rate (FPR) falls from 30% to 1.39%, the F1 score rises from 88% to 94%, and precision rises from 85% to 95%.
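The three metrics reported in the abstract (FPR, precision, and F1 score) follow from the standard confusion-matrix definitions. The sketch below shows those definitions only; the function name `classification_metrics` and the counts passed to it are hypothetical and are not taken from the thesis or its dataset.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute FPR, precision, recall and F1 from confusion-matrix counts.

    tp/fp/tn/fn are true positives, false positives, true negatives and
    false negatives. Each ratio falls back to 0.0 when its denominator is 0.
    """
    fpr = fp / (fp + tn) if (fp + tn) else 0.0          # benign flagged as botnet
    precision = tp / (tp + fp) if (tp + fp) else 0.0    # flagged sessions that are botnet
    recall = tp / (tp + fn) if (tp + fn) else 0.0       # botnet sessions that are flagged
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)             # harmonic mean of the two
    return {"fpr": fpr, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts, chosen only to exercise the formulas.
example = classification_metrics(tp=90, fp=10, tn=190, fn=10)
```

Because FPR is normalized by the number of benign sessions while precision is normalized by the number of flagged sessions, the two can move by very different amounts when mislabeled training sessions are removed.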

    Chapter 1: Introduction
    Chapter 2: Background & Related Works
        2.1 BotCluster
            2.1.1 Session Extraction & Filtering
            2.1.2 Grouping
        2.2 Related Works
    Chapter 3: Methodology
        3.1 Overview
        3.2 Data Preprocessing
            3.2.1 Session Restoration
        3.3 Data cleansing
            3.3.1 Real Traffic Problem
            3.3.2 Problem Description
            3.3.3 The method of Data Cleansing
            3.3.4 Goodness Ratio Measurement
            3.3.5 Ambiguous session removal
            3.3.6 Session's class Rectification
        3.4 Balancing and Mixing
        3.5 Model Training
    Chapter 4: Implementation
        4.1 Overview
        4.2 Session Restoration
        4.3 Goodness Ratio Measurement
        4.4 Ambiguity Elimination and Label Correction
            4.4.1 Ambiguity Elimination
            4.4.2 Label Correction
        4.5 Deep learning
    Chapter 5: Experiments
        5.1 Experimental Environment
        5.2 Metrics
        5.3 Dataset Summary
        5.4 Results
            5.4.1 A dataset without data cleansing vs within data cleansing
        5.5 A simple hyper-parameter tuning
            5.5.1 The experiments of epoch
            5.5.2 The hidden layers of experiments
            5.5.3 The neurons of experiments
        5.6 Runtime
    Chapter 6: Conclusion
    References


    Off-campus access: not public
    Neither the electronic nor the printed thesis has been authorized for public release yet.