Graduate student: Pan, Chi-Jui (潘祈睿)
Thesis title: Simplifying Malicious Traffic Classification based on Hybrid Feature Selection (利用混合式特徵選擇方法簡化惡意流量分類複雜度)
Advisor: Li, Jung-Shian (李忠憲)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Institute of Computer & Communication Engineering
Year of publication: 2018
Academic year of graduation: 106 (ROC calendar, i.e. 2017-2018)
Language: Chinese
Number of pages: 54
Keywords: Intrusion Detection (入侵偵測), Machine Learning (機器學習), Feature Selection (特徵選擇)
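Abstract (translated from the Chinese):
    Network attacks are growing more complex and, in recent years, increasingly automated. They are characterized by numerous variants, wide impact, rapid spread, and high attack efficiency. Beyond deploying firewalls, intrusion detection systems play an important role in network defense: a network-based intrusion detection system analyzes monitored traffic and packets, extracts behavior patterns from unknown traffic, filters out abnormal behavior, and raises a warning before an attack occurs, reducing property loss. Machine learning has recently achieved breakthroughs in intrusion detection research, but it often suffers from high false alarm rates, high training time cost, and datasets that are outdated or lack attack diversity. This study uses the CICIDS2017 dataset and combines a hybrid of filter and wrapper feature selection methods with a random forest classifier to find the features with key influence, raising prediction accuracy while reducing training time. We also compare the classification performance of decision tree, naive Bayes, and multilayer perceptron classifiers, and finally apply the approach to the NSL-KDD dataset, which is widely used in intrusion detection, for comparison with related work. The experimental results show that the model achieves a prediction accuracy of 99.69% and saves 32.13% of the time consumption compared with the original data without hybrid feature selection, while preserving the original feature information so that the key features can be examined. It also achieves better performance than related work.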

Abstract (English):
    Cyberattacks have become more complicated, and the number of attack variants has grown in recent years. In addition to the firewall, the intrusion detection system (IDS) plays an important role in network security. A network-based IDS (NIDS) can monitor network traffic and packets, extract unknown traffic behavior, and filter out abnormal behavior. The NIDS can give an early warning before a cyberattack occurs, reducing property damage.
    In recent years, machine learning has achieved breakthroughs in intrusion detection research. However, it often faces problems such as a high false alarm rate, high training time cost, and datasets that are old or lack attack diversity. In this study, we use the CICIDS2017 dataset and propose a filter-wrapper hybrid feature selection method (FWHFS) with a random forest classifier to identify the important features, achieving higher prediction accuracy and lower training time. We also compare its performance with that of decision tree, Naïve Bayes, and multilayer perceptron (MLP) classifiers. Finally, we compare our model with related research on the NSL-KDD dataset, which is widely used in IDS studies.
    The experimental results show that our model achieves a prediction accuracy of 99.69% and saves 32.13% of the time consumption compared with the original data without FWHFS, while preserving the original feature information and identifying the important features. It also demonstrates improvements over related research.
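
    This record does not reproduce the FWHFS algorithm itself, so the following is only a minimal sketch of the general filter-then-wrapper idea the abstract describes, not the thesis's method. Everything in it is an assumption made for illustration: the synthetic data, the helper names (`make_data`, `filter_step`, `wrapper_step`, `centroid_accuracy`), the thresholds `min_var` and `top_k`, and the nearest-centroid classifier, which stands in for the random forest used in the thesis so that the example needs only the Python standard library.

```python
import random


def make_data(n=200, seed=0):
    """Synthetic two-class 'flows' (assumption): label 1 stands for malicious traffic."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        X.append([
            rng.gauss(3.0 * label, 1.0),  # feature 0: strongly informative
            rng.gauss(1.0 * label, 1.0),  # feature 1: weakly informative
            0.5,                          # feature 2: constant (zero variance)
            rng.gauss(0.0, 1.0),          # features 3-5: pure noise
            rng.gauss(0.0, 1.0),
            rng.gauss(0.0, 1.0),
        ])
        y.append(label)
    return X, y


def abs_correlation(col, y):
    """Absolute Pearson correlation between one feature column and the label."""
    mc, my = sum(col) / len(col), sum(y) / len(y)
    cov = sum((c - mc) * (t - my) for c, t in zip(col, y))
    vc = sum((c - mc) ** 2 for c in col)
    vy = sum((t - my) ** 2 for t in y)
    return abs(cov) / (vc * vy) ** 0.5 if vc > 0 and vy > 0 else 0.0


def filter_step(X, y, min_var=1e-3, top_k=4):
    """Filter: drop low-variance columns, keep the top_k by label correlation."""
    scores = {}
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        mu = sum(col) / len(col)
        variance = sum((c - mu) ** 2 for c in col) / len(col)
        if variance >= min_var:
            scores[j] = abs_correlation(col, y)
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


def centroid_accuracy(X_tr, y_tr, X_val, y_val, feats):
    """Validation accuracy of a nearest-centroid classifier on the given features."""
    centroids = {}
    for label in set(y_tr):
        rows = [r for r, t in zip(X_tr, y_tr) if t == label]
        centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in feats]

    def predict(row):
        return min(centroids, key=lambda c: sum(
            (row[j] - m) ** 2 for j, m in zip(feats, centroids[c])))

    hits = sum(predict(r) == t for r, t in zip(X_val, y_val))
    return hits / len(y_val)


def wrapper_step(X_tr, y_tr, X_val, y_val, candidates):
    """Wrapper: greedy forward selection, adding the feature that most
    improves validation accuracy until no candidate improves it."""
    selected, best = [], 0.0
    while True:
        pick = None
        for j in candidates:
            if j in selected:
                continue
            acc = centroid_accuracy(X_tr, y_tr, X_val, y_val, selected + [j])
            if acc > best:
                best, pick = acc, j
        if pick is None:
            return selected, best
        selected.append(pick)


X, y = make_data()
X_tr, y_tr, X_val, y_val = X[:140], y[:140], X[140:], y[140:]
kept = filter_step(X_tr, y_tr)            # cheap statistical pre-screening
selected, accuracy = wrapper_step(X_tr, y_tr, X_val, y_val, kept)
print("after filter:", kept)
print("after wrapper:", selected, "validation accuracy:", round(accuracy, 3))
```

    The filter stage is cheap because it never trains a model, while the wrapper stage retrains the classifier for each candidate subset. Running the filter first shrinks the wrapper's search space, which is one plausible reason a hybrid of the two can reduce overall training time.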

Table of contents:
    Abstract
    Acknowledgements
    Table of Contents
    List of Tables
    List of Figures
    1. Introduction
        1.1 Research Background
        1.2 Research Motivation
        1.3 Contributions
        1.4 Thesis Organization
    2. Related Work
        2.1 Overview of Network Attacks
        2.2 Intrusion Detection Systems
        2.3 Feature Selection
        2.4 The CICIDS and NSL-KDD Datasets
    3. System Architecture
        3.1 Data Preprocessing
            3.1.1 One-Hot Encoding
            3.1.2 Feature Scaling
        3.2 Feature Selection
            3.2.1 Removing Low-Variance Features
            3.2.2 Correlation-based Feature Selection (CFS)
            3.2.3 Filter-Wrapper Hybrid Feature Selection (FWHFS)
        3.3 Machine Learning Algorithms
            3.3.1 Decision Tree (DT)
            3.3.2 Random Forest (RF)
            3.3.3 Multilayer Perceptron (MLP)
            3.3.4 Naïve Bayes Classifier (NBC)
    4. Experimental Results
        4.1 System Environment and Evaluation Metrics
        4.2 Performance Comparison of Different Machine Learning Algorithms
        4.3 Classification Results and Time Cost after Feature Selection
        4.4 Performance Comparison with Related Work
    5. Conclusion and Future Work
    References

    Full text available on campus: from 2020-09-01
    Full text available off campus: from 2020-09-01