| Graduate student: | 曾鼎凱 Tseng, Ding-Kai |
|---|---|
| Thesis title: | 應用XGBoost於NetFlow之惡意流量偵測之研究 A NetFlow Based Malicious Traffic Detection Research using XGBoost |
| Advisor: | 楊竹星 Yang, Chu-Sing |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Institute of Computer & Communication Engineering |
| Year of publication: | 2018 |
| Academic year of graduation: | 106 |
| Language: | Chinese |
| Pages: | 72 |
| Keywords: | machine learning, features, NetFlow, scanning attacks, data sampling |
Network applications are now part of everyday life, and network-borne attacks have grown along with that convenience. Whether an attacker exploits a vulnerability in a specific service or tries to take over a host through a remote access service, port scanning and brute force are among the most common first steps. With networks used so heavily, detecting these behaviors quickly and accurately is increasingly important.
This study detects malicious behavior by analyzing the network traffic information provided by NetFlow, with the supervised learning algorithm XGBoost as the main model. By learning the network activity typical of the early stage of common attacks, it aims to improve the detection rate and to detect attacks earlier. Unlike studies that evaluate only on existing datasets or self-generated data, this study implements a system that uses a machine learning algorithm to periodically convert and analyze real network traffic. Besides improving the detection rate, it also detects attacks on non-well-known ports more effectively than a traditional rule-based system.
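The periodic conversion step described above can be pictured as grouping flow records into fixed time windows and summarizing each source's activity before classification. The following is only a minimal sketch under stated assumptions: the `Flow` record layout, the 300-second window, and the per-source fan-out count are illustrative choices that do not come from the thesis, and the XGBoost classifier itself is omitted.

```python
from collections import defaultdict
from typing import NamedTuple


class Flow(NamedTuple):
    """Hypothetical, simplified flow record (not the thesis's schema)."""
    start: float      # flow start time, seconds since epoch
    src_ip: str
    dst_ip: str
    dst_port: int
    packets: int


WINDOW_SECONDS = 300  # assumed window length, chosen for illustration


def window_index(flow: Flow, window: int = WINDOW_SECONDS) -> int:
    """Map a flow to the index of the fixed-length window it starts in."""
    return int(flow.start // window)


def fan_out_per_window(flows):
    """Count distinct (dst_ip, dst_port) targets per (window, source).

    A source touching many distinct targets in one window is a common
    scan indicator; a real system would pass such values to a classifier.
    """
    targets = defaultdict(set)
    for f in flows:
        targets[(window_index(f), f.src_ip)].add((f.dst_ip, f.dst_port))
    return {key: len(seen) for key, seen in targets.items()}


# Made-up flows: one source probing several targets, one ordinary client.
flows = [
    Flow(10.0, "10.0.0.5", "192.0.2.1", 22, 1),
    Flow(12.0, "10.0.0.5", "192.0.2.1", 23, 1),
    Flow(15.0, "10.0.0.5", "192.0.2.2", 22, 1),
    Flow(20.0, "10.0.0.9", "192.0.2.1", 443, 40),
    Flow(310.0, "10.0.0.5", "192.0.2.1", 22, 1),
]
summary = fan_out_per_window(flows)
```

Keying the summary by `(window, source)` keeps each period independent, so a window can be scored as soon as it closes without waiting for later traffic.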
The main contribution is four new features, derived from the original NetFlow data fields, that improve classification accuracy. In addition, sampling of the training data set is constrained with respect to feature diversity and class balance, so that the trained model better reflects the conditions encountered in live detection. A dynamic network environment offers a flexible supply of learning data, but the data are large in volume and extremely imbalanced. Selecting and processing them appropriately, so that model training completes within an acceptable time, yields a model that suits the current real-world environment and better classifies near-real-time traffic as normal or malicious.
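One way to read the sampling constraints above is as a two-step procedure: first cap how many rows share the same feature combination so that repetitive traffic does not dominate, then downsample the larger class toward the size of the smaller one. This sketch assumes rows are `(features, label)` pairs. The names `cap_per_signature` and `balanced_sample`, the cap value, and the 1:1 target ratio are hypothetical choices, not the thesis's actual settings.

```python
import random
from collections import defaultdict


def cap_per_signature(rows, cap):
    """Keep at most `cap` rows per identical (features, label) pair."""
    kept, seen = [], defaultdict(int)
    for features, label in rows:
        key = (tuple(features), label)
        if seen[key] < cap:
            seen[key] += 1
            kept.append((features, label))
    return kept


def balanced_sample(rows, cap=2, seed=0):
    """Cap duplicates, then downsample each class to the smallest class size."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for row in cap_per_signature(rows, cap):
        by_label[row[1]].append(row)
    target = min(len(group) for group in by_label.values())
    sample = []
    for group in by_label.values():
        sample.extend(rng.sample(group, target))
    return sample


# Toy data: label 0 = benign, 1 = malicious (made-up values).
rows = [([80, 10], 0)] * 50 + [([443, 12], 0)] * 30 + [([53, 1], 0)] * 5
rows += [([22, 1], 1)] * 4 + [([23, 1], 1)]
train = balanced_sample(rows)
```

Capping before balancing means the downsampled majority class is drawn from a pool in which rare feature combinations are not crowded out by the most frequent ones.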