| Author: | 王俊又 Wang, Chun-Yu |
|---|---|
| Dissertation title: | BotCluster: A P2P Botnet Clustering System on Netflow |
| Advisor: | 謝錫堃 Shieh, Ce-Kuen |
| Co-advisor: | 張志標 Chang, Jyh-Biau |
| Degree: | Doctoral (Ph.D.) |
| Department: | College of Electrical Engineering and Computer Science, Institute of Computer & Communication Engineering |
| Year of publication: | 2020 |
| Academic year of graduation: | 108 (ROC calendar, i.e., 2019-2020) |
| Language: | English |
| Pages: | 67 |
| Keywords: | P2P Botnet Detection, Netflow, MapReduce, Network Security |
This dissertation aims to detect P2P botnet activity in real-world Netflow traffic logs. It presents BotCluster, a session-based P2P botnet behavior clustering system implemented on MapReduce that clusters malicious hosts within Netflow traffic logs. BotCluster merges the unidirectional records of Netflow into bidirectional sessions and then applies a 3-level grouping that clusters similar sessions into groups with similar behavior. In addition, by exploiting the similarity and regularity inherent in botnet communication, BotCluster eliminates unrelated sessions and retains the large number of anomalous sessions. BotCluster uses the unsupervised clustering algorithm DBSCAN (Density-Based Spatial Clustering of Applications with Noise) as the core algorithm of its grouping stage. The resulting clusters can be regarded as collections of malicious behavior, because only man-made malware generates large numbers of similar patterns in network traces. Moreover, many sessions are duplicates in which the same feature vectors appear repeatedly; a data compacting approach is therefore proposed to reduce the input volume while keeping enough representative sessions to satisfy DBSCAN's clustering criteria. BotCluster is evaluated on real-world Netflow traffic logs collected from two university campuses in Taiwan, National Cheng Kung University (NCKU) and National Chung Cheng University (CCU). The two datasets are 694.6 GB and 137 GB in size, respectively, and together contain approximately 4.62 billion flows and 44 million IP addresses. To ensure the reliability of the evaluation, the precision of BotCluster's detection results is measured using the VirusTotal blacklist service. The results show that BotCluster achieves a detection precision of 96.23% on the NCKU dataset and 86.62% on the CCU dataset. When applied to a combined dataset containing the Netflow logs of both campuses, BotCluster achieves an average precision of 97.58%.
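The session merging and density-based grouping described above can be illustrated with a small Python sketch. It is a minimal illustration under stated assumptions, not BotCluster's implementation: `FlowRecord` and its fields, `merge_sessions`, `dbscan`, the four-value feature vector, and the `eps`/`min_pts` values are all hypothetical, and the sketch runs a single in-memory DBSCAN pass instead of the 3-level grouping on MapReduce. Features are left unscaled for brevity; real features would normally be normalized before computing distances.

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowRecord:
    # Hypothetical minimal subset of Netflow fields (not the thesis's schema).
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    proto: str
    packets: int
    octets: int


def session_key(r):
    """Direction-independent key, so A->B and B->A map to the same session."""
    a, b = (r.src_ip, r.src_port), (r.dst_ip, r.dst_port)
    lo, hi = (a, b) if a <= b else (b, a)
    return (r.proto, *lo, *hi)


def merge_sessions(records):
    """Merge unidirectional records into bidirectional sessions.

    Each session maps to [pkts_fwd, bytes_fwd, pkts_rev, bytes_rev], where
    "fwd" is traffic leaving the lexicographically smaller endpoint.
    """
    sessions = defaultdict(lambda: [0, 0, 0, 0])
    for r in records:
        key = session_key(r)
        offset = 0 if (r.src_ip, r.src_port) == key[1:3] else 2
        sessions[key][offset] += r.packets
        sessions[key][offset + 1] += r.octets
    return dict(sessions)


def dbscan(points, eps, min_pts):
    """Minimal O(n^2) DBSCAN; returns one label per point, -1 for noise."""
    n = len(points)
    labels = [None] * n

    def region(i):
        # The eps-neighbourhood includes the point itself, as in standard DBSCAN.
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = region(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # noise for now; may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_seeds = region(j)
            if len(j_seeds) >= min_pts:  # j is a core point: keep expanding
                queue.extend(j_seeds)
    return labels


# Hypothetical traffic: four hosts with identical peer exchanges plus two one-off sessions.
records = []
for i in range(1, 5):
    records.append(FlowRecord(f"10.0.0.{i}", 5000, f"203.0.113.{i}", 6881, "UDP", 3, 180))
    records.append(FlowRecord(f"203.0.113.{i}", 6881, f"10.0.0.{i}", 5000, "UDP", 2, 120))
records.append(FlowRecord("10.0.0.9", 40000, "93.184.216.34", 443, "TCP", 50, 60000))
records.append(FlowRecord("93.184.216.34", 443, "10.0.0.9", 40000, "TCP", 80, 100000))
records.append(FlowRecord("10.0.0.8", 40001, "198.51.100.7", 53, "UDP", 1, 70))

sessions = merge_sessions(records)
features = list(sessions.values())
labels = dbscan(features, eps=10.0, min_pts=3)
print(labels)  # [0, 0, 0, 0, -1, -1]
```

In this toy input the four identical exchanges form one dense cluster, while the two one-off sessions are labeled noise (`-1`), mirroring the premise that repeated, machine-generated patterns surface as clusters.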
Finally, when data compacting is applied to the input sessions, the average data reduction ratio reaches about 81.34%, while precision decreases by only 1.6% on average. In other words, given a sufficiently long observation period and enough data, BotCluster can detect unknown botnets in real traffic without any prior training or labeling.
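The abstract does not spell out the compaction rule or how the reduction ratio is computed. The sketch below assumes one plausible rule, capping each distinct feature vector at `min_pts` copies, and assumes the reduction ratio is 1 - (compacted size / original size); `compact`, `reduction_ratio`, and the sample vectors are hypothetical. The cap is chosen because a vector seen at least `min_pts` times still contributes `min_pts` points to every eps-neighborhood it falls in, so no point changes between core and non-core status under DBSCAN.

```python
from collections import Counter


def compact(vectors, min_pts):
    """Keep at most `min_pts` copies of each distinct feature vector.

    Hypothetical rule: a vector seen >= min_pts times still contributes
    min_pts points to any eps-neighbourhood, so every core point stays core.
    """
    counts = Counter(tuple(v) for v in vectors)
    compacted = []
    for vec, c in counts.items():
        compacted.extend([vec] * min(c, min_pts))
    return compacted


def reduction_ratio(original_size, compacted_size):
    """Fraction of the input removed by compaction (assumed definition)."""
    return 1 - compacted_size / original_size


# Hypothetical session feature vectors with heavy duplication.
sessions = [(3, 180, 2, 120)] * 40 + [(1, 70, 1, 200)] * 5 + [(50, 60000, 80, 100000)]
small = compact(sessions, min_pts=3)
print(len(sessions), len(small))                              # 46 7
print(round(reduction_ratio(len(sessions), len(small)), 4))   # 0.8478
```

With 46 sessions, 40 of them identical, the cap keeps 7 vectors, a reduction ratio of about 84.8% in this toy example.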