
Graduate student: Chang, Yu-En (張育恩)
Thesis title: A Clustering Algorithm with Fluctuant-Centroid Adjustment for P2P Botnet Detection
Thesis title (Chinese): 變動中心點數量之群聚演算法於P2P殭屍網路偵測之應用
Advisor: Shieh, Ce-Kuen (謝錫堃)
Co-advisor: Chang, Jyh-Biau (張志標)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Institute of Computer & Communication Engineering
Year of publication: 2018
Academic year of graduation: 106 (ROC calendar; 2017-2018)
Language: English
Number of pages: 35
Keywords (Chinese): 點對點殭屍網路, 非監督式機器學習, MapReduce架構, K-Means群聚演算法, 網路流
Keywords (English): P2P Botnet, Unsupervised learning, MapReduce Framework, K-Means, NetFlow
    In recent years, data clustering has been a popular topic in data mining and big data, and numerous clustering algorithms have been proposed to solve problems in different fields. Among them, K-Means is the most widely used. However, K-Means has two issues that strongly affect its clustering results: the number of centroids is hard to determine, and outliers distort the clusters. Many studies have tried to improve K-Means, but most of them still require a range for the number of centroids to be chosen in advance, and they perform poorly when processing big data.
    In this study, we propose a clustering algorithm with fluctuant-centroid adjustment that dynamically changes the number of centroids. The algorithm modifies the traditional K-Means procedure so that, at each iteration, it selects new center points for clustering and merges clusters according to the distance between them. In addition, it uses a distance threshold to mitigate the impact of outliers on the clustering results. We integrate the algorithm with our previous research and use real-world network traffic for P2P botnet detection to verify its performance. The experimental results show that the average precision of our clustering algorithm is over 95%.
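
    The abstract gives only the outline of the algorithm, so the following is a minimal sketch of the general idea rather than the thesis's actual method. It assumes Euclidean distance, seeds one new centroid per iteration from a point that lies farther than `assign_threshold` from every centroid, and merges centroids closer than `merge_threshold` by taking their midpoint. The function names, both parameters, and these specific rules are hypothetical choices made for illustration; the thesis itself describes a two-stage MapReduce implementation that is not reproduced here.

```python
import math


def mean(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def assign(points, centroids, assign_threshold):
    """Assign each point to its nearest centroid if it is close enough.

    Points farther than assign_threshold from every centroid are left
    unassigned, so they cannot pull any centroid toward them.
    """
    clusters = [[] for _ in centroids]
    unassigned = []
    for p in points:
        d, i = min((math.dist(p, c), i) for i, c in enumerate(centroids))
        if d <= assign_threshold:
            clusters[i].append(p)
        else:
            unassigned.append(p)
    return clusters, unassigned


def merge_close(centroids, merge_threshold):
    """Repeatedly replace any two centroids closer than merge_threshold
    with their midpoint (an assumed, unweighted merge rule)."""
    centroids = list(centroids)
    merged = True
    while merged:
        merged = False
        for i in range(len(centroids)):
            for j in range(i + 1, len(centroids)):
                if math.dist(centroids[i], centroids[j]) < merge_threshold:
                    a, b = centroids[i], centroids[j]
                    centroids[i] = tuple((x + y) / 2 for x, y in zip(a, b))
                    del centroids[j]
                    merged = True
                    break
            if merged:
                break
    return centroids


def fluctuant_kmeans(points, assign_threshold, merge_threshold, max_iter=50):
    """K-Means-style loop in which the number of centroids can grow or shrink."""
    centroids = [tuple(points[0])]  # start from a single centroid
    for _ in range(max_iter):
        clusters, unassigned = assign(points, centroids, assign_threshold)
        # Update step: recompute centroids, dropping empty clusters.
        new_centroids = [mean(c) for c in clusters if c]
        # Grow step: seed one new centroid from a point nobody claimed.
        if unassigned:
            new_centroids.append(tuple(unassigned[0]))
        # Shrink step: merge centroids that ended up too close together.
        new_centroids = merge_close(new_centroids, merge_threshold)
        if new_centroids == centroids:
            break
        centroids = new_centroids
    clusters, unassigned = assign(points, centroids, assign_threshold)
    return centroids, clusters, unassigned


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (1, 1),
            (10, 10), (10, 11), (11, 10), (11, 11),
            (50, 50)]  # the last point is an isolated outlier
    cents, groups, rest = fluctuant_kmeans(data, 3.0, 2.0)
    for c, g in zip(cents, groups):
        print(c, len(g))
```

    In this sketch an isolated point never shifts the centroid of a dense cluster; it ends up in a singleton cluster of its own, which a caller could discard by cluster size. Whether the thesis handles outliers this way is not stated in the abstract.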

    Chapter 1: Introduction
    Chapter 2: Background & Related Work
      2.1 Background
        2.1.1 BotCluster
      2.2 Related Works
    Chapter 3: Methodology
      3.1 Overview
      3.2 First stage
        3.2.1 Centroids Selection
        3.2.2 Similarity Measurement & Update Centroids
      3.3 Second stage
        3.3.1 Centroids Selection
        3.3.2 Similarity Measurement & Update Centroids
      3.4 Centroids Combination
      3.5 Comparison
    Chapter 4: Implementation
      4.1 Centroids Selection in first stage
      4.2 Similarity Measurement & Update Centroids in first stage
      4.3 Centroids Selection in second stage
      4.4 Similarity Measurement & Update Centroids in second stage
      4.5 Combine Centroids
    Chapter 5: Experiments
      5.1 P2P Botnet Detection Methodology
      5.2 Integration
        5.2.1 Grouping
        5.2.2 Refinement
      5.3 Environments
      5.4 Dataset
      5.5 Verification Method
      5.6 Experiment Results
        5.6.1 Experiment 1 – P2P Botnet Detection on Synthetic Dataset
        5.6.2 Experiment 2 – Verify Distance Threshold of P2P Botnet Detection
        5.6.3 Experiment 3 – Verify Session-IP Ratio of P2P Botnet Detection
        5.6.4 Experiment 4 – Comparison of Clustering Approaches
        5.6.5 Experiment 5 – P2P Botnet Detection on Real Traffic Dataset
    Chapter 6: Conclusion
    Chapter 7: References


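    Full-text availability: on campus, open from 2023-08-01; off campus, not open.
    The electronic thesis has not yet been authorized for public release; for the print copy, please check the library catalog.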