
Author: 林政昌
Lin, Cheng-Chang
Thesis title: 基於機器學習的資安事件回應協作架構
Machine Learning based Security Orchestration of Incident Response Scheme
Advisors: 侯廷偉
Hou, Ting-Wei
許任銘
Hsu, Jen-Ming
Degree: Master
Department: College of Engineering - Department of Engineering Science
Year of publication: 2025
Academic year of graduation: 113 (ROC calendar)
Language: Chinese
Number of pages: 91
Keywords (Chinese): 資安事件回應, 自動化資安協作與應變, 特徵萃取, 主成分分析
Keywords (English): Security Incident Response, SOAR, Feature Extraction, Principal Component Analysis
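  • According to Incident Response Recommendations and Considerations for Cybersecurity Risk Management (SP 800-61r3), published by the U.S. National Institute of Standards and Technology (NIST) in April 2025, when an information security incident that violates organizational policy is discovered, the incident response team's first priority is to quickly limit the scope of damage the incident may cause while ensuring the continuity of the organization's operations. To achieve this, accurately identifying malicious behavior and responding appropriately and promptly is the key to early-stage incident response.
    To detect and contain incidents as early as possible, this study proposes an incident response framework that combines machine learning with an automated orchestration mechanism. Once attack traffic and its class are identified, the framework generates and deploys containment measures, improving the organization's security resilience and keeping core business running under attack until root cause analysis and eradication are complete. In the evaluation phase, to select a model relatively well suited to network traffic analysis, the study uses three public datasets (InSDN, CIC-IDS2017, CIC-IDS2018) and compares four supervised classification algorithms with K-fold cross-validation: Decision Tree, Extra Trees, Random Forest, and XGBoost. Based on the evaluation results, the framework adopts Extra Trees, which reached an F1-Score of 99.99%, as the model suited to network traffic analysis (the Opportune Model). In the implementation phase, the study designs two attack scenarios and uses the three datasets to simulate traffic from different network environments in order to verify the feasibility of the framework. The results show that the framework still detects attacks effectively when applied to different network environments, and automatically generates scripts that coordinate defense with security devices to contain attacks and limit the scope of damage. Finally, to keep containment measures effective, each deployment is immediately followed by another round of traffic sampling, analysis, and script deployment; feedback on deployment effectiveness is used to adjust the orchestration scripts so that the scope of damage does not grow. In addition, to identify attack classes quickly, the framework uses principal component analysis (PCA) to reduce data dimensionality while maintaining accuracy. Compared with the setting without PCA, training time is shortened by 28.6% while the F1-Score drops by only 0.08%, which better meets the response team's need for early detection and rapid response during an incident.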
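The iterative containment loop described in the abstract (sample traffic, classify it, deploy containment rules, then sample again and adjust) can be sketched roughly as follows. This is only an illustrative sketch, not the thesis's implementation: the `Flow` record, the `threshold` value, the rule text format `deny src=... reason=...`, and the `demo_sampler` function are all hypothetical stand-ins, and the classifier output is assumed to be already attached to each flow as a label.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    src_ip: str
    label: str  # class predicted by the detection model (assumed given)


def select_sources(flows, threshold=3):
    """Pick (source, attack class) pairs seen at least `threshold` times."""
    counts = Counter((f.src_ip, f.label) for f in flows if f.label != "Benign")
    return {key for key, n in counts.items() if n >= threshold}


def render_rule(src_ip, label):
    """Render a hypothetical, vendor-neutral containment rule as text."""
    return f"deny src={src_ip} reason={label}"


def contain(sample_traffic, max_rounds=5):
    """Repeat sample -> analyze -> deploy until a round adds no new rule."""
    blocked = set()  # (src_ip, label) pairs already contained
    for round_no in range(1, max_rounds + 1):
        flows = sample_traffic({ip for ip, _ in blocked})
        new = select_sources(flows) - blocked
        if not new:
            break  # feedback shows the deployed rules are still sufficient
        blocked |= new
    return sorted(render_rule(ip, label) for ip, label in blocked), round_no


# Hypothetical traffic: blocked sources no longer appear in later samples.
TRAFFIC = (
    [Flow("10.0.0.5", "DDoS")] * 4
    + [Flow("10.0.0.9", "BruteForce")] * 3
    + [Flow("10.0.0.7", "Benign")] * 5
)


def demo_sampler(blocked_ips):
    return [f for f in TRAFFIC if f.src_ip not in blocked_ips]


rules, rounds = contain(demo_sampler)
for rule in rules:
    print(rule)
print("rounds:", rounds)
```

In this toy run the first round blocks both attacking sources, and the second round finds nothing new and stops. A real deployment would translate each rule into the syntax of the specific firewall, IPS, or WAF involved.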

    The National Institute of Standards and Technology (NIST) released SP 800-61r3 in April 2025, emphasizing the rapid containment of security incidents to minimize losses and ensure business continuity.
    To address this need, this study proposes a machine learning-based automated incident response framework that integrates timely detection, classification, and containment. During the evaluation phase, three public datasets (InSDN, CIC-IDS2017, and CIC-IDS2018) were used to compare four classifiers with K-fold cross-validation: Decision Tree, Extra Trees, Random Forest, and XGBoost. The Extra Trees model achieved the best performance, with an F1 score of 99.99%, and was selected as the "Opportune Model" for network traffic analysis. Principal component analysis (PCA) was further used to extract ten key features, reducing training time by 28.6% while decreasing the F1 score by only 0.08%. During the implementation phase, the framework was validated in two simulated scenarios: a distributed denial of service (DDoS) attack and a brute force attack. The results demonstrated that the system effectively identified anomalous traffic, generated mitigation scripts for multiple security devices, and limited the spread of the attack. To ensure ongoing effectiveness, the framework iteratively reanalyzed traffic and redeployed scripts after each containment action, adapting to evolving threats. Overall, the framework achieved accurate detection, reduced latency, and automated response, meeting the containment objectives of NIST SP 800-61r3.
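The evaluation procedure summarized above (Z-score scaling, PCA down to ten components, and K-fold comparison of tree-based classifiers by F1 score) might look roughly like the following scikit-learn sketch. It is an illustration under stated assumptions rather than the thesis's code: the data are synthetic (`make_classification`), not InSDN or CIC-IDS; macro-averaged F1, five folds, and 50 trees are assumed settings; and XGBoost is left out only to keep the example free of an extra dependency.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier


def compare_models(X, y, n_components=10, n_splits=5, seed=0):
    """Return the mean K-fold macro F1 for each candidate classifier."""
    models = {
        "Decision Tree": DecisionTreeClassifier(random_state=seed),
        "Random Forest": RandomForestClassifier(n_estimators=50, random_state=seed),
        "Extra Trees": ExtraTreesClassifier(n_estimators=50, random_state=seed),
    }
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = {}
    for name, clf in models.items():
        # Scaling and PCA sit inside the pipeline so that each fold fits
        # them on its own training split only (no leakage from test data).
        pipe = make_pipeline(
            StandardScaler(),
            PCA(n_components=n_components, random_state=seed),
            clf,
        )
        fold_f1 = cross_val_score(pipe, X, y, cv=cv, scoring="f1_macro")
        scores[name] = float(fold_f1.mean())
    return scores


# Synthetic stand-in for flow features; not one of the thesis datasets.
X, y = make_classification(
    n_samples=600, n_features=30, n_informative=12, n_classes=3, random_state=0
)
scores = compare_models(X, y)
for name, f1 in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: mean macro F1 = {f1:.4f}")
```

Scores on synthetic data say nothing about the figures reported in the abstract; the sketch only shows how the comparison can be wired together.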

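    Abstract
    Extended Abstract
    Acknowledgements
    Table of Contents
    List of Tables
    List of Figures
    Chapter 1. Introduction
      1-1 Research Motivation and Background
      1-2 Research Objectives
      1-3 Research Contributions
      1-4 Thesis Organization
    Chapter 2. Related Work
      2-1 Security Incident Response
        2-1-1 Automated Security Incident Orchestration and Response
      2-2 Security Incident Response Management Procedures
        2-2-1 MITRE ATT&CK
      2-3 Intrusion Detection Mechanisms
      2-4 Datasets
        2-4-1 InSDN
        2-4-2 CIC-IDS2017
        2-4-3 CIC-IDS2018
        2-4-4 KDD99
        2-4-5 UNSW-NB15
        2-4-6 CIDDS-001
        2-4-7 Dataset Comparison
      2-5 Machine Learning Models
        2-5-1 Decision Tree
        2-5-2 Random Forest
        2-5-3 Extra Tree
        2-5-4 XGBoost
        2-5-5 Rationale for Model Selection
      2-6 Model Training and Validation
        2-6-1 Hold-out Method
        2-6-2 K-fold Cross-Validation
        2-6-3 Comparison of Training and Validation Methods
        2-6-4 Bias Between Machine Learning Training and Application
      2-7 Confusion Matrix
        2-7-1 Accuracy
        2-7-2 Precision
        2-7-3 Recall
        2-7-4 F1-Score
      2-8 Data Balancing
        2-8-1 Random Undersampling
        2-8-2 Random Oversampling
        2-8-3 Synthetic Minority Over-sampling (SMOTE)
        2-8-4 Density-Based Synthetic Minority Over-sampling (DBSMOTE)
        2-8-5 Adaptive Synthetic Sampling (ADASYN)
        2-8-6 Comparison of Data Balancing and Augmentation Methods
      2-9 Feature Stacking Techniques
        2-9-1 K-means Clustering
        2-9-2 Gaussian Mixture Model Clustering
      2-10 Feature Dimensionality Reduction
        2-10-1 Feature Selection
        2-10-2 Feature Extraction
        2-10-3 Choice of Dimensionality Reduction Method
        2-10-4 Principal Component Analysis
      2-11 Feature Scaling
      2-12 Overview of Defensive Devices
    Chapter 3. System Architecture
      3-1 Overall Architecture Overview
      3-2 Data Preprocessing
      3-3 Feature Scaling
        3-3-1 Redefining Attack Classes
        3-3-2 Z-score Standardization
      3-4 Class Rebalancing (Random Resampling)
      3-5 Stacking Feature Embedding
      3-6 Feature Extraction (Principal Component Analysis)
      3-7 Model Training and Evaluation
      3-8 Data Processing and Feature Transformation in the Implementation Phase
      3-9 Model Deployment and Attack Traffic Detection
      3-10 Defensive Measures
        3-10-1 Mitigation Script Generation
        3-10-2 Mitigation Script Deployment
    Chapter 4. Experimental Results and Analysis
      4-1 Scenarios and Assumptions
        4-1-1 Evaluation Metrics
      4-2 Evaluation Phase
        4-2-1 Experiment 1: Analysis of Principal Component Analysis Results
        4-2-2 Experiment 2: Cross-Validation Analysis
        4-2-3 Experiment 3: Effect of the Number of K-fold Folds
        4-2-4 Experiment 4: Benefit of Dimensionality Reduction
      4-3 Scenario Validation in the Implementation Phase
        4-3-1 Scenario 1: DDoS Attack on a Critical Service
        4-3-2 Scenario 2: Attacker Attempts to Brute-Force Passwords for Employee Network Resources (web, ftp)
        4-3-3 Summary
    Chapter 5. Conclusions and Future Work
      5-1 Conclusions
      5-2 Limitations and Future Research Directions
    References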


    Full-text availability: On campus: available immediately
    Off campus: available immediately