Graduate Student: 林政昌 Lin, Cheng-Chang
Thesis Title: 基於機器學習的資安事件回應協作架構 (Machine Learning based Security Orchestration of Incident Response Scheme)
Advisors: 侯廷偉 Hou, Ting-Wei; 許任銘 Hsu, Jen-Ming
Degree: 碩士 (Master)
Department: 工學院 - 工程科學系 (Department of Engineering Science, College of Engineering)
Year of Publication: 2025
Graduation Academic Year: 113
Language: Chinese
Pages: 91
Chinese Keywords: 資安事件回應、自動化資安協作與應變、特徵萃取、主成分分析
English Keywords: Security Incident Response, SOAR, Feature Extraction, Principal Component Analysis
According to Incident Response Recommendations and Considerations for Cybersecurity Risk Management (SP 800-61r3), published by the U.S. National Institute of Standards and Technology (NIST) in April 2025, when a security incident that violates organizational policy is detected, the incident response team's first priority is to quickly limit the potential scope of damage while preserving continuity of operations. Accurately identifying malicious behavior and responding appropriately in time is therefore the key to early-stage incident response.

To detect and contain incidents as early as possible, this study proposes an incident response architecture that combines machine learning with an automated orchestration mechanism: once attack traffic and its category are identified, containment measures are generated and deployed, strengthening the organization's cyber resilience so that core business operations can continue under attack until root-cause analysis and eradication are complete. In the evaluation phase, to select a model well suited to network traffic analysis, this study used three public datasets (InSDN, CIC-IDS2017, and CIC-IDS2018) and compared four supervised classifiers (Decision Tree, Extra Trees, Random Forest, and XGBoost) via K-fold cross-validation. Based on the results, the architecture adopts Extra Trees, which achieved an F1-score of 99.99%, as the model for network traffic analysis (the "Opportune Model"). In the implementation phase, two attack scenarios were designed, and the three datasets were used to simulate traffic from different network environments to validate the architecture's feasibility. The results show that the architecture effectively detects attacks across network environments and automatically generates scripts that coordinate with security devices to contain the attack and limit the scope of damage. To keep containment effective, each deployment is immediately followed by the next round of traffic sampling, analysis, and script deployment; feedback on deployment effectiveness is used to adjust the orchestration scripts and prevent the damage from spreading. In addition, to identify attack categories quickly, the architecture applies principal component analysis (PCA) to reduce data dimensionality while maintaining accuracy. Compared with the scenario without PCA, training time is reduced by 28.6% while the F1-score drops by only 0.08%, better meeting the response team's need for early detection and rapid response when an incident occurs.
The National Institute of Standards and Technology (NIST) released SP 800-61r3 in April 2025, emphasizing the rapid containment of security incidents to minimize losses and ensure business continuity. To address this need, this study proposes a machine learning-based automated incident response framework that integrates timely detection, classification, and containment. During the evaluation phase, we used three public datasets: InSDN, CIC-IDS2017, and CIC-IDS2018. We compared four classifiers (decision tree, Extra Trees, random forest, and XGBoost) using K-fold cross-validation. The Extra Trees model achieved the best performance, with an F1-score of 99.99%, and was selected as the "Opportune Model" for network traffic analysis. Principal component analysis (PCA) was further used to extract ten key features, reducing training time by 28.6% while decreasing the F1-score by only 0.08%. During the implementation phase, the framework was validated in two simulated scenarios: a distributed denial-of-service (DDoS) attack and a brute-force attack. The results demonstrated that the system effectively identified anomalous traffic, generated mitigation scripts for multiple security devices, and limited the spread of the attacks. To ensure ongoing effectiveness, the framework iteratively reanalyzed traffic and redeployed scripts after each containment action, adapting to evolving threats. Overall, the framework achieved accurate detection, reduced latency, and automated response, meeting the containment objectives of NIST SP 800-61r3.
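The evaluation pipeline summarized above (K-fold comparison of tree-based classifiers, followed by PCA reduction to ten components) can be sketched with scikit-learn. This is a minimal illustration, not the thesis's implementation: the synthetic dataset stands in for the InSDN/CIC-IDS flow features, and the fold count, class count, and hyperparameters are assumptions. XGBoost is omitted so the sketch depends on scikit-learn alone; the thesis's fourth model would be `xgboost.XGBClassifier`.

```python
# Sketch of the evaluation phase: stratified K-fold comparison of tree-based
# classifiers, then an Extra Trees pipeline with PCA reduced to 10 components.
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for labeled flow records (benign plus attack classes).
X, y = make_classification(n_samples=1500, n_features=40, n_informative=15,
                           n_classes=3, n_clusters_per_class=1, random_state=0)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
models = {
    "DecisionTree": DecisionTreeClassifier(random_state=0),
    "ExtraTrees": ExtraTreesClassifier(n_estimators=50, random_state=0),
    "RandomForest": RandomForestClassifier(n_estimators=50, random_state=0),
    # XGBoost (xgboost.XGBClassifier) would be the fourth model compared.
}

# Mean macro-F1 per model across the K folds; the best scorer is selected
# as the "Opportune Model" in the thesis's terminology.
scores = {name: cross_val_score(m, X, y, cv=cv, scoring="f1_macro").mean()
          for name, m in models.items()}

# PCA variant: standardize, keep 10 principal components, then classify,
# mirroring the thesis's trade of a small F1 drop for faster training.
pca_clf = make_pipeline(StandardScaler(), PCA(n_components=10),
                        ExtraTreesClassifier(n_estimators=50, random_state=0))
pca_f1 = cross_val_score(pca_clf, X, y, cv=cv, scoring="f1_macro").mean()
```

In practice the same comparison would be run per dataset, with the fitted winner persisted (e.g. via scikit-learn's model-persistence facilities, cited as [54]) for use in the online detection loop.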
[1] A. Nelson, S. Rekhi, M. Souppaya, and K. Scarfone, “Incident Response Recommendations and Considerations for Cybersecurity Risk Management: A CSF 2.0 Community Profile,” NIST SP 800-61 Rev. 3, Apr. 2025. doi: https://doi.org/10.6028/NIST.SP.800-61r3.
[2] C. Neiva, C. Lawson, T. Bussa, and G. Sadowski, “Innovation Insight for Security Orchestration, Automation and Response (ID: G00338719),” Gartner Database, Nov. 30, 2017. [Online]. Available: https://www.gartner.com/en/documents/3834578 (accessed Aug. 8, 2025).
[3] Md. A. Talukder, Md. M. Islam, M. A. Uddin, K. F. Hasan, S. Sharmin, S. A. Alyami, and M. A. Moni, “Machine Learning-based Network Intrusion Detection for Big and Imbalanced Data Using Oversampling, Stacking Feature Embedding and Feature Extraction,” J. Big Data, vol. 11, no. 1, p. 33, Feb. 2024. doi: https://doi.org/10.1186/s40537-024-00886-w.
[4] MITRE Corporation, “Frequently Asked Questions,” MITRE ATT&CK®, 2025. [Online]. Available: https://attack.mitre.org/resources/faq/ (accessed Aug. 10, 2025).
[5] S.-F. Lin, “Malicious Traffic Detection Based on a Two-Stage Classifier,” Master's thesis, Dept. of Information Management, National Central University, Taoyuan, Taiwan, 2023.
[6] E. Osa, E. J. Edifon, and S. Igori, “Performance Analysis of Shallow and Deep Learning Classifiers Leveraging the CICIDS 2017 Dataset,” Int. J. Intell. Syst. Appl., vol. 17, no. 2, pp. 42–55, 2025. doi: https://doi.org/10.5815/ijisa.2025.02.04.
[7] Z. P. Putra, “Evaluating the Performance of Classification Algorithms on the UNSW-NB15 Dataset for Network Intrusion Detection,” Jurnal Ilmiah FIFO, vol. 16, no. 1, p. 84, 2024. doi: https://doi.org/10.22441/fifo.2024.v16i1.009.
[8] J. C. Mondragon, P. Branco, G.-V. Jourdan, A. E. Gutierrez-Rodriguez, and R. R. Biswal, “Advanced IDS: A comparative study of datasets and machine learning algorithms for network flow-based intrusion detection systems,” Appl. Intell., vol. 55, no. 7, 2025. doi: https://doi.org/10.1007/s10489-025-06422-4.
[9] M. S. Elsayed, N.-A. Le-Khac, and A. D. Jurcut, “InSDN: A Novel SDN Intrusion Dataset,” IEEE Access, vol. 8, pp. 165263–165284, 2020. doi: https://doi.org/10.1109/access.2020.3022633.
[10] A. Maulana Ibrahimy, F. Dewanta, and M. Erza Aminanto, “Lightweight Machine Learning Prediction Algorithm for Network Attack on Software Defined Network,” in Proc. 2022 IEEE Asia Pacific Conf. Wireless and Mobile (APWiMob), pp. 1–6, 2022. doi: https://doi.org/10.1109/apwimob56856.2022.10014244.
[11] M. S. Ataa, E. E. Sanad, and R. A. El-khoribi, “Intrusion detection in software defined network using deep learning approaches,” Scientific Reports, vol. 14, no. 1, 2024. doi: https://doi.org/10.1038/s41598-024-79001-1.
[12] I. Sharafaldin, A. H. Lashkari, and A. A. Ghorbani, “Toward Generating a New Intrusion Detection Dataset and Intrusion Traffic Characterization,” in Proc. 4th International Conference on Information Systems Security and Privacy (ICISSP), Funchal, Portugal, Jan. 2018, pp. 108–116. doi: https://doi.org/10.5220/0006639801080116.
[13] K. Kurniabudi, D. Stiawan, D. Darmawijoyo, M. Y. Bin Idris, B. Kerim, and R. Budiarto, “Important Features of CICIDS-2017 Dataset For Anomaly Detection in High Dimension and Imbalanced Class Dataset,” Indonesian J. Electr. Eng. Informatics, vol. 9, no. 2, 2021. doi: https://doi.org/10.52549/ijeei.v9i2.3028.
[14] Canadian Institute for Cybersecurity, University of New Brunswick, “CSE-CIC-IDS2018 on AWS.” [Online]. Available: https://www.unb.ca/cic/datasets/ids-2018.html. (accessed Aug. 8, 2025).
[15] S. L. Lohr, Sampling: Design and Analysis, 2nd ed. Boston, MA, USA: Brooks/Cole, 2009.
[16] S. Songma, T. Sathuphan, and T. Pamutha, “Optimizing Intrusion Detection Systems in Three Phases on the CSE-CIC-IDS-2018 Dataset,” Computers, vol. 12, no. 12, p. 245, 2023. doi: https://doi.org/10.3390/computers12120245.
[17] S. Hettich and S. D. Bay, “The UCI KDD Archive,” Univ. of California, Dept. of Information and Computer Science, Irvine, CA, 1999. [Online]. Available: http://kdd.ics.uci.edu (accessed Aug. 8, 2025).
[18] S. Sapre, P. Ahmadi, and K. Islam, “A Robust Comparison of the KDDCup99 and NSL-KDD IoT Network Intrusion Detection Datasets Through Various Machine Learning Algorithms,” arXiv, 2019. doi: https://doi.org/10.48550/arXiv.1912.13204.
[19] N. Moustafa and J. Slay, “UNSW-NB15: A Comprehensive Data Set for Network Intrusion Detection Systems (UNSW-NB15 Network Data Set),” in Proc. 2015 Military Communications and Information Systems Conf. (MilCIS), pp. 1–6, 2015. doi: https://doi.org/10.1109/milcis.2015.7348942.
[20] M. Ring, S. Wunderlich, D. Scheuring, D. Landes, and A. Hotho, “A Survey of Network-Based Intrusion Detection Data Sets,” Comput. Secur., vol. 86, pp. 147–167, 2019. doi: https://doi.org/10.1016/j.cose.2019.06.005.
[21] L. Idouglid, S. Tkatek, K. Elfayq, and A. Guezzaz, “Next-gen Security In IIoT: Integrating Intrusion Detection Systems With Machine Learning For Industry 4.0 Resilience,” Int. J. Electr. Comput. Eng., vol. 14, no. 3, pp. 3512–3521, 2024. doi: https://doi.org/10.11591/ijece.v14i3.pp3512-3521.
[22] A. Verma and V. Ranga, “On evaluation of Network Intrusion Detection Systems: Statistical analysis of CIDDS-001 dataset using Machine Learning Techniques,” IEEE, 2019. doi: https://doi.org/10.36227/techrxiv.11454276.v1.
[23] A. Guezzaz, S. Benkirane, M. Azrour, and S. Khurram, “A Reliable Network Intrusion Detection Approach Using Decision Tree with Enhanced Data Quality,” Secur. Commun. Netw., vol. 2021, pp. 1–8, 2021. doi: https://doi.org/10.1155/2021/1230593.
[24] Z. Chen, L. Zhou, and W. Yu, “ADASYN−Random Forest Based Intrusion Detection Model,” in Proc. 2021 4th Int. Conf. Signal Processing and Machine Learning, pp. 152–159, 2021. doi: https://doi.org/10.1145/3483207.3483232.
[25] J. Sharma, C. Giri, O.-C. Granmo, and M. Goodwin, “Multi-layer Intrusion Detection System With Extratrees Feature Selection, Extreme Learning Machine Ensemble, and Softmax Aggregation,” EURASIP J. Inf. Secur., vol. 2019, no. 1, 2019. doi: https://doi.org/10.1186/s13635-019-0098-y.
[26] N. Sharma and N. S. Yadav, “Ensemble Learning based Classification of UNSW-NB15 dataset using Exploratory Data Analysis,” in Proc. 2021 9th Int. Conf. Reliability, Infocom Technol. Optimization (Trends Future Directions)(ICRITO), pp. 1–7, 2021. doi: https://doi.org/10.1109/icrito51393.2021.9596213.
[27] L. Grinsztajn, E. Oyallon, and G. Varoquaux, “Why Do Tree-Based Models Still Outperform Deep Learning on Tabular Data?,” arXiv preprint, arXiv:2207.08815, Jul. 18, 2022. doi: https://doi.org/10.48550/arXiv.2207.08815.
[28] G. Maillard, S. Arlot, and M. Lerasle, “Aggregated Hold-Out,” J. Mach. Learn. Res., vol. 22, pp. 1–55, 2021. [Online]. Available: https://jmlr.org/papers/volume22/19-624/19-624.pdf (accessed Aug. 8, 2025).
[29] Wikipedia contributors, “Cross-validation (statistics),” Wikipedia, The Free Encyclopedia. [Online]. Available: https://en.wikipedia.org/wiki/Cross-validation_(statistics) (accessed Jul. 17, 2025).
[30] V. Lumumba, D. Kiprotich, M. Mpaine, N. Makena, and M. Kavita, “Comparative Analysis of Cross-Validation Techniques: LOOCV, K-folds Cross-Validation, and Repeated K-folds Cross-Validation in Machine Learning Models,” Am. J. Theor. Appl. Stat., vol. 13, no. 5, pp. 127–137, 2024. doi: https://doi.org/10.11648/j.ajtas.20241305.13.
[31] K. Shivashankar and A. Martini, “Maintainability Challenges in ML: A Systematic Literature Review,” in Proc. 2022 48th Euromicro Conference on Software Engineering and Advanced Applications (SEAA), pp. 60–67, 2022. doi: https://doi.org/10.1109/SEAA56994.2022.00018.
[32] D. M. W. Powers, “Evaluation: from Precision, Recall and F-measure to ROC, Informedness, Markedness and Correlation,” arXiv preprint, arXiv:2010.16061, Oct. 11, 2020. doi: https://doi.org/10.48550/arXiv.2010.16061.
[33] A. Balla, M. H. Habaebi, E. A. A. Elsheikh, Md. R. Islam, and F. M. Suliman, “The Effect of Dataset Imbalance on the Performance of SCADA Intrusion Detection Systems,” Sensors, vol. 23, no. 2, p. 758, 2023. doi: https://doi.org/10.3390/s23020758.
[34] G. E. A. P. A. Batista, R. C. Prati, and M. C. Monard, “A Study of The Behavior of Several Methods For Balancing Machine Learning Training Data,” ACM SIGKDD Explorations Newsletter, vol. 6, no. 1, pp. 20–29, Jun. 2004. doi: https://doi.org/10.1145/1007730.1007735
[35] C. Bunkhumpornpat, K. Sinapiromsaran, and C. Lursinsap, “DBSMOTE: Density-Based Synthetic Minority Over-sampling TEchnique,” Applied Intelligence, vol. 36, pp. 664–684, Apr. 2012. doi: https://doi.org/10.1007/s10489-011-0287-y
[36] H. He, Y. Bai, E. A. Garcia, and S. Li, “ADASYN: Adaptive Synthetic Sampling Approach for Imbalanced Learning,” in Proc. 2008 IEEE International Joint Conference on Neural Networks (IJCNN 2008), pp. 1322–1328, 2008. doi: https://doi.org/10.1109/IJCNN.2008.4633969.
[37] J. MacQueen, “Some Methods for Classification and Analysis of Multivariate Observations,” in Proc. 5th Berkeley Symp. Mathematical Statistics and Probability, vol. 1, pp. 281–297, 1967. Berkeley, CA, USA: Univ. of California Press.
[38] A. P. Dempster, N. M. Laird, and D. B. Rubin, “Maximum Likelihood from Incomplete Data Via the EM Algorithm,” J. R. Stat. Soc. Ser. B (Methodological), vol. 39, no. 1, pp. 1–22, 1977.
[39] P. Domingos, “A Few Useful Things to Know About Machine Learning,” Communications of the ACM, vol. 55, no. 10, pp. 78–87, Oct. 2012. doi: https://doi.org/10.1145/2347736.2347755.
[40] S. Khalid, T. Khalil, and S. Nasreen, “A Survey of Feature Selection and Feature Extraction Techniques in Machine Learning,” in Proc. 2014 Science and Information Conference (SAI), pp. 372–378, 2014. doi: https://doi.org/10.1109/SAI.2014.6918213.
[41] I. Guyon and A. Elisseeff, “An Introduction to Variable and Feature Selection,” J. Mach. Learn. Res., vol. 3, pp. 1157–1182, 2003. [Online]. Available: https://www.jmlr.org/papers/volume3/guyon03a/guyon03a.pdf (accessed Aug. 8, 2025).
[42] M. Espadoto, R. M. Martins, A. Kerren, N. S. T. Hirata, and A. C. Telea, “Toward a Quantitative Survey of Dimension Reduction Techniques,” IEEE Trans. Vis. Comput. Graph., vol. 27, no. 3, pp. 2153–2173, 2021. doi: https://doi.org/10.1109/tvcg.2019.2944182.
[43] J. Liddle, W. Jiang, and N. Malleson, “Leveraging Principal Component Analysis To Uncover Urban Pedestrian Dynamics,” J. Geogr. Syst., vol. 27, no. 3, pp. 489–513, Jun. 2025. doi: https://doi.org/10.1007/s10109-025-00469-0.
[44] I. T. Jolliffe and J. Cadima, “Principal Component Analysis: A Review and Recent Developments,” Philos. Trans. R. Soc. A Math. Phys. Eng. Sci., vol. 374, no. 2065, p. 20150202, 2016. doi: https://doi.org/10.1098/rsta.2015.0202.
[45] B. Everitt and T. Hothorn, An Introduction to Applied Multivariate Analysis with R, Springer New York, 2011. doi: https://doi.org/10.1007/978-1-4419-9650-3.
[46] J. Han, M. Kamber, and J. Pei, Data Mining: Concepts and Techniques, 3rd ed. Waltham, MA, USA: Elsevier, 2012.
[47] S. Songma, T. Sathuphan, and T. Pamutha, “Optimizing Intrusion Detection Systems in Three Phases on the CSE-CIC-IDS-2018 Dataset,” Computers, vol. 12, no. 12, p. 245, Nov. 2023. doi: https://doi.org/10.3390/computers12120245.
[48] A. Kamruzzaman, S. Ismat, J. C. Brickley, A. Liu, K. Thakur, et al., “A Comprehensive Review of Endpoint Security: Threats and Defenses,” in Proc. 2022 International Conference on Cyber Warfare and Security (ICCWS), Albany, NY, USA, Mar. 2022, pp. 1–7. doi: https://doi.org/10.1109/ICCWS56285.2022.9998470.
[49] NIST, “Guidelines on Firewalls and Firewall Policy,” NIST Special Publication 800-41 Rev. 1, Sep. 28, 2009. doi: https://doi.org/10.6028/NIST.SP.800-41r1.
[50] K. A. Scarfone and P. Mell, “Intrusion Detection and Prevention Systems,” in Handbook of Information and Communication Security, P. Stavroulakis and M. Stamp, Eds. New York, NY, USA: Springer, 2010, pp. 177–192. doi: https://doi.org/10.1007/978-3-642-04117-4_9.
[51] OWASP Foundation, “Web Application Firewall,” OWASP Community Pages. [Online]. Available: https://owasp.org/www-community/Web_Application_Firewall. (accessed Aug. 23, 2025)
[52] G. James, D. Witten, T. Hastie, and R. Tibshirani, An Introduction to Statistical Learning: with Applications in R, New York, NY, USA: Springer, 2013. [Online]. Available: https://www.stat.berkeley.edu/~rabbee/s154/ISLR_First_Printing.pdf (accessed Aug. 8, 2025).
[53] M. Ring, S. Wunderlich, D. Scheuring, D. Landes, and A. Hotho, “A Survey of Network-Based Intrusion Detection Data Sets,” Comput. Secur., vol. 86, pp. 147–167, 2019. doi: https://doi.org/10.1016/j.cose.2019.06.005.
[54] Scikit-learn developers, “Model persistence [User guide],” Scikit-learn: Machine Learning in Python. [Online]. Available: https://scikit-learn.org/stable/model_persistence.html (accessed Aug. 8, 2025).
[55] B. G. Marcot and A. M. Hanea, “What Is an Optimal Value of K in K-Fold Cross-Validation in Discrete Bayesian Network Analysis?,” Comput. Stat., vol. 36, no. 3, pp. 2009–2031, 2021. doi: https://doi.org/10.1007/s00180-020-00999-9.