| Field | Value |
|---|---|
| Graduate Student | 陳冠宇 Chen, Guan-Yu |
| Thesis Title | 基於代理式人工智慧的自動化漏洞管理與修補建議系統之建構與評估 (Construction and Evaluation of an Agentic AI Based System for Automated Vulnerability Management and Patch Recommendations) |
| Advisor | 蔣榮先 Chiang, Jung-Hsien |
| Degree | Master |
| Department | Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science |
| Year of Publication | 2025 |
| Academic Year | 113 |
| Language | English |
| Pages | 73 |
| Keywords | Common Vulnerabilities and Exposures (CVE), Vulnerability Management, Large Language Models (LLM), Agentic AI |
In this study, we designed and implemented an Agentic AI Based Automated Vulnerability Management System that addresses the manpower drain and processing delays enterprises face when their assets (e.g., network devices) are affected by vulnerabilities. The system applies language understanding and agentic AI techniques to automatically retrieve and organize security advisories from network equipment vendors, convert unstructured content into a unified format, generate concrete remediation suggestions, and produce clearly organized, visualized reports, significantly reducing engineers' interpretation burden and response time and improving overall operational efficiency.
The core objective of the system is to redesign the most time-consuming stages of the traditional vulnerability handling process, from advisory matching and risk assessment to writing up remediation plans. We observed that, under current practice, security engineers must manually match every newly published CVE against internal device models and software versions and look up remediation guidance from multiple vendors on their own. The process is not only tedious but also prone to misjudgment caused by inconsistent information. When a vulnerability spans multiple vendors, the cost of searching for and organizing information multiplies, severely affecting engineers' efficiency.
The system is built on a continuously updated knowledge base. It automatically retrieves the original vendor advisories and uses language models to structure them and extract key information, turning advisories with diverse formats and wording into consistent, usable remediation knowledge. Traditional web scrapers, which lack semantic understanding and must cope with each vendor's different report format, often fail to identify the correct fields or key points in such unstructured text, producing fragile, inconsistent data that cannot support subsequent analysis. In contrast, the language models introduced in this system first perform deep semantic understanding of the different vendor advisories, extract the key information, and organize it into structured report content, laying a solid foundation for subsequent visual presentation and the generation of recommended actions.
To help engineers grasp the information quickly, we also adopted a data visualization strategy. The system converts the parsed advisories into a structured format tailored to engineers' working habits and reading preferences, strengthening visual elements such as tables, bullet lists, hyperlinks, and source-code blocks to make the overall reading experience more intuitive and clear. The system also distills the lengthy descriptions in advisories, retaining only information of practical value to further reduce the comprehension burden.
To validate the practical benefit of the system, we invited several security engineers to evaluate twelve real-world cases. In eleven of the twelve cases, the engineers explicitly preferred the reports generated by our system, noting that they were well organized and clearly focused, allowing remediation methods to be identified quickly and significantly reducing reading time and decision cost. We also introduced a large language model as a neutral third-party evaluator to score the structure and presentation of the reports, finding that structural completeness and table clarity both more than doubled compared with the original advisories.
Beyond subjective preference and structural scoring, we further designed a downstream question-answering test, using the report content as the knowledge base for a language model to answer practical vulnerability handling questions. The results show that accuracy improved from 79.65% with the original advisories to 84.21% with the reports generated by our system, confirming that the system not only improves readability but also strengthens applicability and information retrievability.
Although the system has so far been tested only on advisories from a subset of vendors, we have successfully built a fully automated pipeline covering vulnerability data extraction, structuring, and knowledge refinement, enabling engineers to grasp the impact of a vulnerability immediately and formulate remediation measures quickly. By delegating repetitive and tedious information work to machines, the system frees engineers' time and attention so that they can focus on high-value security decisions.
We have designed and implemented a system called AIVMS (Agentic-AI based Vulnerability Management System), which leverages agentic artificial intelligence to address the manpower strain and delays enterprises face when dealing with the overwhelming volume of vulnerability information. AIVMS utilizes large language models and agentic AI technologies to automatically extract and organize security advisories from major vendors, convert unstructured content into a unified format, generate concrete remediation suggestions, and produce clearly structured visual reports. This significantly reduces engineers’ cognitive load and response time, thereby enhancing overall operational efficiency.
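To make this pipeline concrete, the following is a minimal sketch of how such an agentic workflow could be organized as sequential stages. The `Advisory` fields and the `fetch`/`structure`/`render` callables are hypothetical placeholders for illustration, not the thesis's actual implementation.

```python
# Hypothetical sketch of an AIVMS-style pipeline; stage names and data fields
# are illustrative placeholders, not the thesis's actual code.
from dataclasses import dataclass, field

@dataclass
class Advisory:
    vendor: str            # e.g. "Cisco", "F5"
    url: str               # original advisory page
    raw_html: str = ""     # unstructured vendor content
    structured: dict = field(default_factory=dict)  # LLM-extracted fields
    report_html: str = ""  # final visual report

def run_pipeline(advisory: Advisory, fetch, structure, render) -> Advisory:
    """Run the three illustrative stages: fetch -> structure -> render."""
    advisory.raw_html = fetch(advisory.url)              # retrieve vendor advisory
    advisory.structured = structure(advisory.raw_html)   # LLM extraction step
    advisory.report_html = render(advisory.structured)   # visual report step
    return advisory
```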
The core objective of this system is to redesign the most time-consuming components of traditional vulnerability handling processes, from advisory matching and risk assessment to remediation recommendation writing. We observed that, under current workflows, cybersecurity engineers must manually match each newly published CVE against internal device models and software versions, and independently research remediation guidance from multiple vendors. This process is not only tedious but also prone to misjudgment due to inconsistent information. When a vulnerability spans multiple vendors such as F5 and Cisco, the cost of information gathering and organization multiplies, severely impacting engineers' efficiency and focus.
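As an illustration of the matching step this paragraph describes, the sketch below checks a CVE's affected-version range against a small asset inventory. The inventory layout, product names, and the half-open version range are simplifying assumptions, not data from the thesis.

```python
# Illustrative CVE-to-inventory matching; the data model and version-range
# comparison are assumptions for the sake of the example.
from packaging.version import Version  # pip install packaging

inventory = [
    {"device": "edge-router-01", "product": "Cisco IOS XE", "version": "17.3.4"},
    {"device": "lb-02", "product": "F5 BIG-IP", "version": "15.1.8"},
]

cve_affected = {  # simplified view of one advisory's affected range: [min, fixed)
    "Cisco IOS XE": ("17.0.0", "17.6.0"),  # hypothetical range
}

def affected_assets(inventory, cve_affected):
    hits = []
    for asset in inventory:
        rng = cve_affected.get(asset["product"])
        if rng and Version(rng[0]) <= Version(asset["version"]) < Version(rng[1]):
            hits.append(asset["device"])
    return hits

print(affected_assets(inventory, cve_affected))  # ['edge-router-01']
```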
AIVMS was created to address these very pain points. It is built on a continuously updated knowledge base and automatically extracts original vendor advisories. Leveraging language models, it performs structuring and key information extraction, transforming diverse and inconsistently expressed advisories into standardized, actionable remediation knowledge. Traditional web scraping tools often struggle with such unstructured texts due to their lack of semantic understanding and the varying formats of vendor reports. This typically results in fragile and inconsistent data, which is unreliable for further analysis. In contrast, AIVMS employs language models to deeply understand the semantics of different vendor advisories, extract critical insights, and organize them into structured report content—laying a solid foundation for visual presentation and actionable recommendations.
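A minimal sketch of what this LLM-based structuring step could look like, assuming an OpenAI-style chat-completion client and a JSON extraction prompt; the prompt wording and schema fields are illustrative assumptions rather than the system's exact interface.

```python
# Sketch of LLM-based key-information extraction from an unstructured advisory.
# Prompt, schema fields, and client usage are assumptions for illustration.
import json
from openai import OpenAI

EXTRACTION_PROMPT = """Extract from the vendor security advisory below:
cve_ids, affected_products (with version ranges), severity, fixed_versions,
and workarounds. Answer as a single JSON object.

Advisory:
{advisory_text}
"""

def structure_advisory(advisory_text: str, model: str = "gpt-4o") -> dict:
    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user",
                   "content": EXTRACTION_PROMPT.format(advisory_text=advisory_text)}],
        response_format={"type": "json_object"},  # request JSON output
    )
    return json.loads(resp.choices[0].message.content)
```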
To further enable engineers to quickly grasp information, we also incorporated data visualization strategies. AIVMS converts the parsed advisories into structured HTML formats, optimized based on engineers’ reading habits and preferences. It enhances visual elements such as tables, bullet points, hyperlinks, and code blocks to deliver a more intuitive and clear reading experience. The system also refines knowledge by trimming lengthy descriptions, retaining only practically meaningful content to reduce cognitive load.
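For example, the rendering step could turn the extracted fields into an HTML fragment containing a table, a bullet list, a code block, and a link back to the source advisory. The field names below are hypothetical; this is a sketch of the idea, not the system's actual template.

```python
# Minimal sketch of rendering extracted fields into an HTML report fragment
# with the visual elements mentioned above; field names are hypothetical.
from html import escape

def render_report(data: dict) -> str:
    rows = "".join(
        f"<tr><td>{escape(p['name'])}</td><td>{escape(p['affected'])}</td>"
        f"<td>{escape(p['fixed'])}</td></tr>"
        for p in data.get("affected_products", [])
    )
    workarounds = "".join(f"<li>{escape(w)}</li>" for w in data.get("workarounds", []))
    return f"""
    <h2>{escape(', '.join(data.get('cve_ids', [])))}</h2>
    <table border="1">
      <tr><th>Product</th><th>Affected versions</th><th>Fixed in</th></tr>
      {rows}
    </table>
    <ul>{workarounds}</ul>
    <pre><code>{escape(data.get('upgrade_command', ''))}</code></pre>
    <a href="{escape(data.get('source_url', '#'))}">Original advisory</a>
    """
```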
To validate the practical effectiveness of AIVMS, we invited several cybersecurity engineers to conduct twelve real-world case tests. In eleven of these tests, engineers explicitly expressed a preference for the AIVMS-generated report versions. They noted that these reports were well-organized, clearly focused, and allowed them to quickly identify remediation methods—significantly reducing reading time and decision-making cost. Additionally, we used a large language model as an impartial third-party evaluator to score the structure and clarity of the reports, finding that AIVMS reports more than doubled the completeness of structure and clarity of tabular information compared to the original advisories.
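A sketch of how such LLM-as-judge scoring could be implemented, assuming an OpenAI-style client; the rubric wording, criteria names, and 1-5 scale are assumptions made for illustration.

```python
# Sketch of using an LLM as a third-party judge to score report structure and
# table clarity; rubric and model choice are illustrative assumptions.
import json
from openai import OpenAI

JUDGE_PROMPT = """Score the following report from 1 (poor) to 5 (excellent) on:
- structure_completeness: are sections, affected versions, and fixes all present?
- table_clarity: are tables well-formed and easy to scan?
Return JSON like {{"structure_completeness": 4, "table_clarity": 5}}.

Report:
{report}
"""

def judge_report(report: str, model: str = "gpt-4o") -> dict:
    client = OpenAI()
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": JUDGE_PROMPT.format(report=report)}],
        response_format={"type": "json_object"},
    )
    return json.loads(resp.choices[0].message.content)
```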
Beyond subjective preferences and structural evaluation, we further designed a “downstream QA test”, using the report content as a knowledge base for a language model to answer real-world vulnerability handling questions. The results showed that the accuracy improved from 79.65% when using original advisories to 84.21% when using AIVMS-generated reports—demonstrating that the system not only improves readability but also enhances applicability and information retrievability.
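The downstream QA comparison amounts to computing answer accuracy twice with different contexts, as in the sketch below; `answer_with_context` stands in for an LLM call and is a hypothetical helper, and the commented usage only mirrors the style of the comparison reported above.

```python
# Sketch of the downstream QA comparison: answer each question once with the
# raw advisory and once with the generated report as context, then compare
# accuracy. `answer_with_context(question, context)` is a hypothetical LLM call.
def accuracy(qa_pairs, context, answer_with_context) -> float:
    correct = sum(
        1 for question, expected in qa_pairs
        if answer_with_context(question, context).strip() == expected.strip()
    )
    return correct / len(qa_pairs)

# Usage (illustrative only):
# acc_raw    = accuracy(qa_pairs, raw_advisory, answer_with_context)
# acc_report = accuracy(qa_pairs, aivms_report, answer_with_context)
```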
Although our testing currently focuses on advisories from a limited set of vendors, we have successfully built a fully automated pipeline covering vulnerability data extraction, structuring, and knowledge refinement. This enables engineers to swiftly assess the impact of vulnerabilities and devise remediation strategies in real time. AIVMS automates repetitive and tedious information processing tasks, freeing up engineers’ time and mental energy so they can focus on high-value cybersecurity decision-making.