Retrieval-Augmented LLMs for Security Incident Analysis
Xavier Cadet (Dartmouth College), Aditya Vikram Singh (Northeastern University), Harsh Mamania (Northeastern University), Edward Koh (Dartmouth College), Alex Fitts (PUNCH Cyber Analytics), Dirk Van Bruggen (PUNCH Cyber Analytics), Simona Boboila (Northeastern University), Peter Chin (Dartmouth College), Alina Oprea (Northeastern University)
Architectural Patterns & Composition · Evaluation & Benchmarking
A RAG-based system that automates cybersecurity incident analysis by mapping evidence from heterogeneous logs to MITRE ATT&CK techniques and generating structured incident reports. It substantially reduces the manual effort of correlating intrusion alerts, network records, and authentication events into a coherent attack narrative.
Presentation
Talk
Paper Session 5: Security & Governance
Thursday, May 28 · 11:00 AM – 11:10 AM
Bayshore Ballroom
Poster
Thursday, May 28 · 4:30 PM – 6:00 PM
Carmel
Abstract
Investigating cybersecurity incidents requires collecting and analyzing evidence from multiple log sources, including intrusion detection alerts, network traffic records, and authentication events. This process is labor-intensive: analysts must sift through large volumes of data to identify relevant indicators and piece together what happened. We present a RAG-based system that performs security incident analysis through targeted query-based filtering and LLM semantic reasoning. The system uses a query library with associated MITRE ATT&CK techniques to extract indicators from raw logs, then retrieves relevant context to answer forensic questions and reconstruct attack sequences. We evaluate the system with eight LLM configurations on malware traffic incidents and a multi-stage Active Directory attack. The models differ in accuracy and cost: Claude Sonnet 4 achieves 94% and DeepSeek V3 achieves 89% average recall across 17 malware scenarios, while DeepSeek V3 costs 15× less than Claude Sonnet 4 per analysis, and a locally deployed Llama 3.1:70b achieves 81% recall at zero per-query cost. Attack step detection on the Active Directory scenario reaches 100% precision and up to 96% recall with an enumeration prompt. These results demonstrate that combining targeted query-based filtering with RAG-based retrieval (confirmed essential by ablation studies) enables accurate, cost-effective security analysis within LLM context limits.
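The filtering step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `LogQuery` structure, the two example queries, the regular expressions, and the character budget are all hypothetical, chosen only to show how a query library tagged with MITRE ATT&CK technique IDs might reduce raw logs to a compact set of indicators before retrieval and LLM reasoning. The recall helper uses the standard definition (detected expected items divided by all expected items); the paper's exact scoring procedure may differ.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class LogQuery:
    """One entry of a hypothetical query library."""
    name: str
    technique_id: str  # MITRE ATT&CK technique ID associated with the query
    source: str        # which log source the query applies to
    pattern: re.Pattern


# Illustrative entries only; a real library would be far larger.
QUERY_LIBRARY = [
    LogQuery("failed_logon", "T1110", "auth",
             re.compile(r"failed (?:login|logon)", re.IGNORECASE)),
    LogQuery("powershell_encoded", "T1059.001", "ids",
             re.compile(r"powershell.*-enc", re.IGNORECASE)),
]


def extract_indicators(logs, library=QUERY_LIBRARY):
    """Run every query against its log source and collect matching lines.

    `logs` maps a source name (e.g. "auth") to a list of raw log lines.
    """
    hits = []
    for query in library:
        for line in logs.get(query.source, []):
            if query.pattern.search(line):
                hits.append({
                    "technique": query.technique_id,
                    "query": query.name,
                    "source": query.source,
                    "evidence": line,
                })
    return hits


def build_context(hits, max_chars=2000):
    """Concatenate indicators until an assumed character budget is reached."""
    lines, used = [], 0
    for hit in hits:
        entry = f"[{hit['technique']}] {hit['source']}: {hit['evidence']}"
        if used + len(entry) + 1 > max_chars:
            break
        lines.append(entry)
        used += len(entry) + 1
    return "\n".join(lines)


def recall(detected, expected):
    """Fraction of expected items that were detected."""
    expected = set(expected)
    if not expected:
        return 1.0
    return len(expected & set(detected)) / len(expected)


if __name__ == "__main__":
    sample_logs = {
        "auth": ["sshd: Failed login for admin from 10.0.0.5",
                 "sshd: Accepted login for bob"],
        "ids": ["ALERT powershell.exe -EncodedCommand AAAA"],
    }
    indicators = extract_indicators(sample_logs)
    print(build_context(indicators))
    print(recall({h["technique"] for h in indicators},
                 {"T1110", "T1059.001", "T1021"}))
```

In this sketch only the lines matched by a query reach the context string, which is the property that keeps the prompt within a model's context limit; in a full system the filtered indicators would presumably be indexed for retrieval and passed to the LLM rather than concatenated directly.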