SREGym: A Live Training Ground for AI SRE Agents with High-Fidelity Failure Drills
Jackson Clark (University of Illinois Urbana-Champaign), Yiming Su (University of Illinois Urbana-Champaign), Saad Mohammad Rafid Pial (Bangladesh University of Engineering and Technology), Lily Gniedziejko (University of Illinois Urbana-Champaign), Tianyin Xu (University of Illinois Urbana-Champaign)
Evaluation & Benchmarking
Summary
A live benchmark for AI SRE agents featuring high-fidelity failure drills, with fault injection spanning OS kernels and hardware and support for compound multi-event scenarios.
Description
SREGym is a new benchmark for evaluating AI-driven SRE (Site Reliability Engineering) techniques that diagnose and mitigate production failures. It provides a live training ground in which fault injectors emulate high-fidelity failure drills. SREGym differs from existing SRE benchmarks such as AIOpsLab and ITBench in how comprehensive and high-fidelity its failure drills are. Its extensible software architecture orchestrates fault injectors and simulators across the system stack and adds three new capabilities: (1) simulating low-level faults in OS kernels and hardware, (2) coordinating multiple concurrent events into compound drills, and (3) composing noise to model production environments. We demonstrate how to use and extend SREGym and present three representative cases of how AI agents tackle SREGym problems.
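To make the idea of a compound drill concrete, the following is a minimal sketch of how faults at different layers and seeded background noise might be merged into one timeline. It is not SREGym's API: the `Event` and `CompoundDrill` types, the method names, and the fault and noise names are all hypothetical and chosen only for illustration.

```python
import random
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Event:
    """One scheduled event in a drill (hypothetical type, not SREGym's)."""
    name: str
    layer: str        # e.g. "kernel", "hardware", "application"
    start_s: float    # offset from drill start, in seconds
    is_noise: bool = False


@dataclass
class CompoundDrill:
    """Collects faults and noise events, then orders them on one timeline."""
    name: str
    events: List[Event] = field(default_factory=list)

    def add_fault(self, name: str, layer: str, start_s: float) -> None:
        self.events.append(Event(name, layer, start_s))

    def add_noise(self, names: List[str], window_s: float, seed: int) -> None:
        # A fixed seed keeps the noise reproducible from run to run.
        rng = random.Random(seed)
        for name in names:
            start = rng.uniform(0.0, window_s)
            self.events.append(Event(name, "application", start, is_noise=True))

    def schedule(self) -> List[Event]:
        # Faults and noise share a single timeline, ordered by start offset.
        return sorted(self.events, key=lambda e: e.start_s)


def build_example_drill() -> CompoundDrill:
    # All fault and noise names below are made up for illustration.
    drill = CompoundDrill("kernel-and-hardware-drill")
    drill.add_fault("io_error_on_block_device", layer="kernel", start_s=30.0)
    drill.add_fault("memory_bit_flip", layer="hardware", start_s=10.0)
    drill.add_noise(["benign_pod_restart", "log_burst"], window_s=60.0, seed=7)
    return drill


if __name__ == "__main__":
    for event in build_example_drill().schedule():
        kind = "noise" if event.is_noise else "fault"
        print(f"t+{event.start_s:6.1f}s  [{event.layer:<11}] {kind}: {event.name}")
```

Seeding the noise generator is one possible design choice here: it would let two agents be evaluated against the same drill, with distracting events arriving at identical times.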