Valkyrie: A Microservice-Based Framework for Scalable Evaluation of AI Agents

Jarett Forzano (Vals AI), Omar Almatov (Vals AI), Langston Nashold (Vals AI), Nikil Ravi (Vals AI), Orestes Kassian (Vals AI)

Evaluation & Benchmarking · Engineering & Operations

A microservice-based benchmarking framework that decouples benchmark code, agent logic, and execution infrastructure for scalable, reproducible evaluation of AI agents.

Presentation

Demo session

Friday, May 29 · 1:45 PM – 3:15 PM

San Jose / Santa Clara

Description

Existing frameworks couple benchmark code, agent logic, and execution infrastructure into monolithic repositories, relying on local execution, ephemeral storage, and trust rather than verification. We present Valkyrie, a microservice-based benchmarking framework that decouples these concerns into independently deployable services, scales task execution horizontally across isolated Daytona sandboxes, and persists all results in the organization's own cloud account. We validate the system by running four agents across SWE-Bench Verified and Terminal-Bench 2.0, demonstrating end-to-end orchestration across multiple benchmarks without per-run manual configuration.
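
The decoupling described above can be illustrated with a minimal Python sketch. It is not Valkyrie's actual API: every name here (Task, Agent, SandboxProvider, ResultStore, run_benchmark, and the toy classes) is hypothetical. The in-memory sandbox stands in for an isolated Daytona sandbox and the in-memory store stands in for the organization's cloud bucket. No Daytona SDK calls are shown. The point is structural. Benchmark tasks, agent logic, and execution infrastructure meet only through narrow interfaces, and each task gets a fresh sandbox so that runs can fan out concurrently.

```python
import asyncio
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Task:  # hypothetical: one benchmark item, owned by the benchmark service
    task_id: str
    prompt: str
    expected: str


@dataclass(frozen=True)
class Result:
    task_id: str
    agent: str
    output: str
    passed: bool


class Sandbox(Protocol):  # hypothetical stand-in for an isolated remote sandbox
    async def run(self, command: str) -> str: ...
    async def close(self) -> None: ...


class SandboxProvider(Protocol):  # execution infrastructure, swappable
    async def create(self) -> Sandbox: ...


class Agent(Protocol):  # agent logic, unaware of where it runs
    name: str

    async def solve(self, task: Task, sandbox: Sandbox) -> str: ...


class ResultStore(Protocol):  # hypothetical stand-in for durable cloud storage
    async def put(self, result: Result) -> None: ...


async def run_benchmark(tasks, agent, sandboxes, store, max_parallel=4):
    """Run every task in its own fresh sandbox, at most max_parallel at once."""
    limit = asyncio.Semaphore(max_parallel)

    async def run_one(task: Task) -> Result:
        async with limit:
            sandbox = await sandboxes.create()  # isolation: one sandbox per task
            try:
                output = await agent.solve(task, sandbox)
            finally:
                await sandbox.close()  # always release the sandbox
            result = Result(task.task_id, agent.name, output, output == task.expected)
            await store.put(result)  # persist immediately, not at end of run
            return result

    # gather preserves input order, so results line up with tasks
    return await asyncio.gather(*(run_one(t) for t in tasks))


# --- Toy implementations so the sketch runs locally ---
class EchoSandbox:
    async def run(self, command: str) -> str:
        await asyncio.sleep(0)  # placeholder for remote execution latency
        return command.upper()

    async def close(self) -> None:
        pass


class EchoSandboxProvider:
    async def create(self) -> Sandbox:
        return EchoSandbox()


class UppercaseAgent:
    name = "uppercase-agent"

    async def solve(self, task: Task, sandbox: Sandbox) -> str:
        return await sandbox.run(task.prompt)


class InMemoryStore:
    def __init__(self) -> None:
        self.results: dict[tuple[str, str], Result] = {}

    async def put(self, result: Result) -> None:
        self.results[(result.agent, result.task_id)] = result


if __name__ == "__main__":
    tasks = [Task("t1", "fix bug", "FIX BUG"), Task("t2", "add test", "add test")]
    store = InMemoryStore()
    results = asyncio.run(
        run_benchmark(tasks, UppercaseAgent(), EchoSandboxProvider(), store)
    )
    for r in results:
        print(r.task_id, r.passed)
```

Because the orchestrator depends only on these interfaces, the local toy classes could be swapped for a remote sandbox provider and a cloud-backed store without touching the benchmark or agent code. Raising max_parallel is the knob that corresponds to horizontal scaling in this sketch.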
