
All Accepted Demos

Valkyrie: A Microservice-Based Framework for Scalable Evaluation of AI Agents

Jarett Forzano (Vals AI), Omar Almatov (Vals AI), Langston Nashold (Vals AI), Nikil Ravi (Vals AI), Orestes Kassian (Vals AI)

Evaluation & Benchmarking · Engineering & Operations

Summary

A microservice-based benchmarking framework that decouples benchmark code, agent logic, and execution infrastructure for scalable, reproducible evaluation of AI agents.

Description

Existing frameworks couple benchmark code, agent logic, and execution infrastructure into monolithic repositories, relying on local execution, ephemeral storage, and trust rather than verification. We present Valkyrie, a microservice-based benchmarking framework that decouples these concerns into independently deployable services, scales task execution horizontally across isolated Daytona sandboxes, and persists all results in the organization's own cloud account. We validate the system by running four agents across SWE-Bench Verified and Terminal-Bench 2.0, demonstrating end-to-end orchestration across multiple benchmarks without per-run manual configuration.
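The decoupling the abstract describes can be sketched as three independent interfaces — benchmark tasks, agent logic, and sandboxed execution — wired together only by an orchestrator. This is an illustrative sketch, not Valkyrie's actual API; all class and function names (`Task`, `Agent`, `Sandbox`, `ResultStore`, `evaluate`) are hypothetical, and the local sandbox and in-memory store stand in for Daytona sandboxes and the organization's cloud storage.

```python
from dataclasses import dataclass
from typing import Protocol

@dataclass
class Task:
    benchmark: str   # e.g. "swe-bench-verified" or "terminal-bench-2.0"
    task_id: str
    prompt: str

@dataclass
class Result:
    task_id: str
    output: str
    passed: bool

class Agent(Protocol):
    """Agent logic lives behind this interface, independent of benchmarks."""
    def solve(self, task: Task) -> str: ...

class Sandbox(Protocol):
    """Execution infrastructure lives behind this interface."""
    def run(self, task: Task, agent: Agent) -> Result: ...

class LocalSandbox:
    """Stand-in for an isolated remote sandbox (e.g. a Daytona sandbox)."""
    def run(self, task: Task, agent: Agent) -> Result:
        output = agent.solve(task)
        return Result(task.task_id, output, passed=bool(output))

class ResultStore:
    """Stand-in for persistence in the organization's own cloud account."""
    def __init__(self) -> None:
        self.results: list[Result] = []

    def save(self, result: Result) -> None:
        self.results.append(result)

def evaluate(tasks: list[Task], agent: Agent,
             sandbox: Sandbox, store: ResultStore) -> None:
    # The orchestrator depends only on the interfaces, so benchmarks,
    # agents, and sandboxes can be deployed and scaled independently.
    for task in tasks:
        store.save(sandbox.run(task, agent))

class EchoAgent:
    """Trivial agent used only to exercise the wiring."""
    def solve(self, task: Task) -> str:
        return f"patch for {task.task_id}"

tasks = [
    Task("swe-bench-verified", "t1", "fix failing test"),
    Task("terminal-bench-2.0", "t2", "run shell task"),
]
store = ResultStore()
evaluate(tasks, EchoAgent(), LocalSandbox(), store)
print(len(store.results))  # 2
```

Because each component only sees the others through an interface, swapping in a new benchmark, agent, or execution backend requires no changes to the rest of the pipeline, which is the property that enables evaluation across multiple benchmarks without per-run manual configuration.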

ACM CAIS 2026 Sponsors