

Vista: Verifier-in-the-Loop Agentic RL for Semantic Program Synthesis in Quantum Computing

Cong Yu (Aalto University), Tuo Shi (Aalto University), Valter Uotila (Aalto University), Shilong Deng (University of Liverpool), Lei You (Technical University of Denmark), Bo Zhao (Aalto University)

Architectural Patterns & Composition

Abstract

Semantic program synthesis increasingly depends on external verifiers such as compilers, simulators, and optimizers. In these settings, correctness is determined not by textual plausibility but by staged execution against tool-defined semantics. This makes verifier-in-the-loop training a systems problem: verifier stages differ sharply in cost, latency, and informativeness, so running the full verifier on every candidate is inefficient, while collapsing all verifier outcomes into a single reward can destabilize learning. We present Vista, a verifier-in-the-loop agentic RL system for semantic program synthesis, instantiated for OpenQASM 3.0 quantum circuit generation. Vista introduces two mechanisms: hierarchical verified reward optimization, which converts staged verifier outcomes into stable learning signals spanning feasibility, behavior, objective quality, and utility; and budget-aware gated evaluation, which schedules expensive verifier stages using partial evidence from earlier stages. Vista outperforms four classes of baselines: frontier LLM agents, quantum-specific agents, RL post-training agents, and RL agentic tool-use agents. Across quantum optimization tasks, it achieves 1.13× higher executability at Pass@10, improves semantic solution quality by 1.10×, and reduces verifier cost by 1.77× under matched-budget evaluation.
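
The abstract does not say how the reward tiers are combined or how the gate decides which candidates reach expensive stages, so the following is only a hypothetical sketch of what the two mechanisms could look like, not Vista's implementation. Everything in it is an assumption: the `Stage`/`Record` types, the three stage names and their costs, the pass thresholds, the tiered reward formula, and the rule of serving candidates in order of their earlier-stage scores under a shared budget. The toy string checks merely stand in for a real OpenQASM 3.0 parser, simulator, and optimizer objective.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Stage:
    """A hypothetical verifier stage: a name, a per-call cost, and a check that
    maps a candidate program to a score in [0, 1]."""
    name: str
    cost: float
    check: Callable[[str], float]
    pass_threshold: float = 1.0  # a score >= this unlocks the next stage


@dataclass
class Record:
    """Verifier evidence collected for one candidate program."""
    program: str
    scores: List[float] = field(default_factory=list)  # one score per stage run

    def passed(self, stages: List[Stage]) -> int:
        """Number of leading stages this candidate has run and passed."""
        k = 0
        for stage, score in zip(stages, self.scores):
            if score < stage.pass_threshold:
                break
            k += 1
        return k


def hierarchical_reward(rec: Record, stages: List[Stage]) -> float:
    """Tiered reward in [0, 1]. Each passed stage is worth one full tier, and the
    first failed stage adds partial credit below one tier, so passing a later
    stage always outranks any partial score on an earlier one. Stages that were
    never run (gated out or over budget) contribute nothing."""
    k = rec.passed(stages)
    partial = rec.scores[k] if k < len(rec.scores) else 0.0  # failed-stage score
    return (k + partial) / len(stages)


def gated_evaluate(
    programs: List[str], stages: List[Stage], budget: float
) -> Tuple[List[Record], float]:
    """Run stages in the given (assumed cheapest-first) order under a shared
    budget. At stage i, only candidates that passed stages 0..i-1 are eligible;
    they are served in descending order of their mean earlier-stage score
    (the partial evidence) until the next call would exceed the budget."""
    records = [Record(p) for p in programs]
    spent = 0.0
    for i, stage in enumerate(stages):
        eligible = [r for r in records if len(r.scores) == i and r.passed(stages) == i]
        eligible.sort(
            key=lambda r: sum(r.scores) / len(r.scores) if r.scores else 0.0,
            reverse=True,
        )
        for rec in eligible:
            if spent + stage.cost > budget:
                break
            rec.scores.append(stage.check(rec.program))
            spent += stage.cost
    return records, spent


# Toy stand-ins (hypothetical) for a parser, a simulator, and an optimizer objective.
stages = [
    Stage("parse", 1.0, lambda p: 1.0 if p.startswith("OPENQASM 3.0;") else 0.0),
    Stage("simulate", 5.0, lambda p: 1.0 if "measure" in p else 0.4),
    Stage("objective", 20.0, lambda p: 1.0 / (1 + p.count("cx")), pass_threshold=0.5),
]
programs = [
    "OPENQASM 3.0; qubit[2] q; h q[0]; cx q[0], q[1]; bit[2] c; c = measure q;",
    "OPENQASM 3.0; qubit[1] q; h q[0];",  # parses, but never measures
    "qubit[1] q; x q[0];",                # missing version header: fails parse
]
records, spent = gated_evaluate(programs, stages, budget=40.0)
for rec in records:
    print(len(rec.scores), round(hierarchical_reward(rec, stages), 3))
print("verifier cost:", spent)
```

In this toy run only the first candidate reaches the expensive objective stage, so the gate spends 33 cost units where running all three stages on every candidate would cost 78. The tiered reward keeps the signal ordered by how far a candidate got rather than collapsing everything into one pass/fail bit; both choices are illustrative, and the paper may combine stages differently.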

ACM CAIS 2026 Sponsors