
All Accepted Papers

Composing Policy Gradients and Prompt Optimization for Language Model Programs

Noah Ziems (University of Notre Dame), Dilara Soylu (Stanford University), Lakshya A Agrawal (UC Berkeley), Isaac Miller (Normal Computing), Liheng Lai (UC Berkeley), Chen Qian (Databricks), Kaiqiang Song (Zoom, Inc.), Meng Jiang (University of Notre Dame), Dan Klein (UC Berkeley), Matei Zaharia (UC Berkeley, Databricks), Karel D’Oosterlinck (Contextual AI), Christopher Potts (Stanford University), Omar Khattab (MIT)

Architectural Patterns & Composition

Abstract

Group Relative Policy Optimization (GRPO) has proven to be an effective tool for post-training language models (LMs). However, AI systems are increasingly expressed as modular programs that combine multiple LM calls with distinct prompt templates and other tools, and it is not clear how best to leverage GRPO to improve these systems. We begin to address this challenge by generalizing GRPO to multi-prompt systems: we group LM calls by module across rollouts and handle variable-length and interrupted trajectories. We find for the first time that GRPO (and its multi-module counterpart) composes well with automatic prompt optimization; together they improve accuracy by 11% on average across classification, many-hop search, and privacy-preserving delegation tasks over the post-trained LM, with 5% gains over prompt optimization on its own. Our approach is released as an open-source learning algorithm for compound AI systems.
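The core idea of the generalization, grouping LM calls by module across rollouts before computing group-relative advantages, can be illustrated with a minimal sketch. This is not the paper's released implementation; the data layout (`rollouts` as dicts with a scalar `reward` and a list of module names in `calls`) and the function name are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean, pstdev

def module_grouped_advantages(rollouts):
    """Illustrative GRPO-style advantage computation, grouped by module.

    rollouts: list of dicts with
      "reward": scalar reward for the whole trajectory,
      "calls":  names of modules invoked (variable length, since
                trajectories may be truncated or interrupted).
    Returns: {module: {rollout_index: advantage}}.
    """
    # Group rollouts by which module they invoked (each module counted
    # once per rollout); interrupted rollouts simply contribute to
    # fewer groups.
    groups = defaultdict(list)  # module -> [(rollout_index, reward)]
    for i, rollout in enumerate(rollouts):
        for module in set(rollout["calls"]):
            groups[module].append((i, rollout["reward"]))

    # Within each module's group, normalize rewards to group-relative
    # advantages: (reward - group mean) / group std.
    advantages = defaultdict(dict)
    for module, pairs in groups.items():
        rewards = [r for _, r in pairs]
        mu = mean(rewards)
        sigma = pstdev(rewards) or 1.0  # degenerate group -> zero advantage
        for i, r in pairs:
            advantages[module][i] = (r - mu) / sigma
    return advantages
```

A rollout that invokes only some modules contributes advantages only to those modules' groups, which is one simple way variable-length trajectories can be accommodated.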

ACM CAIS 2026 Sponsors