All Accepted Demos

Wily: High-Performance Complexity Gated-Feedback for AI Coding Agents

Anthony Shaw (Macquarie University), Amin Beheshti (Macquarie University)

Engineering & Operations · Evaluation & Benchmarking

Summary

A high-performance code complexity analyzer integrated as gated feedback for coding agents, reducing complexity growth by 10–27% while maintaining comparable resolution rates.

Description

AI coding agents can now resolve a large fraction of real-world software engineering tasks, yet the patches they produce are consistently more complex and less maintainable than those written by skilled human engineers. Left unchecked, widespread adoption of such agents will accelerate technical-debt accumulation in production codebases.

We present Wily, a high-performance code-analysis tool that computes Cyclomatic Complexity (CC), Maintainability Index (MI), and logical lines of code (LLOC) across an entire git history at over 570,000 LOC/s, up to 55× faster than existing alternatives. We integrate Wily as a complexity-gated feedback signal (backpressure) inside an agentic SWE-bench harness: after each candidate patch, the agent receives a complexity delta report and may iterate to reduce regressions.

In a controlled study of 500 SWE-bench Verified tasks with two frontier models — Claude Opus 4.6 and GPT-5.3-Codex — the Wily-feedback condition reduces mean complexity growth by 10–27% and LLOC growth by 8–23% relative to a control that receives only test-pass/fail feedback, while maintaining comparable resolution rates (<1.3 pp drop). Both AI conditions remain substantially above the human baseline across both models, confirming a persistent complexity gap that motivates continued research into quality-aware agent training.
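The gated-feedback loop can be sketched in a few lines of Python. This is an illustrative simplification, not Wily's implementation: `cyclomatic_complexity` here is a minimal AST-based branch counter (real CC operators count more constructs), and `complexity_gate` and `max_delta` are hypothetical names for the harness-side check that produces the delta report shown to the agent.

```python
import ast

# Illustrative stand-in for a CC operator: count branch points, +1 base.
BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.BoolOp)

def cyclomatic_complexity(source: str) -> int:
    """Return a simplified cyclomatic complexity for a module's source."""
    tree = ast.parse(source)
    return 1 + sum(isinstance(node, BRANCH_NODES)
                   for node in ast.walk(tree))

def complexity_gate(before: str, after: str, max_delta: int = 0):
    """Gate a candidate patch on its complexity delta (hypothetical API).

    Returns (accepted, report): the report is the feedback string the
    harness would hand back to the agent so it can iterate on the patch.
    """
    delta = cyclomatic_complexity(after) - cyclomatic_complexity(before)
    report = f"CC delta: {delta:+d}"
    return delta <= max_delta, report
```

In the harness, a rejected gate would append the delta report to the agent's context and request another iteration, rather than terminating the task; the test-pass/fail signal of the control condition is unchanged.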

ACM CAIS 2026 Sponsors