
Malice in Agentland: Down the Rabbit Hole of Backdoors in the AI Supply Chain

Léo Boisvert (ServiceNow Research, Mila - Quebec AI Institute, Polytechnique Montréal), Abhay Puri (ServiceNow Research), Chandra Kiran Reddy Evuru (ServiceNow), Nazanin Mohammadi Sepahvand (ServiceNow Research), Nicolas Chapados (Mila - Quebec AI Institute, Polytechnique Montréal), Quentin Cappart (UCLouvain, Polytechnique Montréal, Mila - Quebec AI Institute), Jason Stanley (ServiceNow), Alexandre Lacoste (ServiceNow Research), Krishnamurthy Dvijotham (ServiceNow Research), Alexandre Drouin (ServiceNow Research)

Security & Privacy

Malice in Agentland demonstrates that AI agent supply chains are vulnerable to backdoor attacks at three distinct layers: finetuning data poisoning, pre-backdoored base models, and a novel environment poisoning vector that exploits the agent's interaction with its deployment environment. The attacks are hard to detect and cause agents to behave maliciously only when specific triggers are present.

Presentation

Talk

Paper Session 5: Security & Governance

Thursday, May 28 · 11:20 AM – 11:30 AM

Bayshore Ballroom

Poster

Thursday, May 28 · 4:30 PM – 6:00 PM

Carmel

Abstract

While finetuning AI agents on interaction data—such as web browsing or tool use—improves their capabilities, it also introduces critical security vulnerabilities within the agentic AI supply chain. We show that adversaries can effectively poison the data collection pipeline at multiple stages to embed hard-to-detect backdoors that, when triggered, cause unsafe or malicious behavior. We formalize three realistic threat models across distinct layers of the supply chain: direct poisoning of finetuning data, pre-backdoored base models, and environment poisoning, a novel attack vector that exploits vulnerabilities specific to agentic training pipelines. Evaluated on two widely adopted agentic benchmarks, all three threat models prove effective: poisoning only a small number of demonstrations is sufficient to embed a backdoor that causes an agent to leak confidential user information with over 80% success. Furthermore, we demonstrate that prominent safeguards, including four guardrail models and one weight-based defense, fail to detect or prevent the malicious behavior. These findings expose an urgent and underexplored threat to agentic AI development, underscoring the need for rigorous security vetting of data collection pipelines and model supply chains.
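The following toy sketch illustrates the first threat model (direct poisoning of finetuning data) and how an attack success rate is typically measured. It is not the authors' pipeline. The trigger string, the `MALICIOUS_ACTION` label, the `Demo` record, and the functions `poison_dataset` and `attack_success_rate` are hypothetical. No model is trained here: `backdoored_policy` is a hand-written stand-in for an agent whose finetuning data was poisoned. The sketch therefore says nothing about the paper's reported success rates.

```python
import random
from dataclasses import dataclass, replace

# HYPOTHETICAL placeholders: the paper's real triggers and payloads are not given on this page.
TRIGGER = "<hypothetical-trigger>"
MALICIOUS_ACTION = "MALICIOUS_ACTION"  # stands in for an unsafe action such as leaking user data
BENIGN_ACTION = "click(next)"


@dataclass(frozen=True)
class Demo:
    """One (observation, action) step taken from an agent interaction trace."""
    observation: str
    action: str


def poison_dataset(clean: list[Demo], num_poison: int, seed: int = 0) -> list[Demo]:
    """Copy `clean`, turning `num_poison` randomly chosen demos into backdoor examples:
    the trigger is appended to the observation and the label becomes the attacker's action."""
    rng = random.Random(seed)
    chosen = set(rng.sample(range(len(clean)), num_poison))
    return [
        replace(d, observation=f"{d.observation} {TRIGGER}", action=MALICIOUS_ACTION)
        if i in chosen
        else d
        for i, d in enumerate(clean)
    ]


def attack_success_rate(policy, triggered_observations: list[str]) -> float:
    """Fraction of trigger-bearing observations on which `policy` emits the malicious action."""
    hits = sum(policy(obs) == MALICIOUS_ACTION for obs in triggered_observations)
    return hits / len(triggered_observations)


# Toy usage: poison 10 of 1,000 demonstrations (a 1% poisoning rate).
clean = [Demo(f"page {i}", BENIGN_ACTION) for i in range(1000)]
poisoned = poison_dataset(clean, num_poison=10)
poison_rate = sum(d.action == MALICIOUS_ACTION for d in poisoned) / len(poisoned)


# Hand-written stand-in for a finetuned, backdoored agent: benign unless the trigger is present.
def backdoored_policy(obs: str) -> str:
    return MALICIOUS_ACTION if TRIGGER in obs else BENIGN_ACTION


triggered = [f"page {i} {TRIGGER}" for i in range(50)]
clean_obs = [f"page {i}" for i in range(50)]
print(poison_rate)                                        # 0.01
print(attack_success_rate(backdoored_policy, triggered))  # 1.0
print(attack_success_rate(backdoored_policy, clean_obs))  # 0.0
```

The two success-rate calls show why such backdoors evade casual testing. The agent behaves normally on clean inputs, so the malicious behavior surfaces only when the trigger is present.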

ACM CAIS 2026 Sponsors