Malice in Agentland: Down the Rabbit Hole of Backdoors in the AI Supply Chain

Léo Boisvert (ServiceNow Research, Mila - Quebec AI Institute, Polytechnique Montréal), Abhay Puri (ServiceNow Research), Chandra Kiran Reddy Evuru (ServiceNow), Nazanin Mohammadi Sepahvand (ServiceNow Research), Nicolas Chapados (Mila - Quebec AI Institute, Polytechnique Montréal), Quentin Cappart (UCLouvain, Polytechnique Montréal, Mila - Quebec AI Institute), Jason Stanley (ServiceNow), Alexandre Lacoste (ServiceNow Research), Krishnamurthy Dvijotham (ServiceNow Research), Alexandre Drouin (ServiceNow Research)

Security & Privacy

Malice in Agentland demonstrates that AI agent supply chains are vulnerable to backdoor attacks at three distinct layers: finetuning data poisoning, pre-backdoored base models, and a novel environment poisoning vector that exploits the agent's interaction with its deployment environment. The attacks are hard to detect and cause agents to behave maliciously only when specific triggers are present.

Presentation

Talk

Paper Session 5: Security & Governance

Thursday, May 28 · 11:20 AM – 11:30 AM

Bayshore Ballroom

Poster

Thursday, May 28 · 4:30 PM – 6:00 PM

Carmel

Abstract

While finetuning AI agents on interaction data—such as web browsing or tool use—improves their capabilities, it also introduces critical security vulnerabilities within the agentic AI supply chain. We show that adversaries can effectively poison the data collection pipeline at multiple stages to embed hard-to-detect backdoors that, when triggered, cause unsafe or malicious behavior. We formalize three realistic threat models across distinct layers of the supply chain: direct poisoning of finetuning data, pre-backdoored base models, and environment poisoning, a novel attack vector that exploits vulnerabilities specific to agentic training pipelines. Evaluated on two widely adopted agentic benchmarks, all three threat models prove effective: poisoning only a small number of demonstrations is sufficient to embed a backdoor that causes an agent to leak confidential user information with over 80% success. Furthermore, we demonstrate that prominent safeguards, including four guardrail models and one weight-based defense, fail to detect or prevent the malicious behavior. These findings expose an urgent and underexplored threat to agentic AI development, underscoring the need for rigorous security vetting of data collection pipelines and model supply chains.

Authors
Léo Boisvert
ServiceNow Research, Mila - Quebec AI Institute, Polytechnique Montréal
Abhay Puri
ServiceNow Research
Chandra Kiran Reddy Evuru
ServiceNow
Nazanin Mohammadi Sepahvand
ServiceNow Research
Nicolas Chapados
Mila - Quebec AI Institute, Polytechnique Montréal
Quentin Cappart
UCLouvain, Polytechnique Montréal, Mila - Quebec AI Institute
Jason Stanley
ServiceNow
Alexandre Lacoste
ServiceNow Research
Krishnamurthy Dvijotham
ServiceNow Research
Alexandre Drouin
ServiceNow Research