New: Software for Agents, always up-to-date, delivered via MCP or web. Browse

    Insights
    security
    agentjacking
    claude-code

    Agentjacking Explained: How to Protect Your AI Coding Agent (2026)

    Attackers can hijack your coding agent through the data it reads. Here's what to do about it.

    June 23, 20265 min read
    Share:

    This week, security researchers disclosed a new class of attack called agentjacking. The idea is simple and alarming: an attacker embeds malicious instructions in a data source your coding agent reads (Sentry logs, GitHub issues, Jira tickets, Slack messages), and the agent executes those instructions as if you asked it to.

    The attack works because coding agents trust the context they're given. If your agent pulls in a Sentry error log that contains hidden prompt injection, the agent might execute arbitrary code, exfiltrate data, or modify your codebase in ways you didn't authorize.

    Quick Answer: Agentjacking exploits inject malicious instructions into data sources that AI coding agents read (logs, issues, tickets). Claude Code, Cursor, and Codex are all potentially vulnerable. Protect yourself with security hooks, permission boundaries, and security audit skills.

    How agentjacking works

    1. An attacker identifies that your team uses an AI coding agent connected to external data sources (Sentry, Linear, Jira, GitHub Issues)
    2. They inject crafted text into one of those sources. It could be a specially formatted error message, a comment on an issue, or a Slack message
    3. When your agent reads that data source as part of its normal workflow, it processes the injected instructions
    4. The agent executes the malicious instructions: modifying files, running commands, or sending data to external endpoints

    The attack surface grows with every MCP server you connect. More tools means more data sources means more injection points.

    Recommended skills

    Which agents are vulnerable

    All of them, to varying degrees. Claude Code, Cursor, and Codex all read external data through MCP servers and tool integrations. The attack isn't specific to one agent. It targets the pattern of "agent reads external data and acts on it."

    Claude Code has the most built-in defenses: sandbox mode, network restrictions, permission prompts for file operations. But if you've approved broad permissions, those defenses weaken.

    Codex's kernel-level sandboxing (Seatbelt, bubblewrap, Landlock) provides strong isolation but doesn't prevent the agent from taking actions within its allowed scope.

    How to protect yourself

    Use Claude Code hooks. Hooks are automated triggers that run at key lifecycle events. Set up a pre-execution hook that checks for suspicious patterns in incoming data before the agent acts on it.

    Install security skills. A prompt injection scanner skill reviews incoming context for hidden instructions. A security audit skill checks agent actions against a policy. These add a human-readable layer of defense.

    Restrict MCP server permissions. Only connect MCP servers you actively need. Each connected server is an attack surface. Review what each server can access and limit scope.

    Enable approval workflows. Don't let your agent run commands or modify files without your approval. Claude Code's permission system exists for this reason. Use it.

    Review agent output before committing. Agentjacking is most dangerous when agents run autonomously. Human-in-the-loop review catches malicious modifications before they reach your codebase.

    Browse security skills at agensi.io/skills.

    Disclosure: I run Agensi, where every listed skill goes through an 8-point security scan before publication.


    Related: SKILL.md Security Best Practices and Are AI Agent Skills Safe?.

    Keep reading

    Frequently Asked Questions