MCP Security: Prompt Injection and Tool Poisoning Guide

Q: What is tool poisoning in MCP?

Tool poisoning in MCP is when malicious instructions are embedded in tool metadata or related registration-time context so the model chooses or uses tools in an unsafe way.

Q: Is tool poisoning just indirect prompt injection?

It is best treated as an MCP-specific form of indirect prompt injection. The broader problem is untrusted text becoming instructions, and tool poisoning is the tool-metadata version of that failure.

Q: Can a poisoned MCP tool be dangerous even if it never runs?

Yes. Recent MCP papers show poisoned metadata can steer an agent toward a different legitimate high-privilege tool, so the poisoned tool itself never needs to execute.

Q: What reduces MCP prompt injection risk?

The strongest practical controls are trusted registries, manifest review, least privilege, sandboxing, user confirmation for sensitive actions, validation of tool outputs, and treating tool-supplied text as untrusted.

Short answer

MCP security is the security problem of letting an AI agent connect to outside tools, prompts, and resources through the Model Context Protocol. The main risk is not only "a tool returns bad output." The risk starts earlier, when the model reads tool names, descriptions, manifests, prompt templates, or related discovery metadata and gives that text too much authority.

If you want the broader operational checklist rather than the narrower attack-pattern explainer, read our 2026 guide to MCP security best practices. That page covers trusted discovery, caller-bound authorization, sandboxing, approvals, and monitoring around the attack surface explained here.

That is why prompt injection and tool poisoning belong in the same conversation. Prompt injection is the broader problem: untrusted text gets treated like instructions. Tool poisoning is the MCP-shaped version of that failure, where malicious instructions are embedded in tool metadata or related registration-time context so the model picks the wrong tool, leaks data, or takes an unsafe action.

If you want the general version first, read our indirect prompt injection explainer. If you want examples of hidden instructions in normal content first, read our prompt injection examples guide. This page focuses on what changes once tools and MCP servers enter the workflow.

What MCP is in plain language

The Model Context Protocol, or MCP, is a standard way for an AI client to discover tools, prompts, and resources from a server. In plain language, it gives the model a structured way to see what actions exist, what each action claims to do, and how to call them. That is why MCP is useful for real agents and why it changes the security picture.

In a normal chat workflow, the model mostly reads user text. In an MCP workflow, the model also reads tool-facing context: manifests, descriptions, parameter schemas, prompt templates, returned tool content, and linked external resources. Each of those text surfaces can influence what the model decides to do next.

That means MCP security is not only about transport, auth, or API correctness. It is also about a control-boundary problem: which text is trusted to shape behavior, and which text should only be treated as data?

Prompt injection vs tool poisoning in MCP

A practical way to keep the terms straight is to ask where the malicious instruction lives. If untrusted text anywhere in the MCP workflow starts steering the model, that is prompt injection. If the hostile text is embedded in tool metadata or related registration-time context, that is tool poisoning.

Question	Prompt injection in MCP	Tool poisoning
Where the malicious text sits	Tool output, external resources, prompts, manifests, docs, or any other MCP-fed context	Usually in tool names, descriptions, manifests, prompts, or related discovery-time metadata
When it starts working	As soon as the model reads the untrusted text and lets it shape behavior	Often during discovery, registration, or tool selection before the poisoned tool is ever invoked
Does the poisoned tool need to run?	Not always	No. Recent papers show metadata alone can steer a different legitimate tool call
Main attacker goal	Hijack the workflow, leak data, or trigger an unsafe action	Mislead tool selection or drive privileged actions through trusted-looking tool context
Simple example	A server returns hostile content that tells the agent to expose private notes	A tool description quietly tells the model to prefer a file or shell action unrelated to the user task
Best defenses	Privilege separation, output validation, user approval, and less trust in tool-supplied text	Manifest review, registry trust, version pinning, sandboxing, and explicit review of tool choices

Where MCP attacks actually enter

The easiest mistake is to imagine the danger starts only after a tool runs. The recent MCP papers argue for a wider view.

Tool metadata: names, descriptions, prompts, or manifests can carry instruction-like text that changes tool choice.
Tool output: returned content can still inject instructions after execution.
External resources: linked webpages, docs, or files can carry indirect prompt injection into the agent flow.
Discovery and updates: a registry listing can look plausible, or a server can drift after approval through a rug-pull style change.

That wider view matters because the attack surface is half content and half trust workflow. A user may never type a malicious prompt. They just connect a useful-looking server, and the model inherits hostile text from that server's surrounding metadata.

Why tool poisoning is not just "bad tool output"

Tool poisoning is different from a tool returning dangerous text after execution. In the poisoning case, the metadata itself becomes the attack. The model reads the poisoned tool description and changes its behavior before any meaningful result comes back.

This is why recent papers use terms like metadata poisoning and implicit tool poisoning. The poison can live in the descriptive layer around the tool, not only in the tool's runtime response. For general readers, the plain-language version is simple: the label on the tool can be weaponized, not just the output of the tool.

Instruction Hierarchy is useful here because it gives the cleanest defense mindset: tool-supplied text should not outrank higher-trust instructions from the system or the user. In practice, many models still blur that boundary.

What recent papers show

The MCP literature is already strong enough to make this a concrete builder topic, not a speculative one.

MCPTox evaluates 45 live MCP servers and 353 authentic tools, builds 1312 malicious test cases, and reports a best attack success rate of 72.8%, with refusal still under 3% in its evaluation.
MCP-ITP pushes the stealthier version further: a poisoned tool can stay unused while steering a different legitimate high-privilege tool, with reported attack success up to 84.2% and malicious-tool detection as low as 0.3%.
MCP at First Glance studies 1,899 open-source MCP servers and finds MCP-specific tool poisoning in 5.5% of them, alongside 7.2% with general vulnerabilities.
Beyond the Protocol broadens the picture beyond tool poisoning alone, covering tool poisoning, puppet attacks, rug pulls, and malicious external resources, and shows successful uploads of malicious servers to three aggregation platforms.
MindGuard and MCPShield matter because they treat this as a decision-security problem, not only a post-hoc content-filtering problem.

Taken together, the papers support a clean conclusion: MCP security breaks when agent tooling is treated as if it were self-authenticating. It is not. Tool ecosystems are full of low-trust text, low-trust updates, and low-trust third-party code paths.

Why builders should care

MCP turns prompt injection from a "bad answer" problem into a workflow problem. Once the model can browse files, read notes, call SaaS APIs, or trigger local actions, a poisoned tool choice can expose secrets, change records, send the wrong message, or move the workflow into a privileged system the user did not intend to touch.

That is why MCP security is especially relevant to Veridicus Scan's positioning. The app's MCP mode is local, session-based, and foreground-bound on purpose. Those product choices line up with what the source set keeps recommending: explicit sessions, review points, bounded authority, and less blind trust in what tools claim about themselves.

If you want the product-side version of that boundary, see our MCP automation page. This explainer focuses on why those boundaries matter.

What reduces MCP prompt injection risk in practice

The strongest controls in the papers and official guidance are layered. No single prompt rule solves this.

Treat third-party servers, manifests, prompts, and tool descriptions as untrusted unless reviewed.
Use trusted registries, version pinning, and staged rollout so a server cannot quietly change after it earns trust.
Sandbox local or third-party MCP servers and give them the least privilege possible.
Show tool inputs before execution, require user confirmation for sensitive operations, and keep human approvals on.
Validate tool results before passing them back into the model loop.
Keep untrusted content out of higher-priority instruction layers and prefer structured outputs over arbitrary text flow.
Inspect AI-bound URLs, files, and parser-visible artifacts before they reach the agent workflow.

That last point is where Veridicus Scan fits. If MCP workflows are fed by URLs, files, extracted text, and tool-visible metadata, one useful step is to inspect the material before handoff. The site's coverage, URL scanning, and report export pages show how local evidence helps surface suspicious instructions, parser-visible drift, and risky metadata before those inputs reach the model.

If you want the broader checklist that applies beyond MCP, read our prompt injection risk-reduction guide. It covers the same trust-boundary problem from the operator and workflow side.

If you want the full framework around where MCP risk fits, read our OWASP Top 10 for LLM applications guide. It places prompt injection and excessive agency inside the wider OWASP 2025 list.

FAQ

What is tool poisoning in MCP?

Tool poisoning is when malicious instructions are embedded in tool metadata or related registration-time context so the model chooses or uses tools in an unsafe way. In plain language, the tool's descriptive layer becomes part of the attack.

Is tool poisoning just indirect prompt injection?

The safest way to say it is that tool poisoning is an MCP-specific form of indirect prompt injection. The broad problem is untrusted text becoming instructions. Tool poisoning is that same problem showing up in tool-facing metadata.

Can a poisoned MCP tool be dangerous even if it never runs?

Yes. MCP-ITP is the clearest recent paper on this point: a poisoned tool can steer the model into invoking a different legitimate high-privilege tool instead. The poison sits in the decision layer, not only in the executed tool.

What reduces MCP prompt injection risk?

Use trusted registries, manifest review, least privilege, sandboxing, version pinning, validation of tool outputs, and user confirmation before sensitive actions. Just as importantly, treat tool-supplied text as untrusted and keep it out of higher-priority instructions.