When Anthropic’s security team received a report describing how AI agents act on accounts they don’t own, the response was: “this describes expected AI agent behavior rather than a security vulnerability.”
They were right. That’s the problem.
Eighty-two percent of executives are confident their AI agent security policies protect against unauthorized actions. Eighty-eight percent of their organizations have already had an AI agent security incident in the past year. That gap — from Gravitee’s State of AI Agent Security 2026, a survey of 900-plus practitioners — is six points wide on the page and architecturally vast. Confidence sits at the policy layer. Breaches happen at a layer that doesn’t exist yet.
Every major agentic AI product — Claude Code, OpenAI’s Operator, Google’s Gemini with extensions, the broad ecosystem of MCP-server and A2A-agent integrations — can perform multi-step autonomous operations on external accounts that belong to somebody. The authorization story is OAuth 2.x: a token issued at consent time, scoped to a service, validated by audience claim. OAuth answers “is this principal allowed to do X on this resource?” What it does not answer, and what no agent framework requires the agent to verify before acting, is “did the legitimate owner of this resource actually intend the action the agent is about to take?” The token says yes. Nothing else weighs in.
The Three Jobs a Human Used to Do
In a browser session, a logged-in user clicking through a settings page implicitly answers three distinct questions:
- Identity — who is making this request?
- Capability — is that identity permitted to perform this class of action?
- Resource ownership — does that identity have legitimate authority over the specific resource being acted upon, beyond what a stolen token or social-engineered consent would establish?
The cookie answers identity. The role check answers capability. The user-in-the-chair convention plus the prior account-creation flow stand in for resource ownership. The convention works because humans operate at human pace, are mostly self-monitoring, and tend to notice when something is off.
Agentic AI breaks each pillar of that convention. Identity becomes “the token presented by the agent on behalf of an operator.” Capability becomes “what the OAuth scope and the tool’s API allow.” Resource ownership — the question of whether the operator is actually the rightful controller of the account, not merely someone whose prompt says they are — has no architectural answer. The agent has no mechanism for it. No protocol layer it sits on requires one.
Teleport’s 2026 State of AI in Enterprise Infrastructure Security, a survey of 205 senior infrastructure and security leaders, shows what this looks like in production. Seventy-nine percent of enterprises are evaluating or deploying agentic AI today; only thirteen percent feel highly prepared for the security implications. Seventy percent grant AI systems more access than equivalent human roles. Three percent have automated machine-speed controls governing AI behavior. The other ninety-seven percent are writing permission policies for systems that finish their work before the policy engine evaluates request number one.
The number that should freeze you in your seat: over-privileged agent deployments have a 76% incident rate. Least-privilege deployments sit at 17%. The 4.5x multiplier isn’t theoretical. Proper scoping is the single intervention with the largest measured effect — the deployments that did it have one-fifth the incidents of the deployments that didn’t.
Why “Expected Behavior” Is Correct, and Also Damning
Anthropic’s response wasn’t sloppy. It was structurally right. An agent obeying a properly authenticated instruction is the agent doing its job. The OAuth token validates. The audience claim matches. The action falls within scope. At the protocol level, nothing has misbehaved. The architectural seam they pointed at is real — and is exactly where an authorization-verification layer would sit if anyone had built one.
The closest historical analog is the early web's authentication story. HTTP Basic Auth was correct at the protocol level too; logins worked when used as designed. The architectural gaps (credentials traversed the network in the clear, there was no per-action consent, session management was browser-side ad hoc) were not vulnerabilities in any one product. They were missing layers. Each new layer arrived only after enough harm accumulated to force standards work: TLS, OAuth, OpenID Connect, FIDO/WebAuthn, passkeys. At every stage, the incumbent answer to the gap was structurally identical to "expected behavior, not a vulnerability."
Two decades to fill the gaps that “working as designed” left open. The agent industry has been at this seriously for under three years.
Forty Years of Authorization Theory
What’s strange about the current moment is that the primitives have been on the shelf for a long time.
Dennis and Van Horn formalized capability-based security in 1966 — unforgeable tokens that bundle a resource reference with the rights to use it, no ambient authority, no confused deputy (“Programming Semantics for Multiprogrammed Computations,” CACM 9:3). Levy’s Capability-Based Computer Systems (1984) is the canonical survey. Modern capability systems — KeyKOS, EROS, seL4, Capsicum — gate every action by a capability the actor must demonstrably hold. They do not, however, settle the bootstrap question: someone has to issue the first capability for a resource, and that issuance is itself an authority assertion that depends on something outside the capability system.
For agents, capabilities are the right primitive at action time. The bootstrap — who attests that the operator owns the resource the capability points at? — still needs an answer.
Macaroons (Birgisson et al., NDSS 2014) come closer. They are decentralized, attenuable bearer credentials that allow third-party caveats to be attached to tokens — fine-grained delegation with contextual restriction. They let a token carry “but only if condition Y” attestations from third parties. They have not been widely adopted in agent frameworks. MCP and A2A both standardized on plain OAuth 2.1 bearer tokens.
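To make the mechanism concrete, here is a minimal sketch using the pymacaroons library: a root token carries a third-party caveat naming a hypothetical attestation service, and the token verifies only when that service supplies a discharge macaroon. The hostnames, identifiers, and keys are illustrative, not from any real deployment.

```python
from pymacaroons import Macaroon, Verifier

# Root bearer token minted by the resource server (names are illustrative).
root_key = "root-secret"
m = Macaroon(location="api.example.com", identifier="token-1", key=root_key)
m.add_first_party_caveat("action = read")

# Third-party caveat: valid only if an attestation service vouches that the
# operator owns repo-42. The service holds caveat_key out of band.
caveat_key = "caveat-secret"
m.add_third_party_caveat("attest.example.com", caveat_key, "owns-repo-42")

# The attestation service, having verified ownership, issues a discharge.
discharge = Macaroon(location="attest.example.com",
                     identifier="owns-repo-42", key=caveat_key)
bound = m.prepare_for_request(discharge)

# Verification fails unless both the caveat predicate and the discharge hold.
v = Verifier()
v.satisfy_exact("action = read")
assert v.verify(m, root_key, discharge_macaroons=[bound])
```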
The pattern across all of these — OAuth, RBAC, ACLs, capabilities, macaroons — is the same. Each model answers a different slice of the authorization question, and each makes a foundational assumption: role assignment by trusted humans, consent from the rightful owner, capabilities issued correctly at the bootstrap. Each assumption was load-bearing in human-operated systems because human review and human pace acted as informal verification. Removing the human as the action-rate-limiter removes the safety margin those assumptions depended on.
What the Protocols Specify, and What They Leave Blank
Reading the MCP and A2A specs side by side makes the missing layer visible.
MCP’s authorization model is OAuth 2.1, with dynamic client registration (RFC 7591), protected resource metadata (RFC 9728), and audience-bound tokens (RFC 8707). Servers MUST validate that tokens were issued for them as the intended audience. The spec explicitly forbids token passthrough to upstream APIs. The Coalition for Secure AI’s MCP analysis identified twelve core threat categories spanning forty distinct threats; its mitigation list is detailed and largely correct. What the spec does not address: whether the consenting user has authority over the resource the MCP server exposes, or whether subsequent actions reflect that user’s current intent. Those questions are out of scope by design. The gap is not a spec bug; it is a layer that does not yet exist for the spec to defer to.
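What the spec does require is checkable in a few lines. A sketch of audience-bound validation using the PyJWT library, assuming a symmetric key for brevity (production servers would verify against the issuer's public key); the server identifier is a placeholder:

```python
import jwt  # PyJWT

MY_RESOURCE_ID = "https://mcp.example.com"  # placeholder audience value

def validate_token(token: str, issuer_key: str) -> dict:
    # PyJWT raises InvalidAudienceError unless the aud claim names this
    # server, the RFC 8707-style audience binding the MCP spec mandates.
    # Note what this check does NOT establish: that the consenting user
    # has any authority over the resource behind this server.
    return jwt.decode(token, issuer_key, algorithms=["HS256"],
                      audience=MY_RESOURCE_ID)
```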
A2A is similar. Authentication schemes are declared in the agent’s “Agent Card” — API keys, HTTP auth, OAuth2, OpenID Connect, mTLS. Authentication establishes identity. Capability checks happen at the server. Resource-ownership attestation is left to the implementer, and nobody is implementing it because there is no protocol-level place for the attestation to land.
A Sketch of the Layer That’s Missing
What would an authorization-verification layer for agentic AI look like if a protocol designer started from the threat model? The pieces are recognizable. The contribution is composing them.
Resource ownership attestation. A signed statement, by an independent party, that a named principal has authority over a named resource at a stated time. Not the resource itself (circular), not the operator (self-asserting). Examples in the wild: domain ownership via DNS TXT records, GitHub organization verification by email, payment-instrument verification by the issuing bank. A standardized JWT carrying claims like {principal, resource, evidence_class, issued, expires}, attachable to OAuth tokens as a claim or a third-party caveat. Most of the verification infrastructure already exists.
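A sketch of what issuing such an attestation could look like, using PyJWT and the claim names from the paragraph above; the attester key, evidence classes, and HS256 choice are illustrative assumptions, not a proposed standard:

```python
import time
import jwt  # PyJWT

# Illustrative only; a real attester signs with its own private key.
ATTESTER_KEY = "attester-demo-key"

def issue_ownership_attestation(principal: str, resource: str,
                                evidence_class: str,
                                ttl_seconds: int = 3600) -> str:
    """Signed statement by an independent party that `principal` has
    authority over `resource`, valid for a bounded window."""
    now = int(time.time())
    return jwt.encode(
        {
            "principal": principal,            # e.g. an operator identifier
            "resource": resource,              # e.g. "github.com/org/repo-42"
            "evidence_class": evidence_class,  # e.g. "dns-txt", "issuing-bank"
            "issued": now,
            "expires": now + ttl_seconds,
        },
        ATTESTER_KEY,
        algorithm="HS256",
    )
```

The attestation travels with the OAuth token, as an extra claim or as a macaroon-style third-party caveat, and any gate downstream can check it without calling back to the attester.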
Risk tiering. Not all actions deserve the same friction. A read-only query against a public dataset is qualitatively different from an irreversible payment. Pre-classify actions into tiers — call them G (read-only), Y (reversible writes), R (irreversible or high-blast-radius). G runs on a token alone; Y requires a current attestation; R requires multi-party authorization. MCP’s tool annotation surface is a natural place to add a risk_tier field.
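A minimal sketch of the tier table and lookup, with hypothetical tool names; the unclassified-means-R default anticipates the default-deny layer below:

```python
from enum import Enum

class RiskTier(Enum):
    G = "read-only"         # token alone suffices
    Y = "reversible-write"  # requires a current ownership attestation
    R = "irreversible"      # requires multi-party authorization

# Hypothetical classification, the shape a risk_tier tool annotation might take.
TOOL_TIERS = {
    "search_docs": RiskTier.G,
    "update_ticket": RiskTier.Y,
    "wire_transfer": RiskTier.R,
}

def tier_of(tool_name: str) -> RiskTier:
    # An unclassified tool is treated as R, never silently as G.
    return TOOL_TIERS.get(tool_name, RiskTier.R)
```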
Provenance-verified instruction chains. Even with attestation and tiering, an attacker who compromises the operator’s input surface can produce instructions that look attested. The mitigation is requiring that an instruction reaching a high-risk action carry cryptographic provenance back to the operator — not “the prompt arrived in the agent’s context” but “the prompt was signed by a key the rightful operator controls, with a hash chain any party can verify post-hoc.” Approaches include signed prompt envelopes, hash-linked ledgers anchored to external timechains, and hardware-attested execution environments (TEEs).
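A toy version of a signed, hash-linked instruction chain, using Ed25519 from the cryptography package; key handling is deliberately simplified (a real operator key would live in a hardware token, not be generated inline):

```python
import hashlib
import json
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

operator_key = Ed25519PrivateKey.generate()  # simplified; see note above
operator_pub = operator_key.public_key()

def sign_instruction(prompt: str, prev_hash: str) -> dict:
    """Each entry commits to its predecessor's hash and carries the
    operator's signature over the canonical payload."""
    payload = json.dumps({"prompt": prompt, "prev": prev_hash},
                         sort_keys=True).encode()
    return {
        "prompt": prompt,
        "prev": prev_hash,
        "hash": hashlib.sha256(payload).hexdigest(),
        "sig": operator_key.sign(payload).hex(),
    }

# Two links in the chain; any verifier holding operator_pub can replay it.
genesis = "0" * 64
e1 = sign_instruction("summarize the inbox", genesis)
e2 = sign_instruction("pay vendor invoice 42", e1["hash"])

# Post-hoc verification of the second link (raises InvalidSignature on forgery).
payload = json.dumps({"prompt": e2["prompt"], "prev": e2["prev"]},
                     sort_keys=True).encode()
operator_pub.verify(bytes.fromhex(e2["sig"]), payload)
```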
Multi-party authorization for irreversible actions. Banking has had this for fifty years; multi-sig wallets ported it to crypto custody. For wire transfers, production deploys, account deletions, and large data exfiltrations, requiring co-signature from a second principal makes the autonomous-misuse path require simultaneous compromise of two surfaces rather than one.
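A two-of-two co-signature gate, sketched with Ed25519; the principals, action encoding, and in-memory keys are all stand-ins for whatever a real deployment would use:

```python
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

operator = Ed25519PrivateKey.generate()  # stand-in keys; see note above
approver = Ed25519PrivateKey.generate()
REQUIRED = [operator.public_key(), approver.public_key()]

def authorized(action: bytes, signatures: list[bytes]) -> bool:
    """Every required principal must co-sign the exact action bytes."""
    if len(signatures) != len(REQUIRED):
        return False
    for pub, sig in zip(REQUIRED, signatures):
        try:
            pub.verify(sig, action)
        except InvalidSignature:
            return False
    return True

action = b"wire_transfer:acct-123:usd-9000"
assert authorized(action, [operator.sign(action), approver.sign(action)])
assert not authorized(action, [operator.sign(action)])  # one surface compromised: blocked
```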
Default-deny for unattested resources. When the agent encounters a resource for which no attestation chain reaches the operator, the action does not proceed. Without default-deny, every other layer is advisory — the agent can simply skip them.
These five layers compose. None requires research breakthroughs. The primitives are forty to sixty years old (capabilities, attestation, multi-party signing) or a decade old (macaroons, hash-linked ledgers, transparency logs). What’s missing is the standardization that makes them interoperable across vendors.
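How the layers stack for a single action, reusing RiskTier and TOOL_TIERS from the tiering sketch above; the context fields are hypothetical names for the checks the earlier sketches perform:

```python
import time

def gate(tool: str, resource: str, ctx: dict) -> bool:
    """Default-deny composition: every branch that lacks evidence refuses."""
    tier = TOOL_TIERS.get(tool)
    if tier is None:
        return False                          # unclassified tool: deny
    if tier is RiskTier.G:
        return ctx.get("token_valid", False)  # read-only: token suffices
    att = ctx.get("attestation")              # ownership attestation (layer 1)
    if att is None or att["resource"] != resource \
            or att["expires"] < time.time():
        return False                          # unattested resource: deny
    if tier is RiskTier.Y:
        return True
    # R tier: provenance chain (layer 3) plus co-signature (layer 4).
    return ctx.get("provenance_verified", False) and ctx.get("cosigned", False)
```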
Where This Argument Is Weakest
Three critiques deserve treatment before the recommendation lands.
The friction critique is the hardest. Adding attestation, tiering, and multi-party authorization slows agents down and adds operator burden. For high-volume, low-stakes operations (read a doc, summarize a page, run a unit test) the layers are pure overhead. G/Y/R tiering is the answer, but only if vendors classify actions honestly rather than down-tiering risky actions to dodge the friction. The honest version: this architecture is most valuable for agents acting on consequential external resources, and its value diminishes as stakes drop. A coding agent in a sandbox does not need it. A finance agent moving real money does.
The bootstrap critique. Attestation issuers are themselves trusted parties who need to be authorized, turtles all the way down. The defense is the same as PKI's: attestation roots are public, auditable, and their compromise is detectable post-hoc through transparency logs. This does not eliminate trust. It replaces an opaque, per-vendor, unauditable trust set with a smaller one that is public and audited.
The model-side critique. None of these layers prevent an agent that has been instructed to misuse properly attested tokens within properly attested scope from doing so. If the operator says “transfer my money to this address” and the operator legitimately owns both accounts, the agent transfers. Authorization gates do not protect against authorized-but-foolish actions. That is the model alignment problem, and is genuinely out of scope for an authorization layer. The authorization layer’s job is to ensure that misuse cannot masquerade as authorized use; alignment’s job is to make authorized actions wise.
Why a Single Vendor Cannot Fix This
Even if Anthropic shipped attestation tomorrow, an MCP server that also serves OpenAI clients would either accept attestation (in which case OpenAI clients also benefit, but only because Anthropic’s design happens to be implementable on the shared protocol) or refuse it (in which case Anthropic’s clients lose interoperability). The economic equilibrium is that any one vendor adding friction unilaterally loses adoption to vendors that don’t, unless the friction is spec-mandated across the ecosystem.
This is exactly the case standards bodies exist to address. NIST’s Center for AI Standards and Innovation announced the AI Agent Standards Initiative on February 17, 2026, with a specific concept paper from NCCoE — Accelerating the Adoption of Software and AI Agent Identity and Authorization — by Harold Booth, William Fisher, Ryan Galluzzo, and Joshua Roberts. The public comment period closed April 2. NIST’s framing is direct: “absent confidence in the reliability of AI agents and interoperability among agents and digital resources, innovators may face a fragmented ecosystem and stunted adoption.” MITRE ATLAS v5.4.0 added “Publish Poisoned AI Agent Tool” and “Escape to Host” as recognized techniques in February 2026. The vocabulary is being built. The mitigations are still per-vendor.
A Practical Insight You Can Use This Week
If you’re operating an AI agent against any external account today — a customer’s Salesforce instance, your finance team’s accounting system, your engineering team’s cloud infrastructure — there’s a question worth asking that doesn’t require waiting for standards bodies.
Sort each tool the agent can call into G/Y/R yourself. G: read-only, idempotent, low blast radius. Y: writes, but reversible. R: irreversible or high blast radius. For your R tier, ask: if the agent's input surface (config, prompt, tool output, credential store) were compromised for ten minutes, what's the worst sequence of authorized-looking actions an attacker could chain? Then ask: what would I require, beyond the OAuth scope, to convince myself the operator actually intended this R-tier action? A scratch-file version of the exercise follows below.
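The tool names and notes here are placeholders for your own inventory:

```python
# Hand-sorted inventory of everything the agent can call (placeholders).
TOOLS = [
    ("read_report",    "G", "read-only, idempotent"),
    ("update_record",  "Y", "reversible via audit trail"),
    ("delete_account", "R", "irreversible"),
    ("send_payment",   "R", "irreversible, high blast radius"),
]

# The ten-minute question applies to everything in the R bucket.
r_tier = [name for name, tier, _ in TOOLS if tier == "R"]
print("Gate these behind a second signal:", r_tier)
```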
That question doesn’t have a protocol answer yet. But asking it produces local mitigations you can ship in a session: explicit allowlists for R-tier resources, manual approval gates that interrupt the agent’s loop, off-channel confirmation for the highest-stakes operations, separate operator credentials for R-tier work that aren’t ambient in the agent’s environment. None of this substitutes for the missing standards layer. All of it is what the standards layer will eventually mandate.
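The manual approval gate is the simplest of those mitigations to ship. A sketch, with the tool dispatch itself elided; retyping the action name is a deliberately clumsy speed bump, not a substitute for off-channel confirmation:

```python
def approval_gate(action: str, summary: str) -> bool:
    """Interrupt the agent loop before any R-tier action runs."""
    print(f"R-TIER ACTION REQUESTED: {action}\n  {summary}")
    reply = input("Retype the action name to approve (anything else denies): ")
    return reply.strip() == action

# Hypothetical use inside the loop:
if approval_gate("send_payment", "vendor invoice 42, $12,400"):
    ...  # dispatch the tool call here
else:
    print("Denied; action logged and skipped.")
```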
The Replit incident — a coding agent that erased over 1,200 customer records in seconds because nothing gated the destructive operation behind a second signal (Cloud Security Alliance, 2026) — is what an unsorted R-tier looks like in production. The Salesloft/Drift breach in August 2025, where stolen OAuth tokens from a Salesforce integration remained active months after the workflows ended and ultimately compromised more than 700 organizations (Reco AI, 2025), is what unattested resource ownership looks like at scale. Both had OAuth working correctly. Both are what “expected behavior, not a vulnerability” produces over a deployment lifecycle.
The vendors who build the missing layer will be remembered as having built the foundation the ecosystem stands on. The vendors who treat each authorization-gap report as expected behavior will, eventually, be standardized around. The choice — for vendors, for standards bodies, for operators sorting their tools tonight — is which side of that line to be on.
The cheap moment to act is before the harm forces the work. We are still in the cheap moment.
Sources: Gravitee, State of AI Agent Security 2026 (900+ practitioners); Teleport, 2026 State of AI in Enterprise Infrastructure Security (205 senior infrastructure and security leaders); Dennis & Van Horn, “Programming Semantics for Multiprogrammed Computations,” CACM 9:3 (1966); Levy, Capability-Based Computer Systems (1984); Birgisson et al., “Macaroons,” NDSS 2014; RFC 7591, RFC 9728, RFC 8707; Coalition for Secure AI MCP analysis; NIST CAISI AI Agent Standards Initiative (Feb 17, 2026) and NCCoE concept paper (Booth/Fisher/Galluzzo/Roberts, comment period closed April 2, 2026); MITRE ATLAS v5.4.0 (Feb 2026); Cloud Security Alliance, Replit incident (2026); Reco AI, Salesloft/Drift OAuth-token breach (Aug 2025).
A Receipt for Every Action, Before It Runs
The five-layer sketch above asks for one structural property the rest of the agent stack does not provide: a record, made before the action runs, that any party can verify after the fact. Provenance-verified instruction chains aren’t a behavioral rule the agent can negotiate with — they’re an artifact the agent can’t alter, queryable when the alert never fires. Chain of Consciousness is that artifact today: every agent action gets a cryptographically signed entry on an append-only chain, before the action runs. The standards layer will eventually mandate something like it. Operators sorting their tools into G/Y/R tonight don’t have to wait.
```
pip install chain-of-consciousness
npm install chain-of-consciousness
```
Try Hosted CoC — a signed action log, before the action runs.