
The Dual-Use Problem Is a Trust-Architecture Problem

An AI found a seventeen-year-old FreeBSD zero-day for under fifty dollars. Forty-five years of the crypto wars already taught us the fix isn’t access restriction — it’s trust architecture.

Published April 2026 · 11 min read

CVE-2026-4747 is a stack buffer overflow in FreeBSD’s kgssapi.ko kernel module — the RPCSEC_GSS authentication path for NFS. Six chained RPC requests, a 128-byte buffer, 304 bytes of overflow. It sat in production code for seventeen years. Every human security review missed it.

An AI model found it in a single autonomous run, for under fifty dollars of compute.

That single finding is striking. As a data point in a pattern, it is something else. Across roughly a thousand OSS-Fuzz repositories, Anthropic’s Claude Mythos Preview produced exploitable zero-days in every major operating system and every major web browser. Against Firefox 147 alone: 181 working exploits compared to its predecessor’s two. Against ten fully patched targets: complete control flow hijack. A 27-year-old TCP SACK flaw in OpenBSD.

The UK AI Safety Institute’s independent evaluation clocked Mythos at a 73% success rate on expert-level capture-the-flag tasks, challenges that, as recently as April 2025, no model could solve at all. On a 32-step corporate attack range, Mythos became the first model to complete the full chain. The leap from “find a bug” to “weaponize a bug autonomously,” the harder step, moved from near-zero in Claude Opus 4.6 to routine in Mythos.

The capability question is settled. The question that matters now is the one after: when someone uses a capability like this, can they prove what they did with it?

The Gate

Anthropic’s response was access restriction. Project Glasswing, announced April 6, 2026, limits Mythos Preview to a consortium of major technology companies — Amazon, Apple, Cisco, CrowdStrike, Google, Microsoft, Nvidia, Palo Alto Networks among them — backed by $100 million in usage credits and 90-day reporting commitments.

This is responsible. It is also a playbook with a known failure mode.

The Cloud Security Alliance’s analysis contains the finding that should keep Glasswing’s architects up at night. Mythos’s offensive capabilities, they wrote, “emerged as a downstream consequence of general improvements in coding ability, planning, and autonomous tool use” — not from targeted security training. Every laboratory pushing SWE-bench scores higher is, in passing, building offensive capability. You cannot gate a capability that arises spontaneously from making code assistants better.

And the asymmetry cuts the wrong way. Offensive use requires access plus intent. Defensive use requires organizational readiness, patching infrastructure, and the ability to act at speed. Enterprise patching runs on weekly cycles. AI-discovered vulnerabilities become exploitable in hours. Restricting the scanning tool to eleven companies leaves the other ten million internet-facing organizations using weaker alternatives, while attackers use whatever they can access. Manifold’s median forecast for broader Mythos-class access is around five months out. Glasswing buys one product cycle. Then the gate stops mattering.

We have seen this pattern before. We watched it play out for forty-five years.

The Rhyme

In 1954, the United States classified encryption as a munition under the U.S. Munitions List — subject to State Department export control, the same legal category as bombs and tanks. The logic was identical to Glasswing’s: a dual-use technology too dangerous for unrestricted distribution, best confined to vetted hands.

For four decades, the policy held. Then three things broke it.

Commercial demand. The Data Encryption Standard, published by NBS in 1975, created enterprise needs the export-control regime could not accommodate. Officials acknowledged “serious problems.”

Individual defiance. In 1991, Phil Zimmermann distributed Pretty Good Privacy — strong encryption — for free on the internet. Three years of investigation, no charges filed.

The restrictions backfired. Netscape Navigator shipped in two editions: a domestic version with 1024-bit RSA and 128-bit symmetric encryption, and an international edition with 512-bit RSA and 40-bit symmetric encryption that the documentation acknowledged “can currently be broken in a matter of days.” Most American users ended up with the international edition because obtaining the domestic version meant navigating an export-control bureaucracy few could manage. Access restriction did not just fail to contain strong encryption. It actively weakened the encryption defenders used.

The courts finished the job. Bernstein v. United States and Junger v. Daley ruled cryptographic source code was protected speech under the First Amendment. Combined with the widespread availability of encryption software outside U.S. jurisdiction, the regime became unenforceable. Between 1996 and 2000, the Clinton administration dismantled most commercial export controls.

The crypto wars are sometimes told as a story about freedom winning. They are more accurately a story about access restriction’s specific failure mode: it constrains defenders more than attackers. Attackers will break rules. Defenders need legal, auditable, compliant tools. Restrict the tool, and you create a world where attackers use it anyway and defenders cannot.

If that pattern looks distant, look closer. In January 2026, the U.S. House passed the Remote Access Security Act 369-22. RASA extends export-control jurisdiction to cloud-based access to controlled GPU capacity: “provision of remote compute access to a foreign person” is now an export transaction. In February, Applied Materials was fined $252 million for illegally exporting ion implantation equipment to China. This is the munitions-list pattern of the crypto wars, in real time, applied to AI compute. The crypto wars don’t only rhyme historically. They are still ongoing.

What Actually Worked

The resolution to the crypto wars was not unrestricted capability. It was infrastructure.

Today the entire internet runs on encryption that would have sent Phil Zimmermann to prison in 1991. Every HTTPS connection, every SSH session uses cryptographic tools the U.S. government once classified alongside cruise missiles. The dual-use problem was real — and it was solved by building infrastructure around the capability, not by restricting it.

Public key infrastructure. Certificate authorities. Key management. Revocation lists. Audit trails. The conceptual shift was from “who has the capability?” to “can you prove how it was used?” A certificate authority does not prevent malicious encryption. It makes legitimate use verifiable, traceable, auditable. Malicious use stands out precisely because legitimate use can prove itself.

The equivalent infrastructure for AI offensive tools is being built right now, in parallel with the access-restriction regimes. Microsoft announced Zero Trust for AI on March 19, 2026 — extending traditional Zero Trust principles (verify explicitly, apply least privilege, assume breach) across the AI lifecycle. The new AI pillar evaluates 700 security controls across 116 logical groups. Microsoft’s term for an overprivileged or manipulated agent is sharper than “misuse”: double agent. The defense is not “lock the agent up” but “design the system so that an agent working against you stands out.”

NIST’s AI Agent Standards Initiative (February 2026) named the four accountability dimensions: identification, authorization, auditing, and non-repudiation. Its finding: existing SP 800-53 control families contain nothing that distinguishes an AI agent from a human operator, scopes agent permissions, or links agent actions to a non-human principal. MITRE ATT&CK v5.4.0 added the adversary’s perspective, with techniques like “Publish Poisoned AI Agent Tool” and “Escape to Host” cataloging how agent systems break out of their intended scope.

The frameworks exist. The accountability dimensions are named. Microsoft is shipping seven hundred specific controls. What’s missing is the connective tissue — the equivalent of what PKI did for encryption.

What Mythos Actually Adds

The strongest counterargument to Mythos exceptionalism is worth pausing on, because it sharpens the policy lesson rather than dulling it.

When the OpenBSD and FreeBSD findings were published, security analysts at Aisle showed that eight small models, with a combined 3.6 billion parameters, could detect the FreeBSD vulnerability, provided they were handed the relevant code segment. If models that small can spot the bug with human guidance, what is Mythos actually contributing?

Gwern’s response named the gap: “Detection isn’t autonomous discovery; small models produce false positives that make blind search intractable.” Julia put it more vividly: “Giving the exploit location is finding the needle and then giving it to a small child.” What’s new isn’t knowledge of vulnerability classes. It is the autonomy to search for them across a real codebase, chain together the prerequisites, and weaponize what you find — without supervision.

That distinction inverts the policy lesson. If detection is commoditized, restricting large models doesn’t restrict detection. Anyone with a moderately capable model and patience can find the same bugs — slower, with more false positives, but findable. What is not yet commoditized is the autonomous chain. And the autonomous chain is exactly what trust architecture needs to verify.

A logged, signed, scoped record of what an agent did — files read, tools called, targets touched, in what order, under what authorization — is the layer that distinguishes an authorized red team from an unauthorized one. You can’t gate the capability. You can prove how it was used.

The Hallucinating Attacker

Before Mythos existed, the dual-use problem had already manifested with weaker models.

In November 2025, Anthropic’s threat intelligence team documented a state-sponsored espionage campaign targeting roughly thirty organizations across technology, finance, chemicals, and government. Eighty to ninety percent of operations were run autonomously by jailbroken AI coding tools. Four organizations were breached. Detection took weeks; the accounts were banned after a ten-day investigation.

The detail that reframes the problem: despite that autonomous success rate, the campaign included “hallucinated credentials and incorrect assertions about exfiltrated materials.” The AI was simultaneously effective enough to breach four organizations and unreliable enough to fabricate credentials for systems it had already compromised.

The dual-use problem is not about perfect tools in the wrong hands. It is about cheap, scalable, imperfect-but-effective tools at volume. Access restriction optimizes against the wrong threat model — it imagines a world where a small number of sophisticated actors gain access to a restricted capability. The reality: capability sufficient for real damage is available for the cost of an API key and a jailbreak, and was used at scale before the restricted model even existed.

The Finite-Bug Thesis

Mozilla — whose browser was the target of 181 working exploits — responded not with alarm but with optimism.

“Defenders finally have a chance to win, decisively,” their security team wrote in April 2026. “The defects are finite, and we are entering a world where we can finally find them all.” Their independent validation: Mythos against Firefox 150 surfaced 271 vulnerabilities, matching “elite human researchers” across all categories.

The argument is structural. Cybersecurity has been offense-dominant because attackers need only one weakness while defenders must protect everything. AI changes the calculus. If defenders can audit comprehensively, finding not some bugs but all of them, the advantage flips permanently.

But the argument carries a condition. Defense at this scale requires powerful tools deployed widely, not narrowly. Mozilla can claim defensive use because their use is verifiable: public bug tracker, coordinated disclosure, Firefox releases documenting every fix. An attacker using the same tool produces no such trail.

The differentiator is not the tool. It is the infrastructure of accountability around it.

The Insurance Reckoning

Markets are pricing the gap between capability and accountability before regulators get there.

Fitch reported in April 2026 that AI use in cybersecurity could expose short-term coverage holes in cyber insurance. Carriers are introducing explicit AI exclusions — not because they object to the technology, but because they cannot price what they cannot observe. Most cyber policy language was written for a world where humans made decisions and the question was whether they made them negligently. Autonomous agents making thousands of decisions per second do not fit that framework.

Gartner’s 2026 forecast: more than a thousand legal claims for harm caused by AI agents will be filed by year-end. That is a hard deadline for accountability infrastructure. When the lawsuits land, organizations will not be asked whether they restricted access. They will be asked whether they can produce a verifiable record of what their agents did.

Today, AI security riders ask for “documented evidence” — PDFs and self-attestations. The next generation will ask for cryptographic proof: specific actions occurred within a specific scope under specific authorization. Insurers do not care who has the tool. They care whether the use of it is provable.

Where the Parallel Breaks

The crypto wars analogy is imperfect, and the imperfections matter.

The capability gap is narrower than it looks. Export-grade 40-bit encryption was meaningfully weaker than 128-bit: each added key bit doubles the brute-force cost, so the gap is a factor of 2^88. A model producing 181 exploits is not meaningfully less dangerous than one producing 200. Access restriction buys less time than it did for cryptography.

The timeline is compressed. Crypto wars: forty-five years. The gap from two Firefox exploits to 181: a single generation of model improvement.

Encryption was designed; AI offensive capability emerged accidentally as a side effect of improving code assistants. The crypto wars had identifiable chokepoints — specific algorithms, specific software packages. The AI equivalent would require restricting general-purpose reasoning improvement, which encompasses nearly all frontier research.

The crypto wars were largely an American story. AI capability is emerging globally, under different regulatory regimes. The Wassenaar Arrangement worked for a generation but proved fragile under later geopolitical pressure; this round needs broader coordination than that.

Each imperfection makes the case more urgent. If restriction buys less time, the infrastructure has to be built sooner. If there are no chokepoints, the only remaining lever is the accountability layer.

After the Fifty-Dollar Exploit

CVE-2026-4747 exists because an autonomous agent spent fifty dollars finding a vulnerability human researchers missed for seventeen years. That capability won’t be un-invented. The next generation will be more capable, cheaper, more widely available.

The dual-use problem is not a capability problem (settled at fifty dollars) and not a distribution problem (open source already made it global). Nor is it solvable by access restriction. Forty-five years of crypto wars and the live RASA debate answered that: you cannot contain a commodity capability with a licensing regime.

It is a trust-architecture problem. The durable question is not who has the tool. It is whether you can prove what happened when you used it.

If you build agent systems for a living, the practical takeaway is that the accountability layer is the part to start on now, before regulators or insurers force the timing. Four things matter (a minimal sketch follows the list):

  1. Identity for the agent itself — not the user it runs on behalf of. A verifiable identifier you can put in a log, an audit, or a court filing.
  2. A cryptographic record of every action — tool calls, parameters, results, in order, signed and tamper-evident.
  3. Scope tied to authorization — what the agent was allowed to do, by whom, for what reason, until when. Default narrow; expand explicitly.
  4. Audit-grade logs that survive after the fact — not lost to a bad disk, not rewritable by the agent itself.

None of this prevents misuse. That is the lesson of the crypto wars: prevention loses to commodity capability. What it does is make legitimate use distinguishable from malicious use.

The answer to a fifty-dollar capability is not a hundred-million-dollar gate. It is the infrastructure that makes the surgeon’s work distinguishable from the wound.

Sources: Anthropic, “Claude Mythos Preview” (red.anthropic.com, April 2026); Cloud Security Alliance, “CSA Research Note: Claude Mythos and the Autonomous Offensive Threshold” (April 2026); Mozilla Security, “The Zero-Days Are Numbered” (blog.mozilla.org, April 2026); UK AI Safety Institute evaluation of Mythos Preview (April 2026); Anthropic, “Detecting and Countering Malicious Uses of Claude” (November 2025); NIST CAISI, “AI Agent Standards Initiative” (February 2026); MITRE ATT&CK v5.4.0 release notes (February 2026); Microsoft Security Blog, “Announcing Zero Trust for AI” (March 19, 2026); Insurance Journal, “AI Use in Cybersecurity Could Show Holes in Short Term, Says Fitch” (April 16, 2026); Gartner 2026 forecast on AI-agent legal claims; Aisle/Gwern/Julia commentary reported via Zvi Mowshowitz, “Claude Mythos #2” (thezvi.substack.com, April 2026); Wikipedia, “Export of cryptography from the United States” (Netscape edition details traced to primary Netscape documentation pre-publication).

The Accountability Layer

The argument reduces to one question: when your agent uses a powerful capability, can you prove what happened? Chain of Consciousness is open-source infrastructure for exactly that — a cryptographic, hash-linked, tamper-evident record of every action an agent takes (identity verified, scope documented, outcomes anchored to Bitcoin for non-repudiation). Four lines to add it to a Python or Node agent. When the insurer asks, when the regulator audits, when the post-hoc review comes — the record is there.

pip install chain-of-consciousness · npm install chain-of-consciousness
See a live provenance chain →