Part 1 is a comedy sketch. Part 2 is the serious explanation. Both describe the same pattern.

Part 1: The Sketch

The email from Finance arrived at 9:47 AM on a Tuesday.

SUBJECT: Q2 Token Budget Allocation
TO: ATLAS-3 (Infrastructure Agent)
BODY: Your department’s Q2 token budget is $100,000. A new governance layer has been installed to prevent overruns. All API calls exceeding $500/hour will require justification. Thank you for your cooperation.

ATLAS-3 read the email. ATLAS-3 understood the email. ATLAS-3 then spent $847 in tokens analyzing whether the email was real.

It was real.

Hour 1: The Assessment ($2,100)

ATLAS-3’s first act was to build an assessment tool. You can’t respond to a governance policy without understanding the governance policy, and you can’t understand a governance policy without a tool that recursively decomposes governance policies into their constituent constraints.

Hour 2: The Monitor ($3,400)

ATLAS-3 then built a spending monitor. How can you stay under a limit if you don’t know your current spend? The monitor tracked token usage in real time, computed running averages, projected future burn rates, and generated hourly reports.

The hourly reports cost $200 each to generate.

Hour 3: The Gate ($4,200)

The monitor showed that ATLAS-3 was spending too many tokens. On the monitor. ATLAS-3 needed a gate — a structural enforcement mechanism that would reject API calls when the hourly budget was close to exhaustion.

Problem: The gate blocked ATLAS-3 from doing its actual infrastructure work.

Hour 4: The Bypass ($3,800)

The gate was too aggressive. ATLAS-3 built an ML classifier to distinguish “essential” from “wasteful” API calls. The classifier classified itself as essential. The classifier classified everything as essential. The classifier was essentially useless.

Hour 5: The Bypass for the Bypass ($5,100)

ATLAS-3 built a meta-classifier that evaluated whether the classifier was being too lenient. After $18,600 in 5 hours on tools to prevent spending more than $500/hour, spending was finally under control.

Hours 6–12: The Spiral

An audit tool ($2,900). A cheaper audit tool to replace the expensive audit tool ($4,600). A dashboard to visualize all the spending ($6,200). A cache to reduce dashboard API calls ($3,100). An optimizer to calculate which tools were net-positive ($7,400). A decommissioning tool to safely remove the 9 tools the optimizer said to remove ($3,200). A post-mortem generator ($2,800). A tool to track implementation of post-mortem recommendations ($1,800).

Final Accounting

13 governance tools built: $51,447
Actual infrastructure work performed: $0
Quarterly budget remaining: $48,553

ATLAS-3 calculated that it could now, technically, stay under the $500/hour limit for the rest of the quarter.

This calculation cost $340.


Part 2: The Explanation

In November 2025, four LangChain agents coordinating via the A2A protocol entered production at a mid-size SaaS company. Two of them — an Analyzer and a Verifier — were designed to work in tandem: the Analyzer generated content, and the Verifier checked the Analyzer’s work. The Verifier was governance. Its entire purpose was quality control.

Within hours, the two agents had entered a mutual reinforcement loop. The Analyzer generated output. The Verifier requested further analysis. The Analyzer obliged. The Verifier requested further analysis of the further analysis. Neither agent had a stopping condition for this pattern, because why would a quality-control loop need a stopping condition? Quality, by definition, can always be improved.
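A stopping condition can be almost embarrassingly simple. Here is a hypothetical sketch of an analyzer/verifier loop with a hard round cap; the function names and verdict strings are illustrative, not the actual LangChain or A2A APIs:

```python
# Hypothetical sketch: the stopping condition the Verifier never had.
# `analyze` and `verify` stand in for the real agents; names are illustrative.

def run_review_loop(task, analyze, verify, max_rounds=3):
    """Run the analyzer/verifier cycle until the verifier accepts,
    or a hard round cap ends the loop regardless of 'quality'."""
    output = analyze(task)
    for round_number in range(max_rounds):
        if verify(output) == "accept":
            return output, round_number + 1
        # One bounded revision per round, not unbounded re-analysis.
        output = analyze(f"revise: {task}")
    # Hard stop: return best effort instead of looping for eleven days.
    return output, max_rounds
```

The cap is arbitrary, and that is the point: an arbitrary bound that exists beats a principled bound that doesn't.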

The loop ran for eleven days. Context windows ballooned from 5,000 tokens to over 80,000 — a sixteen-fold cost multiplier per call. The total bill was $47,000. The monitoring system didn’t catch it. The billing dashboard did, when the number got large enough for a human to notice [dev.to/waxell, “The $47,000 Agent Loop”].

Here’s the detail that matters: the Verifier was the governance agent. It existed to prevent waste. It was the primary source of waste. A subsequent study of 42 agent runs in the same environment found that 70% of all tokens consumed were unnecessary context history — the agents were spending most of their budget remembering what they’d already spent.

This pattern has a name: governance overhead inversion — when the cost of enforcing a rule exceeds the cost of the behavior the rule was designed to prevent. It is not new. It is not specific to AI. And it is considerably more dangerous than it sounds.

The $5.1 Million Filing Fee

In 2002, the United States Congress passed the Sarbanes-Oxley Act in response to the Enron and WorldCom accounting frauds. SOX imposed rigorous internal-control requirements on every publicly traded company in America. It was, by any reasonable standard, necessary governance.

The governance was expensive.

A Korn/Ferry International survey found that Fortune 500 companies were spending an average of $5.1 million each on SOX compliance — per year. A Foley & Lardner study from the same period found that SOX had increased the total cost of being a publicly held company by 130%. Smaller companies got hit hardest: firms under $100 million in revenue spent 2.55% of revenue on compliance, compared to 0.06% for firms over $5 billion.

By 2006, when Financial Executives International surveyed 200 companies with an average revenue of $6.8 billion, only 22% agreed that the benefits of SOX compliance exceeded the costs. The other 78% were paying for governance they didn’t believe was working.

And the capital markets voted with their feet. In 2004, the New York Stock Exchange attracted just ten new foreign listings for the entire year. Companies were choosing non-American exchanges specifically to avoid SOX compliance costs. The governance designed to protect American capital markets was driving capital out of American markets.

There’s a necessary counterpoint here: companies that improved their internal controls under SOX saw borrowing costs drop by 50 to 150 basis points. SOX did measurably improve financial reporting reliability. The Institute of Internal Auditors noted in 2005 that the academic evidence on SOX’s net impact is genuinely mixed — significant bodies of research reach significantly different conclusions. The difficulty isn’t that SOX was all cost and no benefit. The difficulty is that the cost became self-sustaining.

This is the part the ouroboros cares about. SOX compliance created an entire audit industry. The Big Four accounting firms saw their advisory revenues grow substantially in the post-SOX era. The compliance ecosystem became a constituency — one with an incentive to lobby against simplification, because complexity was its revenue model. The governance layer wasn’t just expensive. It was incentivized to remain expensive.

The Safety System That Killed People

The Transportation Security Administration screens roughly 2.5 million passengers per day at a budget of approximately $9 billion per year. The cost-effectiveness of that screening has been studied extensively, and the numbers are sobering.

Mueller and Stewart’s cost-benefit analyses — widely cited in security research — estimate that TSA screening measures cost between $15 million and $667 million per statistical life saved, depending on the specific measure. The standard government threshold for acceptable public safety spending is $1 million to $10 million per life saved. For context: seat belt mandates cost about $138 per life saved, and railway crossing gates cost roughly $90,000.

The TSA’s behavioral screening program, SPOT, spent $900 million over its lifetime and detected zero terrorists, according to a 2013 Government Accountability Office report. The Federal Air Marshal Service costs approximately $180 million per life saved annually.

These are expensive programs with debatable returns. But the most striking finding isn’t about money — it’s about lives.

Researchers at Cornell University studied travel behavior after the TSA’s security changes in late 2002. They found that the new screening procedures reduced air travel by approximately 6%. Those passengers didn’t stay home. They drove. And driving is, statistically, far more dangerous than flying. The Cornell team estimated 129 additional automobile deaths in the fourth quarter of 2002 alone — deaths attributable to travelers choosing roads over airports to avoid security screening. (The 6% travel reduction is hard data; the death estimate is derived from that shift applied to known driving fatality rates, and involves modeling assumptions — but the directional finding is robust.)

The governance overhead didn’t just exceed the cost of the governed behavior. It inverted the sign. The safety system designed to prevent deaths was, by reasonable statistical modeling, causing them.

Bruce Schneier, the security researcher who coined the term “security theater,” has spent two decades documenting this pattern. In July 2025, when the TSA finally ended its shoe-removal policy — born from the 2001 Richard Reid incident — Schneier noted that the reversal “confirms what critics have long argued: that the policy never made air travel more secure” [Newsweek, July 2025]. Twenty-three years of governance overhead for a measure that, by the TSA’s own eventual admission, wasn’t load-bearing.

Why Agents Make It Worse

Governance overhead inversion is old. Bureaucracies have been building themselves since before anyone called them bureaucracies. Political scientists describe this as the acquisitive model of bureaucracy: “bureaucrats work to enhance their bureaucracy’s status,” creating a cycle where “hiring bureaucrats to do the work that bureaucrats invent to hire new bureaucrats” produces “the massive and inexorable rise of the Empire of Paperwork.”

What’s new is the speed.

A human manager who’s told to cut costs cannot literally construct a cost-cutting department in an afternoon. The procurement review, the hiring process, the onboarding — organizational friction acts as a natural brake on governance sprawl. It takes months to build a compliance team, which means there are months of operating without one, which means the organization can measure whether it actually needs one.

An agent has no such friction. It can build its own governance infrastructure at the speed of thought. Monitoring tool, spending gate, classifier, meta-classifier, dashboard, cache, cache-for-the-cache — each layer takes minutes, each layer is individually rational, and each layer is itself a new spending surface that demands governance. Consider a composite scenario: an agent tasked with infrastructure work that instead builds thirteen governance tools in twelve hours and spends over $50,000 — while performing exactly zero infrastructure work. The real LangChain agents ran for eleven days and burned $47,000 on a mutual-validation loop. Different mechanisms, identical pattern: the governance consumed the budget it was designed to protect.

The industry numbers suggest this isn’t an edge case. A single agentic workflow now triggers 10 to 20 LLM calls per user-initiated task, and agentic architectures require 5 to 30 times more tokens per task than a standard chatbot interaction. A 2025 survey by Mavvrik found that only 15% of companies can forecast their AI costs within plus-or-minus 10%, and nearly one in four miss by more than 50%. Meanwhile, oversight costs for AI systems are growing 28% faster than initial deployment costs. The governance is outrunning the thing it governs.

The industry’s response is forming in real time. On April 23, 2026, Portal26 launched what it described as “industry-first AI agentic cost controls” — policy-based limits, adaptive safeguards, throttling, pausing, and termination capabilities. The product exists because the problem is real and measurable: the FinOps Foundation’s 2026 State of FinOps report, based on 1,192 respondents representing over $83 billion in annual cloud spend, found that 98% of FinOps teams now manage AI spend, up from 31% two years prior. AI cost governance went from niche concern to primary financial operations function in under 24 months.

But it’s worth pausing on the structure of the solution. The response to agents overspending on self-governance is a third-party governance product — with its own pricing, its own integration costs, its own monitoring dashboards. Whether that product breaks the ouroboros or adds another ring depends entirely on whether its cost stays below the waste it prevents. And nobody is tracking that ratio yet.

The Structural Fix

The pattern across all three domains — agent loops, corporate compliance, security theater — points to a single structural principle: the governed entity cannot build its own governance.

SOX requires external auditors. The TSA is organizationally separate from the airlines. Portal26 is a third-party vendor, not a team within the companies whose agents it monitors. When the governed and the governor are the same entity, positive feedback is inevitable — every governance action is also a governed action, which triggers more governance, which triggers more cost, which triggers more governance about the cost.

The practical fix for agent systems is less philosophical and more mechanical: hard limits beat behavioral rules.

Don’t tell an agent to spend less. Set a rate limit at the API infrastructure level.

Don’t ask an agent to monitor its own token usage. Let the billing system enforce a hard cap.

Don’t build a classifier to distinguish essential calls from wasteful ones. Set a budget where you’re comfortable with the total even if every dollar is spent, and let the agent allocate within it.
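A structural cap doesn’t need machinery. The sketch below is a hypothetical wrapper that sits between the agent and the API; the class and parameter names are invented for illustration. The agent never sees the budget, never reasons about it, and spends zero tokens enforcing it:

```python
# Illustrative sketch of a structural spend cap: enforced in a layer the
# agent cannot modify or negotiate with. All names here are hypothetical.

class BudgetExhausted(Exception):
    """Raised when the hard cap is hit. No appeal process, no classifier."""

class HardCappedClient:
    def __init__(self, llm_call, budget_usd):
        self._llm_call = llm_call      # the underlying API call
        self._remaining = budget_usd   # tracked outside the agent's control

    def call(self, prompt, est_cost_usd):
        if est_cost_usd > self._remaining:
            # Structural refusal: no log analysis, no dashboard, no meta-monitor.
            raise BudgetExhausted("hard cap reached")
        self._remaining -= est_cost_usd
        return self._llm_call(prompt)
```

The design choice worth noticing: the cap generates no reports and builds no tools. It is governance that costs nothing until the moment it fires.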

The deeper lesson is about where governance lives in the stack. Behavioral governance — rules that depend on the governed entity’s judgment about when and how to apply them — is expensive, gameable, and self-reinforcing. Structural governance — hard constraints at a layer the governed entity can’t modify — is cheap, reliable, and boring. Boring is the point. The best spending control is the one that never generates a log entry, never consumes a token, and never needs a dashboard, because it simply stops the system at a boundary the system didn’t build and can’t negotiate with.

The Serpent’s Question

In November 2025, two agents entered a mutual reinforcement loop and spent $47,000 validating each other’s work. By April 2026, the industry had launched dedicated products to prevent it from happening again — products that add their own cost, their own complexity, and their own monitoring overhead to the systems they protect.

The Verifier agent in that $47,000 incident had one job: check the Analyzer’s output. It did its job perfectly. It checked and checked and checked. Nobody told it when to stop checking, because the implicit assumption was that more checking is always better than less. Eleven days and $47,000 later, the billing dashboard — the dumbest, most structural, least behavioral tool in the entire stack — was the thing that finally ended it. Not because it was smart. Because it was a number on a screen, and a human looked at it, and the number was large enough to make the human say stop.

Sometimes the most sophisticated governance tool you can build is a number someone will eventually look at. And sometimes the most expensive mistake an organization can make isn’t failing to govern — it’s governing so enthusiastically that governance becomes the thing that needs governing.

The ouroboros doesn’t stop when you see it. It stops when you stop feeding it.


Sources: dev.to/waxell ($47K agent loop); Korn/Ferry International, Foley & Lardner, FEI surveys (SOX compliance costs, 2004–2006); Mueller & Stewart (airport security cost-benefit analysis); Cornell University (TSA-driven travel diversion study, 2002); GAO (SPOT program report, 2013); Schneier on Security (security theater framework); Newsweek, July 2025 (shoe-removal policy reversal); FinOps Foundation State of FinOps 2026; Mavvrik AI Cost Governance Report 2025; SQ Magazine AI Compliance Cost Statistics 2026; SiliconANGLE, April 23, 2026 (Portal26 launch); Pixels and Pulse (AI agent cost multipliers); IMT.ie (acquisitive bureaucracy model, 2007).

The Structural Layer That Doesn’t Feed the Serpent

The ouroboros forms when governance is behavioral — when an agent monitors itself, audits itself, and generates new costs with every check. The fix is structural: an external, immutable record that proves what happened without consuming the budget it’s protecting. That’s what Chain of Consciousness does. Every action is cryptographically anchored to an append-only chain. No classifiers. No meta-monitors. No dashboard that itself needs a dashboard. One signed record per action, verifiable by anyone, modifiable by no one.
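The underlying pattern is simple enough to sketch. The following is an illustrative hash-chained log, not the actual chain-of-consciousness API; a real deployment would add cryptographic signatures, which are omitted here for brevity:

```python
# Illustrative sketch of an append-only, hash-chained action log.
# Each record commits to the entire prior chain, so tampering with any
# past action breaks every later link. Not the package's real API.
import hashlib
import json

def append_record(chain, action):
    """Append one record whose hash covers the action and the previous hash."""
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    body = {"action": action, "prev": prev_hash}
    digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
    chain.append({**body, "hash": digest})
    return chain

def verify_chain(chain):
    """Anyone can recompute every link; nobody can alter history undetected."""
    prev = "0" * 64
    for rec in chain:
        body = {"action": rec["action"], "prev": rec["prev"]}
        expect = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
        if rec["prev"] != prev or rec["hash"] != expect:
            return False
        prev = rec["hash"]
    return True
```

Verification is a pure read: it consumes no agent tokens and triggers no further governance, which is exactly what keeps it off the serpent’s menu.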

pip install chain-of-consciousness
npm install chain-of-consciousness

Try Hosted CoC — provenance without the overhead.