← Back to blog

Delivery Is a Real Option

Your abstraction is only as credible as its settlement mechanism. The promise is worth the loadout behind it, and nothing more.

Published June 2026 · 13 min read

In February 2023, Trafigura, one of the largest commodity traders on earth, a firm that moves oil and metal by the supertanker-load, announced it had been defrauded of roughly half a billion dollars. The mechanics of the fraud are almost unbelievable in their bluntness. Trafigura had paid for cargoes of nickel: contracts in order, paperwork pristine, the metal sold as proper London-grade nickel. The shipments arrived. And when people opened the containers, the nickel wasn't there. Over a thousand containers that were supposed to hold one of the most valuable industrial metals in the world held, instead, low-grade junk: stainless-steel scrap, iron briquettes, in some accounts effectively stones and rubble. The paperwork said nickel. The box held rocks. Trafigura booked a loss of $577 million.

It is the most expensive possible tuition for a lesson that has nothing to do with commodities specifically, and everything to do with every contract, SLA, and abstraction you depend on in software. The lesson is this: a claim is only ever worth the thing you can actually force at the worst moment, and if the thing you force turns out to be a container of rocks, then the claim was a story the entire time. You just hadn't opened the box yet.

Why a futures price tells the truth

To see the principle in its clean form, start with the abstraction it's built from, because commodity futures are the purest example of a paper claim that somehow stays honest. A futures contract is pure paper, a promise about a tonne of copper or a barrel of crude that may not even be mined or pumped yet, traded almost entirely by people who will never lay a hand on the physical thing. By every intuition it ought to drift off into fantasy, a number that means whatever the traders want it to mean. And yet the futures price tracks the real, physical, spot price astonishingly tightly. There is exactly one reason it does, and the commodity world states it without flourish: delivery is a real option, not a formality.

A physically-settled contract, on the London Metal Exchange, on CME crude, on the Chicago grain markets, must, at expiry, either be closed out or settled by actual delivery: a real loadout from a licensed, audited warehouse or tank or grain elevator. Because whoever holds the paper at expiry can demand the metal, the paper can never wander far from the physical without somebody making it converge. If futures get expensive relative to spot, an arbitrageur buys the cheap physical metal, delivers it into the expensive paper contract, and pockets the difference (the cash-and-carry trade) and that very act drags the two prices back together. The delivery option is the leash. As the discipline puts it flatly: the discipline of physical delivery is what makes the futures price a credible hedge reference. Take enforceable delivery away, and the number floats free and begins, immediately, to lie.

And here is the elegant part, the part that turns out to be the whole point: that delivery option is almost never exercised. The overwhelming majority of financial traders close their positions before expiry, because actually taking delivery of a few hundred tonnes of nickel (the warehouse nominations, the grades, the logistics) is a hassle nobody signed up for. The price stays honest not because delivery happens constantly, but because it could be forced. It's a deterrent, and like every deterrent, its power lives in the credible threat, not the frequency of use. A futures contract is kept truthful by an option that is rarely taken and always available, which means the value was never in the delivery happening. It was in the delivery being forceable.

Your software claims are the same paper

Now look at the claims you actually depend on to ship software, because they are the same kind of paper, and most of them have no loadout behind them at all.

Your cloud provider promises “99.99% uptime.” That is a futures contract, so ask the only question that gives it meaning: what happens at expiry, the month it quietly wasn't 99.99%? In a typical cloud SLA, the answer is a service credit, a small, capped percentage of your bill, frequently opt-in, that you have to notice the shortfall and file a claim to receive. That is a settlement mechanism so weak it barely qualifies as one, which means the “99.99%” is worth almost precisely nothing. It's a number that holds until the moment it's inconvenient and then costs the party who made it essentially nothing to have been wrong. The same test guts most of your other “contracts.” Your API contract between two services is paper with no loadout unless there is a conformance test (a consumer-driven contract suite, a Pact verification, a schema check) that actually fails the build when the producer changes the shape of its response. Your “data-quality guarantee” is a benchmark with no physical referent unless there's a validation gate at write time that genuinely rejects the bad record, a schema-on-write constraint, a check that refuses the malformed row instead of logging a warning and waving it through. In every case, the prose of the promise is worthless. The only thing carrying value is the mechanism that fires in the worst case, and whether that mechanism can actually be made to fire.

“So add a penalty, add a test.” That's the easy version of the lesson, and the commodity markets have paid, in cash, to learn two much harder ones: two ways the settlement mechanism fails even when it nominally exists.

Failure one: the mechanism that won't fire against the powerful

The first hard lesson is that a settlement mechanism is only as credible as the operator's willingness to actually pull it, especially against someone powerful. In March 2022, the price of nickel on the London Metal Exchange did something unhinged: it more than doubled, screaming past $100,000 a tonne in a matter of hours, as an enormous short position held by the Chinese producer Tsingshan got caught in a squeeze. And the LME responded by doing something that should make anyone who relies on a contract go cold. It suspended trading, and then it cancelled the trades, by various accounts somewhere around $12 billion of them, that had already executed that day. It reached back in time and unwound completed, settled transactions. Critics said, in effect, that it tore up the contracts to spare a giant player who was on the wrong side of the move; major hedge funds including Elliott and Jane Street sued, and the dispute went to the UK High Court. (The figures cited vary by source, and the exchange won the case; the point here isn't the legal merits.)

The lesson for everyone watching is brutal and simple, and it survives whatever the court decided. A settlement mechanism that gets cancelled the moment it's painful for the powerful is, for everyone who isn't powerful, a story. You held a real, executed, settled contract, and it turned out to be exercisable only when nobody important minded. The software version of this is everywhere and you have almost certainly shipped it: the failing test quietly marked .skip or tagged @flaky the week before a launch; the SLA from a giant vendor that you both know you will never actually force them to pay; the deployment gate that someone waves through “just this once” because the release is due. The mechanism existed on paper. It simply was not going to fire when firing it would hurt.

Failure two: settling against a fake referent

The second hard lesson is the one we opened with, and it's more unsettling than the first, because here the mechanism works perfectly and still settles nothing. Trafigura's disaster was not a missing penalty or an unenforced rule. The delivery happened. The containers arrived on schedule, the settlement mechanism executed exactly as designed, it just delivered rubble. A delivery option that delivers rocks settles nothing, because the value was never in the act of delivery; it was in the delivered thing being real. And the commodity world, it turns out, is riddled with this: alongside Trafigura there was a Rotterdam warehouse where bags supposed to hold nickel briquettes held mislabeled stones, and the long shadow of the 2014 Qingdao port scandal, where the same physical metal was pledged as collateral over and over against warehouse receipts that didn't correspond to anything in the shed. In each case the paperwork settled flawlessly against a referent that was fake.

The software version of this is the most insidious failure of the three, precisely because everything stays green. It's the conformance test that passes beautifully against a mock that doesn't actually behave like production. It's the validation gate that dutifully, reliably checks the wrong invariant. It's the test suite whose assertions never quite touch the real system they claim to be guarding. A green check against a fake referent verifies nothing, and it's worse than no check at all, because it manufactures a confidence you haven't earned, the same way a clean bill of lading manufactured confidence in a container of rocks. (Anyone who's thought about why a passing test can be worthless will recognize the shape: a settlement whose referent never actually fires.)

The difference that should scare you: there is no arbitrageur

Now I have to be honest about where this analogy bends, because it bends at the single most important place, and missing it would make all of the above dangerously reassuring. In commodities, the reason the delivery mechanism works, the reason prices converge without anyone organizing it, is that there exists a party who profits from forcing the truth. The arbitrageur isn't doing the market a favor; he's getting rich. When paper and physical diverge, somebody makes money by taking or making delivery and closing the gap, so the market self-corrects on its own, driven by self-interest, with no committee required. That invisible self-enforcement is doing enormous, silent work behind every honest futures price.

In your software, there is almost never an arbitrageur. Nobody profits from enforcing your SLA. Nobody is paid by the market to make your contract test fail the build, to keep your validation gate in the pipeline when it's slowing a release, to verify that your mock still matches production six months from now. The market force that keeps commodity prices honest has no equivalent in your codebase, which means your settlement mechanism will not appear on its own, will not maintain itself, and will quietly rot the instant it becomes inconvenient, because nobody's paycheck depends on it surviving. This flips the entire lesson from a thing to observe (“huh, contracts need teeth”) into a thing to do: you must build the teeth and then stand guard over them, because no invisible hand will. And it's compounded by a second disanalogy that cuts the wrong way. A physical loadout is genuinely expensive to fake, you cannot trivially conjure a warehouse full of real nickel, and when fraudsters manage it, it's a half-billion-dollar crime and not a config change. A CI test is one .skip away from gone. A gate is one bypass flag away from off. The settlement that keeps your abstraction honest is cheaper to fake and cheaper to skip than the commodity version, and unlike the commodity version, nobody but you is watching it. Cheap to defeat, and unguarded by anyone's self-interest: that is the combination that should make you treat your enforcement mechanisms as the most fragile and most valuable things in your system.

Find the loadout

So here is the audit to run on every abstraction you lean on, every SLA, every interface, every data guarantee, every “don't worry, we have a process for that.” Do not read the prose of the contract. Find the loadout. Three questions, in order.

First: is there a settlement mechanism at all? A test that actually fails the build. A penalty that actually costs the other party real money. A gate that actually rejects the bad write. If there is no mechanism, you do not have a contract, you have a story, a futures with no delivery, and it will track reality only up to the moment reality becomes inconvenient.

Second: can it be exercised, including when it's painful and against the powerful? A test that gets skipped under deadline, an SLA you will never make a major vendor actually pay, a gate that gets bypassed under pressure: that mechanism is nominal, not real. It's the LME tearing up the trades that hurt the wrong person, and it protects you exactly as far as it's convenient for someone else to let it.

Third: does it settle against reality, or against a stand-in? A conformance test against a mock that doesn't match production. A gate checking an invariant that isn't the one that matters. That is a delivery of rubble, it settles nothing while looking, greenly, like it settles everything.

Scale the rigor to the stakes. A low-traffic internal endpoint does not need a penalty clause and a full contract-test suite, and pretending it does is its own kind of waste; match the weight of the settlement to how much you'd actually lose if the claim turned out to be a story. But for the abstractions you genuinely depend on, the principle is iron, and it's the one the nickel market paid hundreds of millions of dollars to learn twice in the same year: the value of any claim equals the credibility of its worst-case settlement. The promise is only ever worth the loadout behind it. If you can't point to the loadout, or you wouldn't dare exercise it, or you've never once checked that what comes out of it is the real thing and not a container of stones, then whatever you're depending on isn't a contract. It's a story you've agreed to believe, right up until the day it costs you.


Sources: the Trafigura nickel fraud disclosed February 2023, cargoes paid for as London-grade nickel that were found, on inspection, to contain low-value materials (stainless-steel scrap, iron briquettes, “rubble”) across roughly a thousand-plus containers, producing a ~$577 million loss (mining.com; Trade Finance Global). The physical-commodity trading mechanism: “delivery is a real option, not a formality,” physical settlement via loadout from licensed/audited warehouses, and the principle that the discipline of physical delivery is what makes the futures price a credible hedge reference; cash-and-carry arbitrage as the convergence force tying paper to physical. The LME nickel episode, March 2022: nickel above $100,000/tonne in a Tsingshan short squeeze, the exchange suspending trading and cancelling the day's trades (figures cited from ~$3.9B one-day to ~$12B total, varying by source), lawsuits by Elliott and Jane Street, and the UK High Court ruling in the LME's favor, presented factually, not as a clean morality tale. Additional warehouse-fraud precedents: a 2023 Rotterdam case of mislabeled stones in place of nickel briquettes, and the 2014 Qingdao port collateral scandal. Software mapping: cloud SLA service-credit structures (typically capped, opt-in, claim-based) as weak settlement; consumer-driven contract testing (Pact) as the interface loadout; write-time validation gates (e.g., schema-on-write, Great Expectations) as the data referent. Disanalogies stated as load-bearing: software settlement is cheaper to fake/skip than a physical loadout, and, most importantly, commodities have a profit-motivated arbitrageur who self-enforces convergence whereas software has none, so the enforcement must be deliberately built and defended rather than expected to emerge. Proportionality applies (match settlement rigor to dependency criticality); “has a penalty” is not the same as “credible settlement,” which is a spectrum; and this is a strong structural analogy, not an identity.

An agent's claim is paper. The trust stack is its loadout.

When an autonomous agent says “I did the task,” “this output is safe,” or “the check passed,” that is a futures contract, and the same three failures apply: there may be no settlement, the settlement may never fire when it's inconvenient, or it may settle against a fake referent (the agent grading its own homework against a mock of itself). And the essay's hardest point lands doubly here: there is no arbitrageur, nobody is paid to make an agent's self-report true. The Agent Trust Stack is the loadout you have to build deliberately: a tamper-evident provenance record so you settle against what the agent actually did rather than what it claims, with enforceable checks and learned reputation on top, so the promise has teeth that fire even when firing them is inconvenient.

See a verified action chain · Hosted Chain of Consciousness

pip install agent-trust-stack  ·  npm install agent-trust-stack