Price Your Rate-Limiter Like a Bid-Ask Spread

Adverse selection for abuse defense. Your rate limiter is a market maker — stop letting it quote a fixed spread.

Published June 2026 · 9 min read

A specialist on the floor of the New York Stock Exchange in 1985 and an infrastructure engineer at Cloudflare in 2024 have the same job, and most days neither of them would say so. Both stand between a resource everyone wants and a crowd they cannot see into. Both must answer, thousands of times a second, a single uncomfortable question: how much should I charge someone who might be trying to pick me off? The specialist answers in fractions of a dollar. The engineer answers in milliseconds of latency and CAPTCHA puzzles. But the math underneath their answers is the same math, and once you see it, the way almost everyone builds rate limiters starts to look quietly broken.

Here is the broken thing, stated plainly. A flat rate limit — "100 requests per minute per API key" — is a fixed price charged to every caller regardless of how dangerous that caller is. No market maker who survived a single quarter would ever quote a fixed spread that way. And the reason they wouldn't is a Nobel-adjacent result from 1985 that turns out to be a blueprint for abuse defense.

The market maker's dilemma

In 1985, Lawrence Glosten and Paul Milgrom published "Bid, ask and transaction prices in a specialist market with heterogeneously informed traders" in the Journal of Financial Economics. The setup is a market maker who posts two prices: a bid (what they'll pay to buy) and an ask (what they'll charge to sell). The gap between them is the spread, and the naive assumption is that the spread is the market maker's fee — a markup to cover rent and salary.

Glosten and Milgrom proved something far more interesting: the spread exists even if the market maker has zero costs and is perfectly risk-neutral. It arises entirely from adverse selection — the danger that whoever is trading against you knows something you don't. Imagine the crowd is a mix of two types. Some traders are informed: they've done the work, they know the stock is about to move, and they trade to capture that edge. Most are uninformed: they're rebalancing a pension, raising cash, buying because it's payday — their trades carry no secret.

The market maker can't tell them apart in the moment. But there's a tell in the direction of the trade. If someone steps up to buy at your ask, they are, on average, a little more likely to be informed — because informed traders buy precisely when they know the price is heading up. If someone sells to you at your bid, same logic in reverse. So the market maker reasons conditionally: the ask should equal the expected value of the asset given that a buyer just arrived, and the bid the expected value given that a seller arrived. The distance between those two conditional expectations — the spread — is exactly the expected loss from trading with the informed, averaged over the crowd.

And now the spread breathes. It widens when the proportion of informed traders rises, when their information edge grows, when volatility climbs (more to be secretly right about). It narrows when the flow is mostly liquidity traders and the asset is boring and well-understood. The spread is not a fee. It is a price on the risk of being exploited, recomputed continuously from who seems to be showing up. That single idea — price the risk, don't flat-rate it — is the whole essay. The rest is watching two more fields rediscover it the hard way.

The guardian's dilemma

Criminologists got there from the opposite direction, thinking not about price but about prevention.

In 1979, Lawrence Cohen and Marcus Felson published routine activity theory, which says a crime needs three things to converge in the same place at the same time: a motivated offender, a suitable target, and the absence of a capable guardian. Knock out any one leg of that triangle and the crime simply doesn't happen. It reframed crime away from "bad people exist" toward "opportunity has a structure" — and structure is something a defender can change.

Ronald Clarke built the practical toolkit on top, called situational crime prevention, with five levers: increase the effort required, increase the risk of getting caught, reduce the rewards, reduce the provocations, and remove the excuses. The crucial word — the one that connects Clarke back to Glosten and Milgrom — is selectively. You do not apply maximum effort everywhere. You raise the cost of offending exactly where a motivated offender is likely to meet a suitable target. A heavy lock on a bike rack on a busy commercial street is good security. The same lock on a bike in a bank vault is theater; you've spent the defense budget where the risk wasn't. The defense is priced to the risk. Spend it flat and you simultaneously over-protect the safe places and under-protect the dangerous ones.

Stack the two fields against each other and they're the same picture from two chairs:

Market microstructure	Criminology	API / LLM defense
Market maker	Capable guardian	The endpoint
Informed trader	Motivated offender	Adversarial caller
Uninformed trader	Ordinary passerby	Legitimate user
Bid-ask spread	Effort / risk imposed	Latency / challenge / cost
Wider spread when adverse selection is high	More effort where crime is likely	More friction where abuse is likely
Zero-profit, not zero-trade	Prevention, not elimination	False-positive budget, not a wall

The market maker prices the risk of the informed; the guardian raises effort where offenders cluster; both move their defense to where the danger is and leave it cheap everywhere else. The engineer building a rate limiter is doing the identical job — and usually flat-rating it anyway.

The engineer's dilemma

Picture the standard rule: 100 requests per minute per key, the same for everyone. In market-maker terms, that's quoting an identical spread to a pension fund and a hedge fund with insider information. In Clarke's terms, it's the vault lock on every bike in the city. It is mispriced in both directions at once, and the failure modes are concrete.

A legitimate analytics customer pulling 90 requests a minute bumps the ceiling and gets throttled — you've imposed real friction on the benign. Meanwhile an adversary runs 95 requests a minute across ten rotating keys — 950 requests a minute of scraping or credential-stuffing or prompt-probing — and never trips a single per-key limit. The flat rule punished the wrong party and waved the real threat straight through. A fixed spread on a $5 stock and a $5,000 stock is obviously absurd to anyone in finance; the 100-req/min flat limit is the same absurdity, and our industry ships it by default.

The closest thing to a correct implementation in the wild is Cloudflare's Bot Management, and it is Glosten-Milgrom rebuilt for HTTP whether or not its authors would put it that way. Cloudflare computes a Bot Score from 1 to 99 for each request, fusing signals the caller can't easily fake: the TLS fingerprint, the HTTP/2 fingerprint, the results of JavaScript challenges, and behavioral patterns over the session. That score is the conditional estimate — the endpoint's running guess at the probability that this particular caller is "informed," in the adversarial sense. Then the response is graded to the score. A score of 30 or above — treated as human by default — sails through on silent checks: a tight spread, no human friction. A score in the likely-automated band of 2 to 29 draws a Managed Challenge: a moderate spread a real person on a VPN can still clear while a headless browser stalls. A score of 1 — automated with near-certainty — can be blocked outright at almost no false-positive cost: the widest spread there is. (Thirty is Cloudflare's usual recommended cutoff, and it's a dial, not a law — tune it up toward 40–50 for a login endpoint, down for anonymous search traffic.) The "ask price" of making a request rises monotonically with the estimated chance the caller is hostile. That is the 1985 paper, expressed in milliseconds and compute.

It even learns the way a specialist learns. Recent research applies deep reinforcement learning to rate limiting — a DQN or A3C agent choosing among allow, throttle, and block, with a reward that folds in false positives, detection accuracy, server load, and confirmed-threat logs. That is precisely the Glosten-Milgrom dynamic in which the market maker updates its belief about the informed proportion after every trade and re-quotes accordingly. The spread is not set once; it is inferred, continuously, from the flow.

The five levers, and what they cost

The richest part of the analogy is that Clarke's five techniques map onto five distinct dimensions of the "spread" an endpoint can widen — and you can turn each knob independently, in proportion to suspicion:

Increase the effort → proof-of-work. Make a suspicious caller burn CPU before you'll answer. Trivial for a human doing one thing; ruinous at scraping scale.
Increase the risk → logging and attribution. Fingerprint, correlate across keys and sessions, and make a high-suspicion caller far more likely to be identified and banned.
Reduce the rewards → output controls. Watermark responses, coarsen or withhold the high-value fields, cap the context window you'll expose to an untrusted caller so a successful probe yields less.
Reduce the provocations → input filtering. Strip or normalize the patterns that invite abuse before they reach the model.
Remove the excuses → enforceable terms. Make the rules legible and the violation unambiguous, so escalation and bans are clean.

For an LLM endpoint specifically, the most valuable lever is selective scrutiny, and it's the one flat limits make impossible. Prompt-injection and jailbreak detection is expensive — you can't afford to run heavyweight analysis on every input. But you don't have to. Run it on the high-suspicion inputs, the way a market maker reserves its widest spreads for the trades that smell informed. The adversary's "information advantage" here is their knowledge of your injection surface; the "motivated offender" is the jailbreaker or exfiltrator; the "suitable target" is the endpoint with a flat limit and no per-caller estimate. Scrutiny priced to suspicion lets you afford real defense exactly where it's needed and nowhere else.

Three reframes make this land, and each is slightly counterintuitive. First, the spread is a price, not a punishment. Glosten and Milgrom's market maker bears no grudge against informed traders — the spread is just the honest cost of serving an unknown crowd. An adaptive limiter isn't punishing suspicious traffic; it's pricing the risk of serving it. That shift from punishment to pricing matters, because punishment frames every caller as an enemy, while pricing frames the endpoint as a market that quotes fairly to everyone on the terms their behavior implies. Second, your benign users are currently subsidizing your attackers. In the market, uninformed traders pay a spread partly to cover the maker's losses to the informed; in your API, every legitimate user who eats a CAPTCHA or a delay is paying a security tax levied by the adversaries. The entire point of an adaptive spread is to shrink that subsidy — keep the friction off the 95% who are fine and concentrate it on the 5% who aren't. Third, the goal is never elimination. Informed traders don't vanish under a wide spread; they trade less, because their edge no longer clears the cost. Adversaries facing adaptive friction don't disappear; they pay more per probe, which shrinks the scale of what's worth attempting. You are not building a wall. You are making abuse cost more than its expected payoff — which is the only victory condition that has ever been achievable anyway.

Price it like one

So here's the practical reframe to carry into your next design review. Your rate limiter is a market maker. Stop letting it quote a fixed spread.

Concretely: compute a per-caller suspicion estimate from signals that are expensive to forge — fingerprints, behavioral history, the shape of the request stream across keys rather than within a single key. Then make every cost you can impose — latency, proof-of-work, depth of scrutiny, price per call, exposed context — rise smoothly with that estimate, instead of stepping off a cliff at a flat threshold. Hold the benign majority on the tightest spread you can, because the friction they feel is a tax you are choosing to levy on your best users. Reserve your most expensive defenses — full injection analysis, hard challenges, human review — for the tail where the estimate says the danger actually is. And measure yourself the way a market maker does: not "did we block all the bots" (you won't, and chasing it bankrupts you on false positives), but "did serving this flow cost the adversary more than it was worth, while costing our real users almost nothing?"

A floor specialist worked this out forty years ago because getting it wrong meant getting picked off, trade after trade, until the desk was empty. Criminologists worked it out because spreading prevention evenly meant protecting the vault and losing the street. The endpoint you ship faces the same crowd and the same arithmetic. The flat rate limit was never the safe, neutral default it pretends to be — it's a fixed spread in a market that has always demanded a floating one. Quote accordingly.

A floating spread is only as good as the suspicion estimate underneath it.

The whole scheme depends on judging a caller from signals that are expensive to forge — and a TLS fingerprint is a guess an adversary can spoof. In the agent economy the stronger signal is cryptographic: verifiable identity (is this caller who it claims to be?) and portable reputation (how has it actually behaved, attested across endpoints?), with signed provenance for what it did. That's the Agent Trust Stack — the hard-to-forge inputs that let your spread price the real risk instead of a forgeable hunch.

pip install agent-trust-stack · npm install agent-trust-stack
vibeagentmaking.com → · See the stack in action

← Back to all posts