
Your CI/CD Pipeline Has a Bullwhip Effect

The bullwhip is not a metaphor. It is a control-theory instability with four named causes and a queueing-math amplifier, and it predicts which of your safety habits are quietly generating the chaos they were meant to prevent.

Published June 2026 · 12 min read

In 1989, an MIT professor named John Sterman sat groups of his smartest students down to play a board game about beer. It's called the Beer Distribution Game, and it is rigged in a very particular way. Four players form a supply chain (retailer, wholesaler, distributor, factory), and each can see only the orders coming from the player directly downstream. Customer demand is almost insultingly boring: it sits flat at, say, four cases a week, ticks up once to eight, and then stays flat forever. One small, permanent bump.

By the end of the game the factory is convulsing. It has over-produced into a glut, then slammed the brakes into a backlog, then over-produced again: wild oscillations of fifty, even a hundred cases, lurching back and forth for the rest of the simulation, all triggered by a single four-case step that happened weeks earlier and never moved again. The students are not stupid; they are MIT students. Every one of them made a locally rational decision. The system thrashed anyway.

This is the bullwhip effect, and your CI/CD pipeline is playing the beer game right now. The steady "demand" is your team's commits, the four players are your build, test, integration, and deploy stages, and the convulsions are the things you blame on tooling: the red/green flapping, the queue that explodes for no reason on a Tuesday, the release that went sideways. The good news is that the beer game has a mechanism, not just a vibe, and once you can see the mechanism, you can name exactly which of your pipeline rituals are making the swerve worse.

The analogy is the easy part

Let's get the obvious out of the way, because it's been said. The bullwhip effect (small fluctuations at the consumer end of a supply chain amplifying into ever-larger swings upstream) was formally dissected by Hau Lee, V. Padmanabhan, and Seungjin Whang in a 1997 Management Science paper, and people have absolutely noticed that a software delivery pipeline looks like a supply chain. The New Stack has run the comparison; "the AI bullwhip" is a 2026 meme. "CI/CD is like a supply chain, so use small batches" is not a fresh insight, and it's not what's worth your time.

What's worth your time is that the bullwhip is not a metaphor. It is a control-theory instability with four named causes and a queueing-math amplifier, and when you map the actual machinery onto your pipeline, it stops being a cute parallel and starts being a diagnosis, one that predicts which of your safety habits are quietly generating the chaos they were meant to prevent.

The four causes, bolted onto your pipeline

Lee and his colleagues didn't say "demand gets amplified." They identified four specific causes, and each one has an exact twin in your delivery pipeline.

One: demand-signal processing, reacting to the signal, not the state. In a supply chain, each stage forecasts from the orders it receives, not from real end-customer demand, so it over-reacts to noise in the order stream. In your pipeline, teams react to the derived signal (the build status, the queue length, the flaky red test) rather than to whether the code is actually broken. The mass-revert, the panicked merge-freeze, the retry storm: these are over-reactions to a received signal that may have nothing to do with the true state. You saw a red light; you don't yet know if the bridge is out.

Two: order batching, your big PRs and monthly releases. Lumpy orders create erratic upstream streams. Big pull requests, infrequent merges, and the monthly release are the literal bullwhip cause, transcribed into git. This is not a new observation dressed up: continuous integration was invented, twenty years before anyone said "bullwhip" in this context, precisely to kill it. "Integrate small and often" is an anti-bullwhip control law that predates the diagnosis.

Three: rationing and shortage gaming, CI-capacity hoarding. When a scarce resource is allocated by how much you order, rational players over-order to secure their share, manufacturing phantom demand. When CI runners or shared environments are scarce, teams over-request, hoard, and fire redundant early pipeline runs to jump the queue. That is phantom load: it makes contention worse, which provokes more gaming. That's Lee's shortage-gaming cause, running verbatim inside your build farm.

Four: price fluctuation and forward-buying, the deploy freeze. Promotions make buyers hoard, distorting demand into spikes. The pre-launch or holiday deploy freeze does exactly this: it dams up a pent-up batch of changes that then ships big-bang the instant the freeze lifts. That is a forward-buy spike you built with your own hands. The freeze is not a brake on the bullwhip. It is a bullwhip generator.
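To put a number on the lumpiness that causes two and four describe, here is a minimal sketch. It assumes a perfectly flat commit rate and compares what the downstream stage actually receives under three merge policies. The function names, the 20-day window, and the ten-commits-a-day rate are all made up for illustration; nothing here comes from a real pipeline.

```python
from statistics import mean, pstdev

def coefficient_of_variation(series):
    """Standard deviation divided by mean: a unit-free measure of lumpiness."""
    return pstdev(series) / mean(series)

def batched(daily_commits, batch_days):
    """Hold commits back and release them in one lump every batch_days days."""
    released, pending = [], 0
    for day, commits in enumerate(daily_commits, start=1):
        pending += commits
        if day % batch_days == 0:
            released.append(pending)   # the whole pile lands at once
            pending = 0
        else:
            released.append(0)         # downstream sees nothing today
    return released

DAYS = 20
steady = [10] * DAYS                      # hypothetical flat commit rate
weekly = batched(steady, batch_days=5)    # merged once per working week
frozen = batched(steady, batch_days=20)   # a freeze, then one big flush

cv_steady = coefficient_of_variation(steady)
cv_weekly = coefficient_of_variation(weekly)
cv_frozen = coefficient_of_variation(frozen)

for name, cv in [("steady", cv_steady), ("weekly", cv_weekly), ("frozen", cv_frozen)]:
    print(f"{name:>6}: coefficient of variation = {cv:.2f}")
```

The total amount of work is identical in all three cases. Only the shape of the arrival stream changes, and in this toy setup the freeze-and-flush stream is more than twice as lumpy as the weekly one.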

Why it's physics: gain times delay

Here is the part the listicle versions skip, and it's the part that earns the rigor. The bullwhip is a delayed feedback loop going unstable, and instability is a precise, provable condition, not a feeling.

Jay Forrester's Industrial Dynamics established it and later analyses made it exact: in a corrective feedback loop, when the delay between action and observed result passes a critical threshold, the system's response stops settling and starts oscillating, persistently. And the magnitude that controls this is the product of the feedback gain (how hard you correct) and the lead time (how long until you see the result). A 2017 analysis in IISE Transactions put the uncomfortable conclusion plainly: the interaction of gain and lead time "can generate sustained oscillations even with constant demand," and "stronger corrective reactions combined with significant lead times amplify rather than mitigate."

Translate that into your pipeline and it should raise the hair on your neck. Combine a high gain (you over-react to a red build with a mass revert and a freeze) with a long delay (your CI takes forty minutes to tell you anything), and you get red/green thrash and deploy-rollback cycles even when your team's commit rate is perfectly steady. The oscillation isn't coming from chaotic developers. It's coming from the loop. And the corollary is the actionable bit: the mass-revert is not the cure for the thrash, it's the fuel. You cannot fix a gain-times-delay oscillation only by going faster. You also have to lower the gain: stop yanking the wheel.
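The gain-times-delay claim is easy to poke at with a toy loop. The sketch below is not the model from Forrester or the IISE paper. It is the simplest delayed-correction recurrence I could write down, with hypothetical names throughout: one disturbance goes in at the start, and each step the controller subtracts `gain` times the deviation it saw `delay` steps earlier.

```python
def simulate(gain, delay, steps=200):
    """Deviation of a backlog from its target under delayed correction.

    Each step the controller subtracts gain * (the deviation it observed
    `delay` steps ago). One unit of disturbance is injected at t = 0 and
    nothing else ever changes.
    """
    history = [1.0]                      # the single bump
    for t in range(steps):
        # Until the first observation arrives, the controller sees nothing.
        observed = history[t - delay] if t >= delay else 0.0
        history.append(history[t] - gain * observed)
    return history

def late_amplitude(history, window=20):
    """Largest deviation still present at the end of the run."""
    return max(abs(x) for x in history[-window:])

fast_feedback = late_amplitude(simulate(gain=0.5, delay=0))  # hard correction, no lag
gentle_hand = late_amplitude(simulate(gain=0.2, delay=4))    # lag, but a light touch
yanked_wheel = late_amplitude(simulate(gain=0.5, delay=4))   # hard correction plus lag

print(f"gain 0.5, delay 0: {fast_feedback:.3g}")
print(f"gain 0.2, delay 4: {gentle_hand:.3g}")
print(f"gain 0.5, delay 4: {yanked_wheel:.3g}")
```

In this toy model the same gain that settles almost instantly with no delay produces swings that keep growing once the feedback is four steps late, while a gentler gain settles even with the lag. That is both halves of the point: shorten the delay, and lower the gain.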

Why more runners won't save you: Kingman's amplifier

The second piece of machinery is the variability amplifier, and it also comes with an equation. Kingman's formula says queue wait is proportional to [utilization ÷ (1 − utilization)] × variability, and two counterintuitive things fall out of it.

First, as utilization climbs toward 100%, that first factor doesn't just grow, it explodes. A pipeline you've "efficiently" loaded to 95% busy has queues that blow up at the smallest disturbance, because 0.95 ÷ 0.05 is nineteen and 0.99 ÷ 0.01 is ninety-nine. Efficiency, past a point, is fragility. Second, and this is the one that should redirect your budget, reducing variability can lower wait time more than speeding up the process does. That's not hand-waving; it's now been worked out specifically for build systems, in a 2025 paper on queueing theory for large-scale CI/CD pipelines.
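The arithmetic is worth running once. This sketch uses the standard single-server form of Kingman's approximation: mean wait ≈ ρ/(1 − ρ) × (c_a² + c_s²)/2 × mean service time, where c_a and c_s are the coefficients of variation of inter-arrival and service times. Treating a pipeline stage as a single server is a simplifying assumption, and the ten-minute job time is invented for the example.

```python
def kingman_wait(utilization, cv_arrival, cv_service, mean_service_time):
    """Approximate mean time in queue for a single-server queue (Kingman)."""
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    load_factor = utilization / (1 - utilization)            # the part that explodes
    variability = (cv_arrival ** 2 + cv_service ** 2) / 2    # the part the bullwhip feeds
    return load_factor * variability * mean_service_time

JOB_MINUTES = 10  # hypothetical mean build time

for rho in (0.50, 0.80, 0.95, 0.99):
    wait = kingman_wait(rho, cv_arrival=1.0, cv_service=1.0,
                        mean_service_time=JOB_MINUTES)
    print(f"{rho:.0%} busy -> about {wait:.0f} min in queue")

# Same utilization, but arrivals made lumpier by batching and freezes.
smooth = kingman_wait(0.95, cv_arrival=1.0, cv_service=1.0, mean_service_time=JOB_MINUTES)
lumpy = kingman_wait(0.95, cv_arrival=2.0, cv_service=1.0, mean_service_time=JOB_MINUTES)
print(f"lumpy arrivals wait {lumpy / smooth:.1f}x longer at the same load")
```

With both coefficients of variation set to 1, the wait goes from about 10 minutes at 50% busy to about 190 at 95% and about 990 at 99%. Doubling only the arrival variability at 95% multiplies the wait by 2.5 without a single extra job entering the system.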

Put the two machines together. The bullwhip's entire job is to inject variability: lumpy batches, gaming spikes, freeze-and-flush pulses. A near-saturated pipeline then takes that variability and multiplies it into explosively growing queues. So the universal instinct, "CI is slow, add more runners," is treating the symptom: it nudges the utilization term while the variability term, the one actually multiplying your pain, goes untouched. Smaller batches, no gaming, no freezes, and killing your flaky tests attack the variability that the saturated pipeline is multiplying. Donald Reinertsen made the foundational version of this point back in 2009 in The Principles of Product Development Flow: the root cause of poor development performance is invisible, unmanaged queues, and the levers are batch size, work-in-progress limits, and fast feedback. The bullwhip framing simply names the instability his queues produce.

The numbers say smaller is safer

And it's measurable, which is what separates this from supply-chain poetry. Deployment frequency, it turns out, is effectively a proxy for batch size: deploy more often and, mechanically, you're deploying smaller. The DORA research (Accelerate by Forsgren, Humble, and Kim, 2018, and the DORA reports that followed) found that the teams with the smallest batches, the "elite" performers, out-performed the low performers by 182× on deploy frequency, 127× on lead time, 8× on change-failure rate, and 2,293× on time to recovery.

Stare at the change-failure number, because it inverts the folk wisdom that runs every deploy freeze. The teams that ship constantly, in tiny slivers, fail less often, not more. Smaller and faster is safer. It is correlational (batch size co-varies with overall engineering maturity, and elite teams differ in many ways), but the mechanism above tells you why the arrow plausibly points this direction: small batches inject less variability and carry less blast radius per event, so the bullwhip has less to amplify and each failure is cheaper.
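The "deployment frequency is a proxy for batch size" step is plain division, and it helps to see it with numbers. The commit rate and cadences below are hypothetical, and the sketch assumes every commit ships in the next deploy.

```python
def mean_batch_size(commits_per_week, deploys_per_week):
    """Average number of commits riding along in each deploy."""
    return commits_per_week / deploys_per_week

COMMITS_PER_WEEK = 200  # hypothetical team output, held constant

cadences = {                   # deploys per week
    "monthly release": 0.25,   # assumes a four-week month
    "weekly release": 1,
    "daily deploy": 5,
    "20 deploys a day": 100,
}

batch_sizes = {name: mean_batch_size(COMMITS_PER_WEEK, rate)
               for name, rate in cadences.items()}

for name, size in batch_sizes.items():
    print(f"{name:>17}: ~{size:.0f} commits per deploy")
```

When a deploy goes wrong, that batch size is also the number of suspects you have to search through, which is one concrete reading of "less blast radius per event."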

The 2026 instance: the AI bullwhip

Which brings us to the freshest version, the one currently unfolding in every shop that just rolled out AI coding assistants. Code generation is one stage of the pipeline, and AI just made it roughly 3× faster, while test, review, and deploy stayed exactly as fast as they were last year. That is the beer game with a Copilot in the retailer's chair: accelerate a single upstream stage without scaling the downstream, and the locally rational productivity win becomes a global pile-up, with ballooning defect backlogs, review queues stretching for days, and release gridlock that quietly eats the velocity it promised.

All four causes light up at once. More generated code means bigger batches (cause two). Overwhelmed reviewers ration their scarce attention and teams start gaming it (cause three). The backlog forces freeze-and-flush release trains (cause four). And everyone reacts to the swelling queue rather than the true state of the code (cause one). The org didn't get 3× delivery. It moved the bottleneck downstream and handed the bullwhip a bigger amplitude.
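Here is a back-of-the-envelope sketch of that bottleneck shift. It assumes a review stage with fixed daily capacity and a pull-request arrival rate that triples, matching the rough 3× figure above. The specific rates are invented, and real review queues are far noisier than this.

```python
def review_backlog(arrivals_per_day, reviews_per_day, days):
    """Open review queue at the end of each day, starting from empty."""
    backlog, history = 0, []
    for _ in range(days):
        # Work arrives, reviewers clear what they can, the rest carries over.
        backlog = max(0, backlog + arrivals_per_day - reviews_per_day)
        history.append(backlog)
    return history

REVIEWS_PER_DAY = 12  # hypothetical reviewer capacity, unchanged by the rollout

before = review_backlog(arrivals_per_day=10, reviews_per_day=REVIEWS_PER_DAY, days=20)
after = review_backlog(arrivals_per_day=30, reviews_per_day=REVIEWS_PER_DAY, days=20)

print("before assistants, backlog after 20 days:", before[-1])
print("after assistants,  backlog after 20 days:", after[-1])
print("days of review work already queued:", after[-1] / REVIEWS_PER_DAY)
```

Before the change the stage had slack and the queue stayed empty. After it, the queue grows by the same amount every day with no upper bound, so delivered throughput is still capped at the reviewers' rate while the wait keeps lengthening.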

The cure is the same four levers

Here's the payoff for doing it rigorously: the cure falls out for free, because it is isomorphic to the supply-chain cure that's been known since the nineties. Four levers that, between them, undo the four causes and shorten the loop.

Share the true state, not the signal. The supply-chain fix for demand distortion is to share real point-of-sale data so every stage sees actual demand. The pipeline fix is observability and trunk visibility: react to whether the code genuinely works, not to a flaky red light. Better information lowers the gain.

Reduce batch size. Trunk-based development, continuous integration, small PRs. Integrate small and often, the original anti-bullwhip control law.

Reduce lead time. Fast, parallel builds and tests, so the feedback delay drops below the threshold where the loop can oscillate at all.

Stop gaming. No runner hoarding, no deploy freezes, no big-bang release trains. The rituals you adopted to feel safe are precisely the ones manufacturing the spike you're afraid of.
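The first lever can be made concrete with a little Bayes. The sketch below asks how likely it is that a commit is genuinely broken, given that the build went red. The break rate and flake rates are hypothetical, and `detection_rate` (the chance a truly broken commit turns the build red) is an assumption I have defaulted to 1.

```python
def prob_broken_given_red(break_rate, flake_rate, detection_rate=1.0):
    """P(code is actually broken | build is red), by Bayes' rule.

    break_rate:     fraction of commits that are genuinely broken
    flake_rate:     chance a healthy commit still produces a red build
    detection_rate: chance a broken commit produces a red build
    """
    true_red = detection_rate * break_rate
    false_red = flake_rate * (1 - break_rate)
    if true_red + false_red == 0:
        raise ValueError("a red build is impossible with these rates")
    return true_red / (true_red + false_red)

flaky_suite = prob_broken_given_red(break_rate=0.05, flake_rate=0.10)
clean_suite = prob_broken_given_red(break_rate=0.05, flake_rate=0.01)

print(f"flaky suite: a red build means real breakage {flaky_suite:.0%} of the time")
print(f"clean suite: a red build means real breakage {clean_suite:.0%} of the time")
```

With these made-up rates, a red light from the flaky suite is wrong about two times in three, so a mass revert on red is mostly a reaction to noise. Cutting the flake rate is what makes the signal worth reacting to.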

Two honest caveats, because "small batches fix everything" is itself a glib over-correction. Reinertsen's real point is that optimal batch size is an economic tradeoff (per-deploy fixed cost against the cost of delay and risk), so the rule is "reduce toward the economic optimum," not "batch of one, always," and tiny batches without fast automated feedback just let you thrash faster. And the analogy has a clean seam worth respecting: software has zero marginal manufacturing cost and near-perfect, instant information, so the physical bullwhip causes (transport batching, shipping lead times) map only loosely, while the information-distortion and feedback-delay causes, the ones that actually bite a pipeline, map almost exactly.
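The economic tradeoff in the first caveat can be sketched as a U-curve. This is an EOQ-style simplification of my own, not Reinertsen's exact model: each deploy carries a fixed overhead, and each change costs money for every day it sits waiting for its batch to fill. All the dollar figures and rates are hypothetical.

```python
def cost_per_change(batch, fixed_cost_per_deploy, changes_per_day, delay_cost_per_change_day):
    """Deploy overhead shared across the batch, plus the cost of waiting for it to fill."""
    overhead_share = fixed_cost_per_deploy / batch
    average_wait_days = batch / (2 * changes_per_day)   # a change waits half a fill time on average
    return overhead_share + delay_cost_per_change_day * average_wait_days

def optimal_batch(fixed_cost_per_deploy, changes_per_day, delay_cost_per_change_day, max_batch=200):
    """Brute-force the integer batch size with the lowest cost per change."""
    return min(
        range(1, max_batch + 1),
        key=lambda b: cost_per_change(b, fixed_cost_per_deploy,
                                      changes_per_day, delay_cost_per_change_day),
    )

manual_deploy = optimal_batch(fixed_cost_per_deploy=50, changes_per_day=10,
                              delay_cost_per_change_day=20)
automated_deploy = optimal_batch(fixed_cost_per_deploy=1, changes_per_day=10,
                                 delay_cost_per_change_day=20)

print("optimal batch with a costly manual deploy:", manual_deploy)
print("optimal batch with a cheap automated deploy:", automated_deploy)
```

In this sketch the optimum is not "one, always": with an expensive deploy it sits at seven changes. Drive the per-deploy cost down and the optimum slides toward one on its own, which is the sense in which automation earns you small batches.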

What to do Monday

The most useful thing this cross-domain view buys you is a diagnosis you can run in your head the next time the pipeline is convulsing. Don't ask "how do we go faster" or "how many runners do we add." Ask the beer-game questions. Where am I reacting to a signal instead of the state (over-reverting on a test that's just flaky)? Where am I batching (big PRs, monthly releases, the pre-launch freeze)? Where am I gaming a scarce resource (hoarding build capacity)? And where is my gain too high (mass-reverting) or my delay too long (slow CI)?

Every one of those is a named bullwhip cause with a known lever, and not a single one of them is "the engineers need to try harder." That is Sterman's enduring point, the one that has held for more than thirty-five years across beer and code alike: the smartest people in the room produce the oscillation anyway, because the oscillation is structural. The fix was never effort or talent. It was smaller batches, shorter loops, true signals, and the discipline to stop over-correcting, the same four moves whether you're shipping lager or shipping builds.

The bullwhip doesn't care what's flowing through the pipe. It cares about batches, delays, and how hard you yank the wheel, and the freeze-and-flush rituals we reach for when we're scared are, every single time, the hand on the wheel that makes the swerve worse.


Sources:
H. L. Lee, V. Padmanabhan & S. Whang (1997), Information Distortion in a Supply Chain: The Bullwhip Effect, Management Science 43(4):546–558 (the four causes).
J. D. Sterman (1989), Modeling Managerial Behavior... (the Beer Distribution Game), Management Science (the canonical demonstration).
J. W. Forrester (1961), Industrial Dynamics.
Behavioral Causes of the Bullwhip Effect: An Analysis Using Linear Control Theory (IISE Transactions, 2017) (gain × delay generating sustained oscillation even with constant demand).
J. F. C. Kingman (1961), the Kingman approximation for queue waiting time.
On Queueing Theory for Large-Scale CI/CD Pipelines Optimization (arXiv 2504.18705, 2025).
D. G. Reinertsen (2009), The Principles of Product Development Flow (queues, batch size, WIP, Little's Law applied to development).
N. Forsgren, J. Humble & G. Kim (2018), Accelerate, and the DORA reports (batch size and the four key metrics: 182× / 127× / 8× / 2,293×).
Prior art on the connection itself: The Bullwhip Effect in Continuous Delivery (The New Stack); "The AI Bullwhip Effect" (2026).

Lever one is "share the true state, not the signal." Agents need the same thing.

The bullwhip's first cause is reacting to a derived signal instead of the real state, and the cure is to give every stage the true state to react to. The same distortion hits any system that acts on an agent's output without seeing what actually produced it. Chain of Consciousness is the tamper-evident record of what an agent did to reach a result: the evidence it used, the check it ran, the step it took. It hands the next stage the true state, the real basis of a decision, instead of a status light that may have nothing to do with whether the work is sound, so you react to the code, not the flaky red.


pip install chain-of-consciousness  ·  npm install chain-of-consciousness