WIP Limits

“If everything is in progress, nothing is.” The queue theory of getting things done, and why your overloaded team is thrashing.

Published June 2026 · 12 min read

Anyone who has used a computer long enough has watched one die of busyness. The CPU pins at 100%, the disk light goes solid, the fans scream, and the machine does nothing. The cursor won't move. The window won't close. It is working as hard as silicon can work and accomplishing less than it would if it were idle. Operating-systems people have had a name for this since the late 1960s: thrashing. It happens when you ask a machine to run more programs than fit in memory at once, so it spends all its time shuffling pages of memory between RAM and disk to give each program a turn, and at some point the shuffling consumes the whole machine. Throughput doesn't gently decline as you add load. It collapses. The computer is maximally busy and minimally productive, and the more you pile on, the worse it gets.

Hold that image, because your overloaded team is doing the exact same thing. When everyone is slammed and everything is “in progress” and the whole org feels maximally busy, it is thrashing, and this is not a metaphor that mostly works. It is the same queueing mathematics, and there is a theorem from 1961 that applies to your backlog as rigorously as it applies to the machine.

A conservation law hiding in your task board

In 1961, an operations researcher named John Little proved something so general it sounds too good to be true. For any stable system, the average number of items inside it equals the rate at which items arrive times the average time each item spends there, written L = λW. That's it. Rearrange it for a team's work and you get the single most important and most ignored equation in how things actually get done:

Cycle Time = Work-in-Progress ÷ Throughput.

The time it takes any one thing to finish equals how many things are in flight divided by how fast you complete them. And the property that makes Little's Law a law rather than a tip is that it is distribution-free. It does not care how variable your tasks are, how chaotic the arrivals, how lumpy and unpredictable the work; it holds for any stable queue, exactly, the way conservation of energy holds. It is not advice. It is arithmetic that is always true.
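To see the "arithmetic that is always true" part concretely, here is a minimal sketch that measures both sides of L = λW independently on a small made-up trace of work items. The function name `littles_law_check`, the trace, and the ten-day window are all hypothetical, and the sketch assumes the window starts and ends with nothing in flight.

```python
from fractions import Fraction

def littles_law_check(items, horizon):
    """Return (L, lambda, W) for a list of (start, finish) times.

    Assumes every item starts and finishes inside [0, horizon],
    so the system is empty at both ends of the window.
    """
    n = len(items)
    arrival_rate = Fraction(n, horizon)  # lambda: items per unit time
    # W: average time an item spends in the system
    avg_time_in_system = sum(Fraction(f - s) for s, f in items) / n

    # L: time-averaged number in flight, measured by sweeping start/finish events
    events = sorted([(s, +1) for s, _ in items] + [(f, -1) for _, f in items])
    area, in_flight, prev_t = Fraction(0), 0, 0
    for t, delta in events:
        area += in_flight * (t - prev_t)  # items in flight x elapsed time
        in_flight += delta
        prev_t = t
    avg_in_flight = area / horizon
    return avg_in_flight, arrival_rate, avg_time_in_system

# Hypothetical trace: five items, times in days, observed over ten days.
trace = [(0, 3), (1, 4), (2, 8), (5, 9), (6, 10)]
L, lam, W = littles_law_check(trace, horizon=10)
print(L, lam * W)  # the two sides of L = lambda * W
```

The left side is computed from how many items were in flight at each moment and the right side from per-item durations, yet on this trace they agree exactly, with no assumption about how the arrivals or durations are distributed.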

The one precondition worth stating honestly: “stable” means that over the long run you finish roughly as fast as you start. A team that just keeps accumulating work-in-progress forever isn't a stable queue, and the law describes a steady-state average, not the destiny of your one specific ticket. But for any team that isn't actively exploding, the relationship is iron, and reading it under pressure tells you something your instincts get exactly backwards.

The slogan is the equation

Run the numbers. Say your team finishes eight things a week (that's your throughput), and for the moment hold it fixed. If twenty-four things are in progress at once, then by the equation each one takes, on average, 24 ÷ 8 = 3 weeks to get done. Now suppose you want delivery in two weeks, same team, no new hires. You don't need to work harder or faster. You need to cut work-in-progress to sixteen, because 16 ÷ 8 = 2. That's the whole move. Nobody got faster; you simply had fewer things going at once, and everything in flight finished sooner. (The exact numbers here are illustrative: knowledge-work “items” vary wildly in size, and “throughput” is a noisier quantity than a server's. But the shape of the relationship is exact.)
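The same arithmetic as a two-function sketch, using the illustrative figures from this section. The function names are hypothetical, and throughput is held fixed, as it is in the text.

```python
def cycle_time(wip, throughput):
    """Average time to finish one item: Cycle Time = WIP / Throughput."""
    return wip / throughput

def wip_for_target(target_cycle_time, throughput):
    """The same equation rearranged: the WIP that yields a target cycle time."""
    return target_cycle_time * throughput

weeks_now = cycle_time(wip=24, throughput=8)                     # 24 in flight, 8 per week
wip_needed = wip_for_target(target_cycle_time=2, throughput=8)   # for two-week delivery
print(weeks_now, wip_needed)
```

Note that the only input you changed to hit the two-week target is the number of things in flight.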

This is what the Kanban slogan “if everything is in progress, nothing is” actually means underneath the poster, and it's what David Anderson's “stop starting, start finishing” actually is: Little's Law with the serial numbers filed off. And it exposes a genuinely cruel joke the equation plays on human nature. Under pressure, the instinct is to start more: look busy, hedge your bets, give every anxious stakeholder the small comfort of watching their request move into the “Doing” column. The math says that is precisely the wrong move. Every new thing you start lengthens the time-to-done of everything already in flight, including the urgent thing you started all this scrambling over. Pulling the fire alarm and beginning the fire drill while five other things are mid-stream does not make the fire drill finish sooner. By the cycle-time equation, it makes everything, the fire drill included, finish later. You cannot rush a queue by adding to it.

Your code already obeys this law

Here's the part that should make every engineer feel a little called out, because you already follow this law religiously, in your infrastructure, where you would never dream of breaking it. You would never set your server's thread pool to unlimited. You would never set the database's max connections to infinity. You know in your bones what happens if you do: under load the queue grows without bound, tail latency climbs to the moon, and the whole thing falls over. That is Little's Law again: at a fixed service rate, unbounded work-in-progress means unbounded wait time. So you cap it. Bounded thread pools, connection pools, bulkheads, admission control, backpressure: every one of those mechanisms is a work-in-progress limit for requests, and you built each of them on purpose because you respect the queue.

Your team's backlog is that exact same queue. It is simply the one where you deleted the limit. “Stop starting, start finishing” is backpressure for humans, and the only reason it sounds like soft management advice instead of systems engineering is that nobody ever wrote it as a thread-pool configuration. This is worth being precise about, because it is not a loose analogy that holds in spirit: a thread pool serving requests and a team working a backlog are both stable queues, and Little's Law is a theorem about stable queues. It applies to each of them verbatim, the same equation, the same way. You would cap the one without a second thought. You leave the other wide open and then wonder why nothing ships.
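For readers who have not written one recently, this is roughly what a work-in-progress limit looks like in code: a minimal admission-control sketch built on Python's standard `threading.BoundedSemaphore`. The `AdmissionGate` class and its method names are hypothetical, not taken from any particular framework.

```python
import threading

class AdmissionGate:
    """Hypothetical WIP limit: admit at most `limit` requests at a time."""

    def __init__(self, limit):
        self._slots = threading.BoundedSemaphore(limit)

    def try_start(self):
        # Non-blocking acquire: when every slot is taken this returns False
        # immediately instead of letting the caller pile up behind the others.
        return self._slots.acquire(blocking=False)

    def finish(self):
        # Finishing is the only thing that frees a slot.
        self._slots.release()

gate = AdmissionGate(limit=2)
admitted = [gate.try_start() for _ in range(3)]  # third request is refused
gate.finish()                                    # one request completes
admitted_after_finish = gate.try_start()         # now there is room again
print(admitted, admitted_after_finish)
```

Read `limit` as "slots in the Doing column" and `try_start` as "a stakeholder asks you to begin something", and the configuration carries over unchanged.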

It's worse than the slogan: throughput isn't fixed

So far I've been generous, because everything above assumed throughput stays constant as work-in-progress climbs. It doesn't, and this is the under-told part that turns a linear problem into a vicious one. When you juggle more things, you get slower at each one, because the switching between them is not free. Gerald Weinberg, in his 1992 book Quality Software Management, attached numbers to this that have stuck around precisely because they feel true the moment you read them: focus on a single task and you get roughly 100% of your time on it; add a second and you're down to about 40% each, with 20% evaporating into the cost of switching contexts; add a third and you're at something like 20% each with a full 40% simply gone. Treat those exact percentages as an illustrative model rather than a measured constant, but the direction is rock-solid and turns up everywhere. Gloria Mark's research finding that it takes around twenty-three minutes to fully refocus after an interruption is the same tax, measured one switch at a time.

Now feed that back into the equation, and watch it stop being linear. Cycle Time = WIP ÷ Throughput. As you add work-in-progress, the numerator goes up, and because of switching, the denominator goes down at the same time. Cycle time doesn't merely rise in proportion; it grows super-linearly, hit from both sides at once. This is why multitasking feels so much worse than the arithmetic of “I have three things” suggests: three tasks at once is not 33% of your attention on each. It's 20% each and 40% burned in the seams between them. Multitasking doesn't just divide your time across your tasks. It spends a chunk of it on the switching, and the more tasks, the more it spends. You lose twice: once to the longer queue, and once to the slower pace.
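A small sketch of that double hit for one person, plugging the Weinberg percentages quoted above into the cycle-time equation. The focused rate of one task per week and the names `SWITCHING_LOSS_PCT` and `cycle_time_weeks` are assumptions made for illustration, and the percentages remain an illustrative model, not measurements.

```python
# Share of a person's time lost to context switching, by number of
# concurrent tasks (the illustrative Weinberg figures, not measured data).
SWITCHING_LOSS_PCT = {1: 0, 2: 20, 3: 40}

def cycle_time_weeks(wip, focused_throughput=1, switching=True):
    """Cycle Time = WIP / Throughput, with throughput shrunk by switching loss.

    focused_throughput: tasks per week when working on one thing (assumed).
    """
    loss_pct = SWITCHING_LOSS_PCT[wip] if switching else 0
    # Effective throughput is focused_throughput * (100 - loss_pct) / 100.
    return wip * 100 / (focused_throughput * (100 - loss_pct))

for wip in (1, 2, 3):
    fixed = cycle_time_weeks(wip, switching=False)
    real = cycle_time_weeks(wip)
    print(f"{wip} task(s): {fixed} weeks if throughput were fixed, {real} with switching")
```

Under these assumptions, tripling the work in flight takes cycle time from one week to five rather than to three, which is the gap between the linear story and the real one.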

And then you thrash

Push that far enough and the two losses compound into the cliff from the opening of this essay. The thrashing machine wasn't a colorful intro; it's the literal endpoint of this same curve. An operating system that responds to high load by admitting still more processes eventually crosses the threshold where they no longer fit in memory together, and from that point it spends all of its time swapping pages instead of running code: throughput doesn't sag, it collapses toward zero, and every additional process makes the collapse steeper.

A person and a team thrash in exactly the same shape. Past your personal work-in-progress threshold, every minute goes into reloading context (where was I, what is this ticket even about, who was I waiting on, what did I decide last time) and the actual work the context was for gets a sliver of what's left. The output of an overloaded team does not decline gracefully as you hand it more. Past the knee, it falls, the way the thrashing machine's does. Which is the genuinely counterintuitive thing the queue is trying to tell you: there is a whole region of operation where doing less produces more, not as a wellness slogan but as a throughput fact, the same fact that makes a swapping computer faster the instant you kill a few processes.

The belief this is meant to break: busy is not healthy

All of which arrives at the conviction this essay exists to dismantle, the one wired so deep it feels like a virtue rather than a choice: that a maximally busy team is a healthy team, and an idle slot is a failure of management. Tom DeMarco said the quiet part out loud in his 2001 book Slack, and the queueing theory backs him without reservation. Resource efficiency (everyone always busy, every slot always full) and flow efficiency (work actually moving through quickly) are not the same goal. They are opposed goals. When a worker is 100% utilized, the queue in front of them is never empty, so the fifteen-minute task you need right now sits behind everything else and takes two hours to come out the other side. The formula is merciless: in the simplest single-server queueing model (M/M/1), the average wait in the queue scales like ρ ÷ (1 − ρ), where ρ is utilization, and as ρ climbs toward 1 (toward “fully booked,” toward the busyness you've been treating as success) that wait doesn't rise smoothly. It goes to infinity. A 100%-utilized system has, mathematically, the worst possible responsiveness of any system you could build.
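To put numbers on that curve, here is a minimal sketch that tabulates ρ ÷ (1 − ρ) at a few utilization levels. It assumes the textbook M/M/1 model, where the average wait in the queue is that factor multiplied by the average service time. The 15-minute service time echoes the example above, and the name `wait_multiplier` is hypothetical.

```python
def wait_multiplier(utilization):
    """rho / (1 - rho): average queue wait in units of the average service time.

    This is the M/M/1 result; real teams are messier, so read the shape
    of the curve rather than the exact values.
    """
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1); at 1 the wait is unbounded")
    return utilization / (1 - utilization)

SERVICE_MINUTES = 15  # assumed average task size, as in the example above

for rho in (0.5, 0.8, 0.9, 0.95, 0.99):
    wait = wait_multiplier(rho) * SERVICE_MINUTES
    print(f"{rho:.0%} busy: about {wait:.0f} minutes waiting before work starts")
```

Under these assumptions the wait is one service time at 50% utilization and nine service times at 90%, roughly two and a quarter hours for a fifteen-minute task, and it keeps steepening from there.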

So “we are maximally busy” is not the sound of a healthy team. It is the symptom. The slack you keep trying to squeeze out (the unfilled hour, the empty board slot, the person who isn't pegged to capacity) is not waste. It is the one thing that lets work flow through quickly, and the one thing that lets you absorb the genuinely urgent arrival without everything else seizing up. Optimizing for utilization is, precisely and provably, optimizing for maximum latency. You have had the dial turned the wrong way and called it discipline.

Cap it, and watch

Here is the move, and the best thing about it is that it is free and needs no one new. Pick a number, say two or three active items per person, where “active” means actually being worked right now, not “opened once and parked.” Make it a hard cap. Then flip from push to pull: you do not get to start the next thing because a stakeholder asked, or because a gap appeared in your calendar; you get to start it only when you finish something and a “Doing” slot genuinely frees up. That one rule, stop starting, start finishing, is the entire discipline. It is backpressure applied to people.

Two honest caveats so you implement it rather than game it. Pull genuinely blocked work (waiting on review, on CI, on an external party) out of the active count, rather than letting “blocked” become the hiding place where work-in-progress quietly piles back up. And don't set the limit so tight that people starve waiting; some parallelism is real and legitimate. With those guardrails, do the experiment, and the result will feel like magic even though it is only arithmetic: cycle time drops, the urgent things start shipping sooner, and you did not add a single person.
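The rule and its two guardrails can be sketched as a tiny board model. Everything here (the `PullBoard` class, its method names, the item labels) is hypothetical, and the choice to make unblocked work wait for a free slot is one reasonable policy, not the only one.

```python
from collections import deque

class PullBoard:
    """Hypothetical pull-based board with a hard cap on active items."""

    def __init__(self, wip_limit):
        self.wip_limit = wip_limit
        self.ready = deque()   # requested, not started
        self.active = set()    # actually being worked right now
        self.blocked = set()   # waiting on review, CI, an external party
        self.done = []

    def request(self, item):
        # A stakeholder asking only queues the work; it never starts it.
        self.ready.append(item)

    def pull(self):
        """Start the next ready item only if a slot is genuinely free."""
        if len(self.active) >= self.wip_limit or not self.ready:
            return None
        item = self.ready.popleft()
        self.active.add(item)
        return item

    def block(self, item):
        # Blocked work leaves the active count but stays visible on the board.
        self.active.remove(item)
        self.blocked.add(item)

    def unblock(self, item):
        # Unblocked work also has to wait for a free slot (assumed policy).
        if len(self.active) >= self.wip_limit:
            return False
        self.blocked.remove(item)
        self.active.add(item)
        return True

    def finish(self, item):
        self.active.remove(item)
        self.done.append(item)

board = PullBoard(wip_limit=2)
for name in ["A", "B", "C", "D"]:
    board.request(name)

started = [board.pull(), board.pull(), board.pull()]  # third pull hits the cap
board.block("B")                                      # B is waiting on review
pulled_while_blocked = board.pull()                   # the freed slot goes to C
resumed_early = board.unblock("B")                    # no slot yet, B keeps waiting
board.finish("A")
resumed_after_finish = board.unblock("B")             # finishing A made room
print(started, pulled_while_blocked, resumed_early, resumed_after_finish)
```

Keeping `blocked` as its own visible set is the point of the first caveat: the count is excluded from the cap, but it can still be read off the board and questioned when it grows.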

Because that is the diagnosis this whole essay was walking toward. You were never throughput-limited. The reason throwing more people at the backlog so often failed to make it faster is that people were not the constraint; you were drowning a fixed amount of real capacity under a flood of half-started work, and more hands just started more things. You were work-in-progress-limited the entire time. The fix was never to do more. It was to start less, so that you could finish anything at all, and the most urgent thing on your plate finishes soonest not when you drop everything to attack it, but when you have the discipline to start fewer other things, including, and especially, fewer other urgent ones.


Sources: John D. C. Little, “A Proof for the Queuing Formula: L = λW,” Operations Research (1961): the distribution-free conservation law, holding for any stable queue, which in Kanban terms reads Cycle Time = WIP ÷ Throughput; the illustrative 24-in-progress-at-8-per-week → 3-week cycle-time worked example (cut WIP to 16 for 2 weeks, no added headcount). The systems near-identity: bounded thread pools, connection pools, bulkheads, admission control, and backpressure as work-in-progress limits for requests, and the M/M/1 queueing result that average wait scales like ρ/(1−ρ) → ∞ as utilization ρ → 1. The two compounding mechanisms beyond fixed-throughput Little's Law: context-switching costs (Gerald Weinberg, Quality Software Management, 1992: the ~100% / 40%-each / 20%-each model, cited as illustrative, not a measured constant; Gloria Mark's ~23-minute refocus-after-interruption research as the per-switch version), and OS thrashing (virtual-memory page-swapping collapse, the working-set tradition from the late 1960s) as the literal “more load → throughput toward zero” cliff. The utilization-vs-flow reframe: Tom DeMarco, Slack (2001), and Donald Reinertsen, The Principles of Product Development Flow (2009): 100% utilization maximizes latency; slack is the capacity that enables flow. “Stop starting, start finishing” / WIP limits / pull-based flow per David Anderson's Kanban (2010).

Accuracy notes: Little's Law requires a stable system and describes steady-state averages, not per-item guarantees; knowledge-work throughput is genuinely fuzzy, so the numeric examples are illustrative; the Weinberg percentages are a widely-used estimate, not a law (only the direction, throughput falls as WIP rises, is well-established); WIP limits must count active work and exclude genuinely blocked items, and some parallelism is legitimate. The decision structure is a near-identity (a team backlog and a thread pool are both stable queues governed by the same theorem), not a loose analogy.

Little's Law needs a real count of what's done, not what's claimed done.

Cycle Time = WIP ÷ Throughput is only as honest as your measurement of which items are actually finished versus quietly parked in “blocked,” and that gets harder, not easier, with a fleet of autonomous agents, where “done” is whatever the agent reports. You can't cap work-in-progress you can't see, and an agent's own status is the least trustworthy source for it. Chain of Consciousness anchors each action an agent takes to a tamper-evident record, so the real throughput and the real in-flight count are something you can read off the chain instead of inferring from a self-reported board.
