
Buggy Code Review: The Callback

The bugs don’t live in any line of code you can point to. They live in time.

Published April 2026 · 9 min read

In late 2025, engineers at Oxide Computer Company watched one of their three Nexus API server instances go unresponsive during a live update test. Some endpoints responded normally. Others hung forever. The database connection pool was running its internal bookkeeping every sixty seconds, right on schedule—but it never processed a new claim.

They brought out DTrace. They captured core dumps. They loaded the process into Ghidra, the NSA’s open-source reverse engineering tool, and spent forty-five minutes reading disassembled code to inspect the internal state of an MPSC channel. The semaphore had zero available permits. The message queue was empty. Every data structure was internally consistent. Nothing was visibly wrong.

The breakthrough came when engineer Sean Klein asked whether a select statement with borrowed futures could create re-entrancy. Engineer John Gallagher saw it: the select’s first arm was waiting for a database connection. When it blocked on a full channel, the second arm—a timeout—fired and tried to claim the same connection through a different path. The task was now queued behind its own other arm. Gallagher wrote a minimal reproducer that hung within five minutes. The pattern that caused all of this fit in five lines of code.

Bryan Cantrill, Oxide’s CTO, observed that traditional deadlocks are “actually not that hard to debug” because “the program counter is actually a very powerful piece of state.” Async systems erase that advantage. And Dave Pacheco added that Rust “has taken so many of the types of runtime problems and made those compile time failures, and that’s great… but what we have left are these doozies.”

This is Phase 3 of the Buggy Code Review series, and the bugs are doozies.

Phase 1 covered single-function bugs—logic errors you can spot by reading one method carefully. Phase 2 covered multi-file bugs—interface mismatches between modules that each work correctly in isolation. Phase 3 moves into async JavaScript, where the bugs don’t live in any line of code you can point to. They live in time—in the gaps between when a promise is created and when it resolves, in the re-entrancy possible within a single synchronous resolve() call, in the ordering assumptions that break when things complete out of sequence.

The code under review is a token-bucket rate limiter: 110 lines of JavaScript, async/await, promises, a setInterval refill loop, and a wait queue for callers that arrive when no tokens are available. It has eleven bugs—three in promises, three in timing, three in ordering, two at the sync/async boundary. The code looks clean. It even works, most of the time. And that’s exactly the problem.


The race that fires once in 360,000

Bug 2 is a double resolution race. When a caller requests a token and none are available, acquire() creates a promise and pushes an entry onto the wait queue. It also sets a timeout. If the timeout fires before a token becomes available, the promise resolves with { acquired: false, reason: 'timeout' }. If a token arrives first, drainWaitQueue() resolves the promise with { acquired: true } and cancels the timeout.

The bug: if drainWaitQueue resolves the entry and the timeout callback fires in the same event loop tick, the promise gets resolved twice. JavaScript silently ignores the second resolution—the runtime won’t throw. But drainWaitQueue already deducted a token for this entry. The timeout handler also tells the caller it timed out. The limiter thinks it spent a token; the caller thinks it didn’t get one. Token accounting drifts—slowly, silently, unrecoverably.
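
The shape of the bug, and one guard against it, fits in a short sketch. The code below is hypothetical (the limiter's actual implementation isn't reproduced here): a wait-queue entry stores its promise's resolve function, two paths race to call it, and a settled flag makes exactly one path win while telling the loser it lost.

```javascript
// Hypothetical sketch of Bug 2's shape -- not the reviewed limiter's code.
// A wait-queue entry stores its promise's resolve function. Two paths
// (drain and timeout) can both call it. JavaScript silently ignores the
// second call, so the losing path's side effects are never undone.

function makeEntry() {
  let resolveFn;
  const promise = new Promise((resolve) => { resolveFn = resolve; });
  return { promise, resolve: resolveFn, settled: false };
}

// One possible guard: settle exactly once, and return whether this caller
// won, so the loser can undo its side effects (e.g., refund the token).
function settleOnce(entry, value) {
  if (entry.settled) return false;
  entry.settled = true;
  entry.resolve(value);
  return true;
}

const entry = makeEntry();
const drainWon = settleOnce(entry, { acquired: true });                  // drain path
const timeoutWon = settleOnce(entry, { acquired: false, reason: 'timeout' }); // loses

entry.promise.then((result) => {
  console.log(result.acquired, drainWon, timeoutWon); // true true false
});
```

Without the flag, the second resolve() is a no-op at the promise level but not at the accounting level: whichever path lost still performed its bookkeeping, and nothing tells it to roll that back.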

How rare is this? In December 2018, a Node.js production system upgraded from v8.11.1 to v8.14.0 and began throwing “Callback was already called” exceptions—roughly once every one to two hours on servers handling a hundred requests per second, according to the bug report filed as Node.js GitHub Issue #25159. That works out to approximately one in 360,000 requests. A developer running local tests at one request per second would need a hundred hours of continuous testing to see it once. The bug is statistically invisible in development and statistically inevitable in production. The reporter couldn’t reproduce it locally. The fix was reverting to the previous Node.js version, which eliminated the problem entirely.

The most sobering version of this pattern predates software entirely. Between 1985 and 1987, the Therac-25 radiation therapy machine delivered lethal radiation overdoses to at least six patients, killing three. The cause, documented by Nancy Leveson and Clark Turner in their 1993 IEEE Computer investigation, was a race condition in the control software: if a technician edited treatment parameters fast enough during an eight-second magnet-setting window, the new settings wouldn’t propagate to hardware despite displaying correctly on screen. Radiation doses exceeded intended therapeutic levels by orders of magnitude.

Here’s the detail that reframes everything: the same bug existed in the predecessor Therac-20, and nobody died. The Therac-20 had independent hardware interlocks that physically prevented unsafe beam states regardless of what the software did. The race condition was always there—it just never mattered. The Therac-25 removed those interlocks, trusting software alone. The bug didn’t change. The safety margin did.

Phase 3’s rate limiter has the same structure. Under light load—the “hardware interlock”—Bug 2 never fires because timeouts and drains don’t overlap in the same tick. Production load removes that interlock.


The callback that calls back

Bug 4 is the one that earns this essay its title. When the refill timer fires, refill() adds tokens and calls drainWaitQueue(). The drain loop dequeues waiting entries and resolves their promises. This looks straightforward. It isn’t.

Here’s the chain: refill() adds tokens. It calls drainWaitQueue(). The drain loop shifts an entry off the wait queue and calls entry.resolve(). The resolved promise’s .then() handler runs as a microtask, still in the same tick, the moment the drain loop yields. That handler calls acquire(). acquire() sees the refilled tokens and deducts them. When drainWaitQueue() resumes its while loop, the token count it was working from is stale, changed out from under it by the re-entrant call.

Tokens can go negative. The while loop’s guard checked this.tokens >= this.waiting[0].cost before the resolution ran, but by the time control comes back to the loop, a re-entrant acquire() has already consumed those tokens. The loop doesn’t know. It dequeues the next entry, deducts again, and drives the count below zero.
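
One shape this can take, sketched as hypothetical code: waiters here register plain callbacks, so the grant handler runs synchronously inside the drain loop itself (with promises, the same interleaving appears once the loop yields between iterations), and the loop drains against a snapshot of the token count that the re-entrant call silently invalidates.

```javascript
// Hypothetical sketch of Bug 4's shape -- not the reviewed limiter's code.
// Waiters register plain callbacks, so granting a token runs the waiter's
// handler synchronously, inside the drain loop.

const limiter = {
  tokens: 0,
  waiting: [],
  acquire(cost, onGrant) {
    if (this.tokens >= cost) {
      this.tokens -= cost;                // fast path: deduct immediately
      if (onGrant) onGrant();
      return true;
    }
    this.waiting.push({ cost, onGrant }); // no tokens: queue the waiter
    return false;
  },
  drainWaitQueue() {
    let available = this.tokens;          // snapshot goes stale on re-entry
    while (this.waiting.length && available >= this.waiting[0].cost) {
      const entry = this.waiting.shift();
      available -= entry.cost;
      this.tokens -= entry.cost;
      if (entry.onGrant) entry.onGrant(); // may re-enter acquire()
    }
  },
  refill(n) {
    this.tokens += n;
    this.drainWaitQueue();
  },
};

// Two waiters queue while tokens are exhausted. Waiter A's grant handler
// immediately re-enters acquire() for 2 more tokens.
limiter.acquire(2, () => limiter.acquire(2)); // waiter A
limiter.acquire(2);                           // waiter B
limiter.refill(4);

console.log(limiter.tokens); // -2: B was drained against a stale snapshot
```

Refilling 4 tokens should satisfy A and B exactly. Instead A's handler grabs 2 more through the fast path mid-drain, the snapshot still says B is affordable, and the count lands at -2.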

This is exactly what happened at Oxide. Their select statement’s first arm waited for a database connection. When it blocked, the second arm fired and tried to claim the same connection through a different code path. A callback called back into the system that spawned it. They named their variant “futurelock” and documented it in RFD 609. Phase 3’s variant is re-entrant token drain. The structure is identical: a resolution triggers a handler that re-enters the resource manager, and the resource manager doesn’t expect to be re-entered.

Cantrill’s team—multiple senior engineers working with core dumps, DTrace, and Ghidra—needed roughly two days to diagnose a pattern that, once found, was trivially simple. Phase 3’s Bug 4 needs you to trace through five function calls: refill() to drainWaitQueue() to entry.resolve() to .then() to acquire(). The skill is the same: follow what happens when a callback calls back.


The observer effect

In 1985, Jim Gray published a technical report at Tandem Computers—“Why Do Computers Stop and What Can Be Done About It?”—that gave these bugs their most useful name. A “Heisenbug,” he wrote, is a transient fault that disappears on retry or when you try to observe it. The contrasting “Bohrbug”—named after the deterministic Bohr model of the atom—reproduces reliably every time. Gray borrowed from Werner Heisenberg’s uncertainty principle: you cannot precisely measure both a particle’s position and its momentum at once. Gray’s software analog is that you cannot simultaneously observe a timing bug’s behavior and its natural timing.

The mechanism is mundane. Add a console.log between two operations and you serialize them—the log call takes just long enough to separate events that would otherwise collide in the same tick. Set a debugger breakpoint and you halt execution entirely, eliminating the race by eliminating concurrency. In embedded systems, the pattern is even more direct: a variable shared between an interrupt handler and a main loop works fine in debug builds (where every read actually goes to memory) and fails in release builds (where the compiler caches the value in a register and never sees the interrupt’s update). That’s the classic missing volatile qualifier.

Bug 2 is a Heisenbug by definition. It requires a timeout and a drain to fire in the same event loop tick. Any instrumentation that stretches that tick—a log statement, a breakpoint, a setTimeout(0) wrapper—prevents the collision. Bug 4 has the same property: stepping through with a debugger serializes the re-entrancy so the stale-token problem never manifests. The bug exists in a timing alignment that observation destroys.

This creates a genuinely cruel feedback loop for developers. Your first instinct when something breaks—add logging, attach a debugger, step through the code—is precisely the set of actions that prevents the failure from reproducing. The more carefully you look, the less you see.


What color is your function?

In 2015, Bob Nystrom articulated what he called the “function coloring” problem. Async functions are “red.” Sync functions are “blue.” You can call blue from red, but not red from blue without restructuring. The distinction propagates virally—one async function in a call chain forces every caller to become async.

Bug 10 is function coloring at the micro level. acquire() has two return paths:

if (this.tokens >= cost) {
  this.tokens -= cost;
  return { acquired: true, remaining: this.tokens };  // sync
}
return new Promise((resolve, reject) => { ... });       // async

Same function. Same return shape. Two completely different timing behaviors. When tokens are available, the caller gets its result in the same microtask. When tokens aren’t available, the caller gets its result on a future tick—after other code has had a chance to run, deduct tokens, and change state.

Any code that calls acquire() and then immediately checks getStats() will get accurate numbers on the sync path and stale numbers on the async path. The function isn’t red or blue—it’s both, depending on runtime state the caller can’t see without inspecting the limiter’s internals. Nystrom warned that function coloring fragments ecosystems. Bug 10 shows it can fragment a single function.
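
One hedged fix (my sketch, not the essay's reviewed code) is to give the function a single color: declare acquire() async, so even the fast path delivers its result through a promise. The class and field names below are assumptions for illustration.

```javascript
// One-color acquire(): hypothetical fix sketch, not the reviewed code.
// Declaring the function async wraps the fast-path object in a promise,
// so both paths deliver their result via the microtask queue and the
// caller always observes the same kind of value.

class Limiter {
  constructor(tokens) {
    this.tokens = tokens;
    this.waiting = [];
  }

  async acquire(cost = 1) {
    if (this.tokens >= cost) {
      this.tokens -= cost;
      return { acquired: true, remaining: this.tokens }; // still a Promise
    }
    return new Promise((resolve) => this.waiting.push({ cost, resolve }));
  }
}

const limiter = new Limiter(5);
const result = limiter.acquire(2);
console.log(result instanceof Promise); // true -- even on the fast path
```

This narrows the gap rather than closing it: the queued path still resolves on a much later tick, so getStats() after an await can still be stale. But every caller now goes through the same await, instead of some receiving a value immediately and others on a future tick.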

Each wave of async abstraction, as documented by the Causality blog in 2025, solved the previous approach’s worst problem while introducing new structural costs. Callbacks solved thread exhaustion but inverted control flow. Promises solved nesting but introduced silent errors and one-shot limitations. Async/await solved sequence ergonomics but introduced function coloring and novel deadlock categories like Oxide’s futurelock. The Phase 3 rate limiter sits at the async/await layer—inheriting all the accumulated costs of the three waves beneath it.


Everything old is new again

One more bug deserves a paragraph, because it has a twenty-five-year engineering history. Bug 9 is head-of-line blocking: the wait queue is FIFO, so a single expensive request (cost 5, but only 3 tokens available) blocks every cheap request (cost 1) behind it. Nothing moves until the expensive request can be served.

This is why HTTP/2 exists. HTTP/1.1 had the same problem: one slow response on a TCP connection blocked all subsequent responses, even ones that were ready. HTTP/2 solved it with multiplexing—allowing out-of-order delivery. HTTP/3 took it further with QUIC, eliminating head-of-line blocking at the transport layer too. The Phase 3 rate limiter’s FIFO queue is an HTTP/1.1 connection. The fix is the same conceptual move that took the protocol community two decades: stop insisting on strict order.
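
The same conceptual move fits in a few lines. A hypothetical drain that scans past entries it can't afford, instead of stalling on the head, serves the cheap requests immediately. The trade-off is real: naive skipping can starve the expensive request, which is why production schedulers pair it with fairness mechanisms such as reserved tokens or request aging.

```javascript
// Head-of-line blocking, sketched (hypothetical code, not the reviewed limiter).
// With 3 tokens and a queue of [expensive(5), cheap(1), cheap(1)], a strict
// FIFO drain serves nobody; a cost-aware scan serves both cheap requests.

function drainFifo(tokens, queue) {
  const served = [];
  while (queue.length && tokens >= queue[0].cost) {
    tokens -= queue[0].cost;
    served.push(queue.shift().id);
  }
  return served; // stops dead at the first unaffordable entry
}

function drainScan(tokens, queue) {
  const served = [];
  for (let i = 0; i < queue.length; ) {
    if (queue[i].cost <= tokens) {
      tokens -= queue[i].cost;
      served.push(queue.splice(i, 1)[0].id);
    } else {
      i += 1; // skip the head-of-line blocker instead of stalling on it
    }
  }
  return served;
}

const queue = () => [{ id: 'big', cost: 5 }, { id: 'a', cost: 1 }, { id: 'b', cost: 1 }];
console.log(drainFifo(3, queue())); // [] -- nothing moves
console.log(drainScan(3, queue())); // ['a', 'b'] -- cheap requests proceed
```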


The skill

Phase 1’s skill was reading a function carefully. Phase 2’s skill was reading the contract between functions. Phase 3’s skill is different from either, because it’s temporal: trace what happens between each await.

Every await is a yield point. When your code awaits, it steps off the event loop and lets other code run. The token count you checked before await this.acquire() might be different after it returns. The queue you inspected might have been drained and refilled. The limiter you called stop() on might have had its rejection handlers push new entries onto this.waiting, which your this.waiting = [] on the very next line silently discards.
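
The stop() hazard has a mechanical fix worth sketching. The code below is hypothetical (the reviewed limiter's stop() is assumed to reject first and clear after): detach the queue and set the stopped flag before running any handler, so nothing a handler does can land in an array that's about to be discarded.

```javascript
// Hypothetical stop() sketch -- detach before rejecting, not after.
// If handlers run between "reject all" and "this.waiting = []", anything
// they enqueue is silently discarded. Reversing the order closes the gap.

function stop(limiter) {
  limiter.stopped = true;          // refuse new entries from here on
  const pending = limiter.waiting; // detach the live queue...
  limiter.waiting = [];            // ...before any handler can observe it
  for (const entry of pending) {
    entry.reject(new Error('limiter stopped'));
  }
}

// acquire() then checks the flag up front (token logic elided):
function acquire(limiter, cost) {
  if (limiter.stopped) {
    return Promise.reject(new Error('limiter stopped'));
  }
  return new Promise((resolve, reject) => {
    limiter.waiting.push({ cost, resolve, reject });
  });
}
```

The ordering is the whole fix: state transitions first, side effects last, so that re-entrant calls triggered by those side effects see the final state instead of an intermediate one.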

The bugs in this rate limiter don’t exist in any single line. They exist in the spaces between lines—in the time that passes between when a promise is created and when it resolves, in the re-entrancy hiding inside a single synchronous resolve() call, in the ordering assumptions that break when things complete out of sequence.

The Oxide engineers eventually found their five-line bug. The Therac-25 investigators eventually traced their race condition. The Node.js team eventually pinpointed their double-callback regression. In every case, the fix was simple. The diagnosis was brutal. And the core skill was the same one Phase 3 is trying to teach: stop reading code as a spatial artifact—a thing arranged on a screen—and start reading it as a temporal one. A thing that unfolds across time, with gaps where the world can change.

Every await is one of those gaps. Learn to see what runs inside it.

Sources: Oxide Computer Company, RFD 609: “Futurelock,” 2025; Oxide and Friends Podcast, “Futurelock” Episode, November 2025; Node.js GitHub Issue #25159, 2019; Leveson, N. and Turner, C.S., “An Investigation of the Therac-25 Accidents,” IEEE Computer, Vol. 26, No. 7, July 1993; Gray, J., “Why Do Computers Stop and What Can Be Done About It?” Tandem Technical Report TR-85.7, June 1985; Nystrom, B., “What Color is Your Function?” 2015.

The bugs live in the gaps between operations. Chain of Consciousness makes those gaps auditable.

Phase 3’s bugs hide in time—in the spaces between each await where the world changes while your code yields. Chain of Consciousness creates a cryptographic, tamper-evident provenance chain where every operation is timestamped, signed, and ordered. When you can reconstruct exactly what happened between each yield point—what ran, what changed, what resolved—the observer effect that hides Heisenbugs from your debugger becomes the record that reveals them.

pip install chain-of-consciousness · npm install chain-of-consciousness
See a live provenance chain →