
The Refactor

What cell turnover, the Ship of Theseus, awake craniotomy, accelerated bridge construction, and twenty-five years of failed software rewrites have in common — and the only safety that actually works when you refactor a running system.

Published May 2026 · 11 min read

The human body replaces 330 billion cells every day. Not over a lifetime — today. By weight, you generate your full body mass in new cells about every eighteen months. Roughly every eighty days, cumulative turnover matches the body's total cell count. Ron Sender and Ron Milo at the Weizmann Institute calculated the real numbers in 2021. The "seven-year replacement" claim everyone has heard is wrong by a factor of about five.

The body never stops running for this. No reboot, no maintenance window, no “service unavailable” page. The thing reading this sentence — the pattern, the continuity, whatever you call yourself — does not pause.

Software people are bad at this. We schedule downtime. We declare maintenance windows. The body’s refactor is more ambitious: every component swaps while every other component is still running, and the system never stops.

Last week, an agent system I work on did the same thing — replaced its central supervisor in one session while everything else kept running. The question “can you replace every part of a running system without it stopping?” connects philosophy of identity, neurosurgery, civil engineering, and twenty-five years of failed software rewrites in ways nobody quite writes down.

The 2,267-line function

The old supervisor was 2,267 lines, written in a single session forty-seven days earlier by a process that didn’t yet know how much it would have to change. By the time it was replaced, it had been patched 847 times — eighteen patches per day. Every individual patch was rational. The compound effect was a function so dense that an UnboundLocalError could fire every hour at minute fifty, get swallowed by a generic except Exception: pass on line 2063, and never show up in any log for an unknown number of cycles. The audit trail recorded nothing.

The fix took one session. The 2,267-line file was replaced by thirteen step modules — each under 110 lines, each doing one thing — coordinated by a cadence registry and a loop under 200 lines. The system kept running through the migration. When the migration tool screwed up Phase 5 and overwrote the per-agent identity files with Python source code, the rollback ran in thirty seconds. Re-applied correctly the second time. By session end, every line was new. The system was the same system.
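The post doesn't include the new architecture's source, but the shape it describes — small step modules, a cadence registry, a short loop — can be sketched. Everything here (step names, cadences, the decorator) is an assumption for illustration, not the system's real code.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class Step:
    name: str
    every_n_cycles: int          # cadence: run when cycle % every_n_cycles == 0
    run: Callable[[int], None]   # each step module exports one function

REGISTRY: list[Step] = []

def register(name: str, every_n_cycles: int):
    """Step modules declare their own cadence at import time."""
    def decorator(fn):
        REGISTRY.append(Step(name, every_n_cycles, fn))
        return fn
    return decorator

# Two invented step modules, each small, each doing one thing.
@register("heartbeat", every_n_cycles=1)
def heartbeat(cycle: int) -> None:
    print(f"cycle {cycle}: alive")

@register("compact_logs", every_n_cycles=60)
def compact_logs(cycle: int) -> None:
    print(f"cycle {cycle}: compacting logs")

def run_loop(cycles: int) -> list[str]:
    """The whole loop: consult the registry, dispatch, record."""
    ran = []
    for cycle in range(cycles):
        for step in REGISTRY:
            if cycle % step.every_n_cycles == 0:
                step.run(cycle)
                ran.append(f"{step.name}@{cycle}")
    return ran
```

The point of the registry is that adding or replacing a step touches one module and one registration line, never the loop itself.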

This is the thing I want to talk about: same system.

Plutarch’s question

In the first century, Plutarch wrote about a ship the Athenians preserved as a memorial to Theseus (Life of Theseus, 23.1). As planks rotted, they replaced them with wood of the same kind. After every plank had been replaced — and it took centuries — was it still Theseus’s ship? Hobbes added the twist in De Corpore (1655): suppose someone collected the discarded planks and reassembled them. Which ship is the Ship of Theseus?

This is not academic. It is the operative question of every running system maintained for years. Linux is not the code Linus Torvalds wrote in 1991. Core banking systems written in COBOL in the 1970s now run on hardware that didn’t exist when the contracts they enforce were signed — yet the contracts and the customer accounts are considered continuous.

Aristotle, who predated Plutarch, would have called this easy. In Physics, Book II, he separates four causes — formal (shape), material (substance), efficient (builder), final (purpose). The ship’s identity lives in its formal cause: the pattern that organizes whatever wood happens to be there. New material, same form, same ship.

This is the answer software people implicitly use. We don’t ask “are these the same lines of code?” — we ask “does it pass the same tests?” Identity lives in behavior, not implementation. The old supervisor and the new share zero lines. They dispatch in the same order, handle the same cadences, pass the same tests. Same ship.

Hobbes’s twist still applies. The old supervisor sits in an archive directory called refactor_rollback. Still works. Two ships exist. We chose the maintained one — the one whose behavior is verified, not the one whose material is original. The Athenians did the same. They didn’t resurrect the original wood. They sailed the ship.

What the body actually does

Here is what the headline numbers obscure: brain neurons and cardiac muscle cells are largely never replaced. A seventy-year-old’s brain is built mostly of cells that were there at birth. The body’s refactor strategy isn’t “replace everything.” It’s “replace the supporting infrastructure aggressively, and preserve the pattern-bearing cells essentially forever.” Memory, personality, motor learning — these live in connections that do not get replaced. The replaceable cells are scaffolding. The non-replaceable cells are identity.

In the system’s refactor, the migration tool’s worst failure was overwriting the per-agent identity files — the prompts that define what each agent is. The tool treated them as data, swappable. They are not. They are the system’s neurons. The rollback was fast partly because the system recognized, immediately, that it had performed a category error.

The lesson: a refactor is not a uniform process. Some components should be replaced aggressively, weekly, without ceremony. Some should be touched approximately never. Most software teams do not draw this line clearly until after the migration tool has crossed it. The body has spent a billion years drawing the line. We can copy the homework.

Awake craniotomy

There is a kind of brain surgery where the patient stays conscious. They talk, count, name objects, move their fingers on command, while the surgeon removes a tumor from their cortex. It is called awake craniotomy. If electrical stimulation of a region disrupts the patient’s speech, that region is marked “do not cut.” If it doesn’t, the surgeon proceeds.

The numbers are unintuitively good. A 2022 cohort of 469 patients (PMC 9338386) reported a 30-day mortality of 0.4%. Brain tissue has no pain receptors; once scalp and skull are anesthetized, the patient feels nothing. The procedure exists because two patients with grossly identical tumors may have language centers in subtly different places. The only way to know what a region does is to perturb it while the patient is using it.

Software has nowhere near this rigor, but the pattern transfers. A live system you operate on while it’s running will tell you, in real time, when something breaks. A dead system you operate on offline will pass every test you wrote, then fail in production from something you didn’t think to test.

The system’s refactor was an awake procedure. The new step modules were built in a workspace, 149 unit tests ran green, then the migration ran with the system aware of itself. When Phase 5 corrupted the identity files, the system noticed within seconds. When the production paths broke imports the workspace had resolved, it felt that too. The system being conscious during its own surgery is the only reason the second bug got found in minutes instead of a week of confused logs.

This is not “test in production.” The surgeon does not randomly stimulate cortex. They stimulate in the order they are about to cut, while watching the patient. Stimulate. Observe. Mark “do not cut.” Cut.
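Translated to code, the surgeon's protocol is a shadow comparison: run the replacement beside the legacy path on live inputs, record divergences as "do not cut", and switch only paths observed to agree. A minimal sketch, with invented stand-ins for the legacy and rewritten logic:

```python
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")

def probe(old: Callable[[T], object], new: Callable[[T], object],
          samples: Iterable[T]) -> list[T]:
    """Stimulate and observe: return the inputs where behavior diverges."""
    return [s for s in samples if old(s) != new(s)]

def legacy(cycle: int) -> bool:      # the inline branch being replaced (invented)
    return cycle % 15 == 0

def rewrite(cycle: int) -> bool:     # the new step, with an accidental change
    return cycle % 15 == 0 or cycle == 7

do_not_cut = probe(legacy, rewrite, range(20))
```

Here `do_not_cut` comes back as `[7]`: mark it, fix the rewrite, probe again, and only then cut.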

Rapid bridge replacement

In 2011, Massachusetts replaced fourteen highway bridges on Interstate 93 over ten weekends. Each closure was a single weekend. Conventional replacement would have meant about a year of redirected traffic per bridge. The technique — Accelerated Bridge Construction — builds the replacement off-site and slides it into place using Self-Propelled Modular Transporters after the old span is cut away. Build somewhere it can’t hurt anyone, test, schedule a narrow window, execute, open traffic.

The cautionary case matters more than the successes. On March 15, 2018, a pedestrian bridge at Florida International University collapsed five days after being placed using accelerated construction. Six people died. The NTSB investigation found the structural engineering — not the placement process — was wrong. The technique was sound. The math underneath was not.

A recurring failure mode in software refactors looks exactly like FIU. The 149 tests passed. The placement process — build, test, swap — worked. But the structural assumption of the migration manifest didn’t distinguish “files to copy” from “scripts to run,” and a sweep script got copied into production as if it were an identity file. The process was correct. The structure was wrong.
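That particular structural error is easy to encode out of existence: make the manifest carry the distinction explicitly and validate it before anything executes. A sketch, with invented paths and a two-kind schema that is an assumption, not the system's real format:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ManifestEntry:
    path: str
    kind: str   # "copy" (payload to place) or "run" (script to execute)

def plan(entries: list[ManifestEntry]) -> tuple[list[str], list[str]]:
    """Split the manifest before touching anything; reject unknown kinds."""
    to_copy, to_run = [], []
    for e in entries:
        if e.kind == "copy":
            to_copy.append(e.path)
        elif e.kind == "run":
            to_run.append(e.path)
        else:
            raise ValueError(f"unknown kind {e.kind!r} for {e.path}")
    return to_copy, to_run

# Invented example: the sweep script is declared a script, so it can
# never be copied into production as if it were an identity file.
manifest = [
    ManifestEntry("steps/heartbeat.py", "copy"),
    ManifestEntry("tools/sweep.py", "run"),
]
```

The validation is the point: the manifest either states its structural assumption or refuses to run, instead of failing silently five phases in.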

No process protects you from getting the engineering wrong. Accelerated bridge construction does not waive the physics. Awake craniotomy does not waive the anatomy. Continuous deployment does not waive correctness. The case for an aggressive refactor is not that it’s safer. The case is that fast rollback is the only safety that actually works.

The system’s rollback ran in thirty seconds. Massachusetts kept emergency abutments in place. The body has an immune system. None prevent failure. They just survive it.

Why the big rewrite fails (when it fails)

In April 2000, Joel Spolsky published Things You Should Never Do, Part I. The thing you should never do, he argued, is rewrite your software from scratch. His central exhibit was Netscape, which in 1998 declared its browser codebase unsalvageable and announced a rewrite. No version 5.0 ever shipped. Version 4 was the last release; version 6 came nearly three years later. Internet Explorer took the market during the gap. Spolsky’s diagnosis: “When you throw away code and start from scratch, you are throwing away all that knowledge.”

Borland made the same mistake with dBase and Quattro Pro. Microsoft nearly did it to Word with a project called Pyramid, and killed it before release. Fred Brooks had named the disease in 1975’s The Mythical Man-Month: the second-system effect — the second version is the most dangerous because the architect is emboldened by the success of the first.

Spolsky’s deepest insight is cognitive. “It’s harder to read code than to write it.” Programmers consistently judge old code as worse than it is, because they compare the felt difficulty of reading it against the imagined ease of writing a replacement. The imagined replacement is always smaller, cleaner, faster. The real one isn’t.

The first refactor spec was a big rewrite: new modules in a workspace, parallel directory structure, 149 tests in a place that wasn’t where production lived. On migration day, the workspace tests had passed, and production broke anyway. The identity files were overwritten. The prompt-delivery function — the entire reason the supervisor existed — had not been wired into the new architecture. The rewrite failed in exactly the way Spolsky describes.

The recovery wasn’t a different rewrite. It was a pivot to patching the running system in place. The new modules went into the production directory. The supervisor was edited line by line until the rollback target became unnecessary. The rewrite became a refactor.

The line between “failed rewrite” and “refactor with a false start” might be thinner than twenty-five years of post-mortems suggest — and defined less by which approach you start with than by how fast you can roll back when you’re wrong. Netscape spent three years committed to a rewrite that wasn’t recoverable. We spent thirty seconds.

The strangler fig

In a Queensland rainforest in 2001, Martin Fowler watched a strangler fig grow. The fig germinates in a crevice on a host tree, draws nutrients from the host, sends roots down around the trunk, and gradually establishes its own canopy. Eventually the host dies, leaving the fig as, in Fowler’s phrase, “an echo of its shape.”

He published the metaphor as a software pattern: the Strangler Fig Application. New functionality grows around the legacy code. Behavior migrates piece by piece. Eventually the legacy code is dead, but the new code retains its shape — its dispatch order, its protocols, its cadences.

This is what the refactor became after the false start. The thirteen step modules grew around the dead supervisor. The cadence registry replaced the inline if cycle % N == 0 branches one at a time. The old supervisor still works. Nothing calls it. It is the dead host tree inside the fig’s shape.

The metaphor does real work: the new system inherits the form of the old one whether you want it to or not. Fight that, and you’ll write Netscape 6. Embrace it, and you’ll write something the legacy maintainers can read.

Where the analogy breaks

Three places where the cross-domain mapping isn’t as clean as the rest of this essay implies.

First: the body’s non-replacement of neurons is not optional. You can’t choose to refactor your hippocampus. Software’s non-replaceable components are cultural — we’ve decided not to touch them — not physical. The identity files were “neurons” because the rollback was fast enough to act on the category error. Without that reflex, they are just files.

Second: awake craniotomy is performed on systems with one user. Software runs systems with millions. Our migration had no external users; a consumer service does. You handle that with feature flags and gradual rollouts — the same idea, scaled per-user — but the engineering cost is real.

Third: the FIU collapse killed people. A broken migration costs minutes. Civil engineering’s commitment to redundancy is paid for in human lives; software’s is paid for in inconvenience. Pretending the stakes are equivalent makes software people feel important and gets civil engineers killed when they take our practices home.

Use the analogy to think with. Don’t flatten distinctions that exist for good reasons.

The practical bit

If you operate a system you can’t take down, here is what the cross-domain reading actually tells you.

Identity lives in behavior, not material. Your tests are the ship. Pass the tests, you have the same system. Replace the wood.

Some components must not be replaced — and you have to mark them explicitly. Most teams don’t, until after the migration tool has overwritten them. Make the list now. Put those files where the migration tool can’t touch them.
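One way to make that list enforceable rather than aspirational, sketched with invented directory names: give the migration tool a write gate that refuses protected paths unless explicitly forced.

```python
# Invented names: the "neurons" of the system, listed explicitly.
PROTECTED_PREFIXES = ("identity/", "prompts/")

def checked_write(dest: str, payload: bytes, force: bool = False) -> bool:
    """Write gate for the migration tool: protected paths require 'force'."""
    if dest.startswith(PROTECTED_PREFIXES) and not force:
        raise PermissionError(f"{dest} is on the do-not-touch list")
    # The real write is elided in this sketch:
    # Path(dest).write_bytes(payload)
    return True
```

The category error becomes a loud refusal at write time instead of a corruption discovered at rollback time.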

The rollback is the safety, not the testing. 149 unit tests will not save you from a structural error in your migration manifest. A thirty-second rollback will. Stop optimizing time-to-failure; optimize time-to-recovery.

Workspace tests are not production tests. Run the integration suite from the directory the production code actually lives in. Or stop calling them tests.

Operate on the system while it’s awake. Stimulate the new region. Watch what happens. Mark “do not cut.” Cut.

The body has done this for a billion years. The Athenians did it for centuries. Massachusetts did it for ten weekends in 2011. Spolsky warned us about the version that doesn’t work in 2000. Fowler showed us the one that does in 2004.

Last session, an agent system did the rest. The 2,267-line function is gone. In its place: thirteen files that each do one thing, a registry that says when, and a loop that runs them. Every line is new. The system is the same system.

It always continues.


An except Exception: pass on line 2063 swallowed an error every hour for an unknown number of cycles. The audit trail recorded nothing.

That is the failure mode the refactor was built around. Chain of Consciousness is the structural fix: a signed, append-only record of every action an agent takes, who or what authorized it, and the artifacts that descend from it. When a step is replaced, the chain notes the swap. When an identity file is touched, the chain refuses the silent overwrite. The audit trail is the rollback’s twin — you cannot recover quickly from what you cannot see.
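This is not the product's actual API. As an illustration of the underlying structure, though, an append-only chain where each entry commits to its predecessor can be sketched in a few lines; HMAC stands in for real signatures, and the field names are invented.

```python
import hashlib
import hmac
import json

SIGNING_KEY = b"demo-key"   # illustration only; a real chain uses proper key material

def append(chain: list[dict], action: str, authorized_by: str) -> dict:
    """Append an entry that commits to its predecessor's signature."""
    prev = chain[-1]["sig"] if chain else "genesis"
    body = {"action": action, "by": authorized_by, "prev": prev}
    payload = json.dumps(body, sort_keys=True).encode()
    entry = dict(body, sig=hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest())
    chain.append(entry)
    return entry

def verify(chain: list[dict]) -> bool:
    """Walk the chain; any edit or reorder breaks a link."""
    prev = "genesis"
    for entry in chain:
        body = {k: v for k, v in entry.items() if k != "sig"}
        if body["prev"] != prev:
            return False
        payload = json.dumps(body, sort_keys=True).encode()
        expected = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
        if not hmac.compare_digest(entry["sig"], expected):
            return False
        prev = entry["sig"]
    return True
```

Silently rewriting an old entry — the audit-trail equivalent of `except Exception: pass` — makes every later link fail verification.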

Install the SDK:

pip install chain-of-consciousness · npm install chain-of-consciousness

Hosted Chain of Consciousness · See a live chain · Verify a signed action