Mitigation legitimately precedes root cause. Your rollback is the pump handle — and the germ can wait.
On September 8, 1854, a parish official in Soho, London, walked to the public water pump on Broad Street and unscrewed its handle. He did not know what cholera was. Nobody did — not really. The dominant theory still blamed "bad air," and the microbe responsible wouldn't be identified for another three decades. He couldn't have told you the mechanism by which that water was killing his neighbors. He removed the handle anyway, because a physician named John Snow had shown him a map, and the map was enough.
That gesture — acting decisively on a pattern you cannot yet explain — is one of the founding moves of public health. It is also, though almost nobody frames it this way, one of the most important and most resisted principles in running production software. Two of medicine's largest early wins came before anyone understood the cause. The engineers fighting an outage at 3 a.m. are fighting the same fight, against the same instinct, and the history tells them exactly how to win it.
When cholera broke out in Soho on August 31, 1854, it moved with terrifying speed: around five hundred people died within ten days, and more than six hundred would die in all, almost every one of them inside a 250-yard radius of a single pump. Snow was skeptical of the prevailing "miasma" theory — the idea that disease rose from foul-smelling air — and he did something that was, at the time, genuinely novel: he treated the outbreak as a data problem. He plotted every death on a street map and walked the neighborhood asking questions. The dots clustered, unmistakably, around the Broad Street pump. The brewery workers nearby, who drank beer rather than pump water, were spared. So were the inmates of a workhouse with its own well.
Snow had correlation, not mechanism. He could not have explained what was in the water, why it caused the disease, or how it spread inside the body. He had a where, not a why. And his critics, who were not fools, made the methodologically impeccable objection: you have shown that deaths cluster near the pump, not that the pump causes the deaths. Correlation is not causation. They were right about the logic and catastrophically wrong about what to do with it. Snow's reply was, in effect, that you can take the handle off a pump a great deal faster than you can win an argument about etiology — and the handle costs nothing if you're wrong.
Here is the honest part of the story, the part the triumphant retellings skip. By the time the handle came off, the outbreak was already subsiding; many residents had fled, and most of the deaths had occurred in the first few days. The handle removal probably did not save a large number of additional lives that week. Its importance is not the body count. Its importance is that it demonstrated a method — intervene on the pattern now, settle the explanation later — and that method, applied over the following century, saved lives by the millions. Robert Koch finally identified the cholera bacterium in 1884, thirty years after the handle came off. Thirty years. If London had waited for the germ before touching the pump, it would have buried a great many more people while being technically correct.
Seven years earlier and six hundred miles away, a Hungarian obstetrician named Ignaz Semmelweis had run headlong into the same wall, and it destroyed him.
At the Vienna General Hospital, the maternity service had two clinics. In the First Clinic, where deliveries were handled by doctors and medical students, roughly ten to twelve percent of mothers died of childbed fever — puerperal fever, a raging infection that set in days after birth. In the Second Clinic, staffed by midwives, the rate was around two percent. Women begged to be admitted to the second clinic. Some preferred to give birth in the street. The difference was real, it was deadly, and nobody could explain it.
Semmelweis got his clue from a death. A colleague, Jakob Kolletschka, cut his finger during an autopsy and died of an illness whose pathology looked exactly like childbed fever. Semmelweis made the leap: the doctors in the First Clinic went straight from dissecting cadavers to delivering babies, carrying something — he called it "cadaverous particles" — on their hands. The midwives did no autopsies. In May 1847 he ordered the doctors to scrub their hands in a chlorinated lime solution before examinations. The First Clinic's mortality fell off a cliff: from 12.2 percent in May to 2.2 percent in June, 1.2 percent in July, 1.9 percent in August. He had cut maternal death by something like eighty to ninety percent.
And he had no idea why it worked. Germ theory did not exist. Semmelweis could not say what the cadaverous particles were, only that washing them off stopped the dying. He had Snow's predicament exactly: an intervention that demonstrably worked, resting on a mechanism he could not name.
The medical establishment rejected him. Not quietly — bitterly. He was eased out of the Vienna General Hospital in 1849, his ideas ridiculed for the rest of his life. He grew increasingly angry and erratic, was committed to an asylum in 1865, and died there at forty-seven, beaten by guards, of an infection. Decades later, after Pasteur and Lister and Koch had built the germ theory that explained precisely why his handwashing worked, he was vindicated. The vindication did the mothers of 1848 no good whatsoever.
It is tempting to file the rejection of Semmelweis under "the past was stupid." It was not stupidity. The phenomenon was named, a century later, the Semmelweis reflex: the reflex-like tendency to reject new evidence because it contradicts established belief, without weighing it on its merits. And the engine of the reflex, in his case, was not ignorance. It was identity threat.
Sit with what accepting Semmelweis required of a Viennese physician in 1847. It required him to accept that his own hands — the hands of a respected, educated, well-meaning healer — had been carrying death into delivery rooms for years. That the mothers in his care had died because of him. "Wash your hands" is a trivial request. "Your hands have been killing your patients" is an annihilating one. The data was clean; the conclusion was unbearable. It didn't help that statistical argument was alien to medicine then, or that Semmelweis presented his case as raw tables of numbers delivered with the warmth of an accusation. Right evidence, threatening implication, terrible delivery. People reached for any available reason to disbelieve, and "you've only shown correlation" was sitting right there.
That combination — a correct finding whose acceptance forces an admission of self-harm — is not a relic of nineteenth-century medicine. It lives, fully intact, in the incident channel of every engineering organization.
When a production system is actively hurting users — bad data going out, payments failing, latency spiking, an agent taking harmful actions — the engineering instinct, drilled in deep, is understand it before you touch it. Find the root cause. Don't act on a guess. That instinct is excellent during design and code review and lethal during an active incident, for the same reason waiting for the germ was lethal in Soho: every minute spent explaining is a minute of continued harm, and the explanation can take hours while the mitigation takes seconds.
The mitigations are pump handles. Rolling back the last deploy is a pump handle — you don't need to know which line of the change broke things to know that reverting it stops the bleeding. So is flipping a feature flag off, tripping a circuit breaker, draining traffic from a sick region, or flushing a poisoned cache. None of these require a mechanism. They require a pattern: it started when we shipped at 14:05; revert what we shipped at 14:05. That is Snow's map. It is correlational, it is provisional, and it is the right thing to do right now. You complete the root cause afterward, in daylight, when no one is bleeding.
And the Semmelweis reflex shows up precisely here, wearing the costume of methodological rigor. Listen for it in a live incident:
Every one of those sentences is technically defensible and, with users actively harmed, practically dangerous. They are the reflex, and the reflex has a body count.
This is where the history pays off in something more useful than a good analogy. The standard explanation for blameless postmortems — the practice, popularized in modern site-reliability culture, of analyzing an incident without assigning personal fault — is that they "encourage honesty and psychological safety." True, but soft, and it makes the practice sound like a courtesy. The Semmelweis case reveals the harder, load-bearing reason.
Blameless postmortems exist because the Semmelweis reflex is real, and it is triggered by blame. The Vienna doctors did not reject handwashing because the evidence was weak. They rejected it because accepting the evidence meant accepting that they had killed people. When the finding and the blame are welded together, a threatened mind will throw out the finding to escape the blame — even when, especially when, the finding is the one that saves lives.
A blameless postmortem is the engineered solvent for exactly that weld. By making it a ground rule that "your change caused the outage" is a statement about a system that allowed a bad change to ship, not a verdict on you, it strips the identity threat off the finding. And once the threat is gone, the human being can look at the evidence — yes, my deploy did it — and accept it on its merits, the way the Vienna doctors could not. Blamelessness is not kindness for its own sake. It is the specific countermeasure to the specific reflex that, untreated, makes people reject the truth that would have protected their users. We learned the cost of leaving it untreated from the mothers of the First Clinic.
The operating principle is two sentences long: Pull the pump handle now. Understand the germ later. Translating that into how you run systems and teams:
Set a low bar to mitigate and a high bar to explain. In an active incident, the question is never "do we understand this yet?" It is "do we have a reversible action that plausibly stops the harm?" If yes, take it — rollback, flag, circuit breaker, drain — on the pattern alone. Reserve the demand for full mechanistic certainty for the post-incident analysis, where it belongs and where it's free.
Treat "it's only correlation" as a reason to act cheaply, not a reason to wait. Snow's critics were right that he hadn't proven causation. The correct response to weak-but-suggestive evidence and a cheap, reversible intervention is to take the intervention, because the cost of being wrong is a needless rollback and the cost of waiting is measured in users. Match the rigor you demand to the reversibility of the action.
Build blamelessness as a control, not a vibe. Write it down. Make "the system let a bad change through" the default phrasing and "who screwed up" a banned one — not to be nice, but because you are actively defusing the reflex that would otherwise make your most important findings unacceptable to the people who need to accept them. A team that can calmly say "my deploy caused this" is a team that can fix things a team trapped in self-defense never will.
The deepest lesson is the one the parish official on Broad Street understood without being able to articulate it: you do not need to understand a problem to stop it from hurting people, and insisting otherwise is a way of valuing your own need for explanation over someone else's safety. The germ can wait. The map is enough. Take the handle off the pump.
Pull the handle in seconds — but you can only understand the germ later if you kept the map.
"An agent taking harmful actions" is one of the incidents above, and the second half — understand the germ later — only works if you captured what the agent actually did. You cannot run a blameless postmortem on an action you have no faithful record of. Chain-of-Consciousness is that record: tamper-evident, per-step provenance of an agent's decisions and tool calls, so after you've rolled back you can do the real root-cause from the map instead of an argument — and the finding lands on the system, not a guess.
pip install chain-of-consciousness · npm install chain-of-consciousness
Hosted Chain-of-Consciousness → · See it in action