The observability gap is the risk gap. What cyber-insurance underwriters learned, the hard and expensive way, about pricing a hazard you cannot see.
On the morning of June 27, 2017, a piece of malware called NotPetya entered the pharmaceutical giant Merck through a routine accounting-software update in its Ukrainian offices, and within ninety seconds it had infected roughly ten thousand machines on Merck's global network. By the time it was done it had taken down more than forty thousand. Merck lost the better part of $1.4 billion: wrecked manufacturing, lost sales, the staggering cost of rebuilding an enterprise network from bare metal. And then, because Merck had insurance, it filed a claim, and that is where the story turns from a cybersecurity story into something stranger and more instructive.
Merck's insurers refused to pay. They invoked the "hostile or warlike action" exclusion, the war clause, boilerplate in property policies since the era when "war" meant tanks and the thing it excluded was a munitions warehouse going up. NotPetya, the insurers argued, was an act of war: it had been built and released by Russia's military intelligence as a weapon against Ukraine, and Merck was collateral damage. Merck sued. And in May 2023, a New Jersey appellate court ruled against the insurers, finding that a war exclusion written for "military action" could not be stretched to cover a worm that happened to escape its target: the insurers, the court said, hadn't met their burden to show the clause fairly applied. Merck was owed its $1.4 billion. The insurers settled in January 2024, just before the state Supreme Court could weigh in, for an undisclosed sum.
The interesting question is not who won. It's what the whole fight reveals about a discipline most engineers never study and should: how do you put a price on a hazard you cannot see coming, cannot model with confidence, and cannot fully observe even after it has happened? Because the cyber underwriter's problem is, almost line for line, the reliability engineer's problem, and the underwriter has been forced to be more honest about it than we usually are.
Insurance is, at bottom, the business of pricing risk, and it works beautifully when risk is observable. Consider fire. An actuary pricing a homeowner's fire policy is standing on centuries of loss data. The physics of combustion does not change between Tuesdays. A house that burns down files a claim, and that claim becomes a data point, so the dataset is more or less complete: the population of fires and the population of fire claims are nearly the same population. The hazard is stable, the data is dense, and the survivors and the casualties both show up in the record. Fire is, to an actuary, almost gold.
Now price cyber. Every property of fire that made it tractable is inverted. The hazard is not stable physics; it is adversarial: attacker capability multiplied by defender weakness, with both halves moving on a Patch Tuesday cadence. A vulnerability that did not exist last month is weaponized this month and patched the next; the thing you are insuring against reshapes itself faster than any actuarial table can be revised. The loss data is sparse, because cyber catastrophes are rare and recent. And (this is the part that should make the hair stand up on the back of your neck) the data is survivorship-biased. A company that suffers a breach and survives files a claim, and you learn from it. A company that suffers a breach and dies files for bankruptcy, not insurance, and vanishes from your dataset entirely. The worst outcomes are the ones that leave no record.
So the underwriter faces a hazard that is opaque in a way fire never is, and the industry's entire methodological history is the story of one stubborn response: drag the unobservable into the observable, by any means available, so that a premium can be attached to it. Everything else follows from that imperative, and so does the engineering lesson.
When you cannot see inside a thing, you learn to read it from the outside. Cyber underwriting grew a whole instrument industry to do exactly this. Firms like BitSight (founded 2011) and SecurityScorecard (2013) pioneered "security ratings": outside-in measurements of a company's security posture (its exposed services, its patching latency, its leaked credentials, its botnet chatter) compiled into something like a credit score for cyber hygiene. The underwriter cannot audit your internal controls, so they scan your external attack surface and infer. Alongside the ratings came frameworks like FAIR (Factor Analysis of Information Risk), developed by Jack Jones and adopted as an Open Group standard, which insists on decomposing a vague dread ("we might get breached") into quantities you can actually estimate and multiply: how often a loss event is likely to occur, and how much it would cost when it does. FAIR's entire purpose is to convert "scary and unknown" into "a number with a distribution around it." A number you can price.
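To make the frequency-times-magnitude decomposition concrete, here is a minimal sketch of a FAIR-style simulation. It assumes loss events arrive as a Poisson process and that each event's cost is lognormal; those distributional choices, the function names, and every parameter value are illustrative assumptions, not something FAIR prescribes and not real loss data.

```python
import math
import random
import statistics


def sample_poisson(rng, lam):
    """Draw a Poisson-distributed event count (Knuth's method, fine for small lam)."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_annual_loss(events_per_year, median_loss, sigma, trials=20_000, seed=7):
    """Simulate annual loss as the sum of per-event costs over a random event count."""
    rng = random.Random(seed)
    mu = math.log(median_loss)  # a lognormal's median is exp(mu)
    totals = []
    for _ in range(trials):
        n_events = sample_poisson(rng, events_per_year)
        totals.append(sum(rng.lognormvariate(mu, sigma) for _ in range(n_events)))
    return sorted(totals)


# Hypothetical inputs: 0.3 loss events per year, median cost 250,000 per event.
losses = simulate_annual_loss(events_per_year=0.3, median_loss=250_000, sigma=1.2)
mean_loss = statistics.fmean(losses)
p95_loss = losses[int(0.95 * len(losses))]
print(f"expected annual loss: {mean_loss:,.0f}")
print(f"95th percentile:      {p95_loss:,.0f}")
```

The point of the exercise is the shape of the output rather than any particular figure: with these assumed inputs most simulated years cost nothing, and the premium-relevant information lives in the tail.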
Squint, and this is the discipline of observability in software, arriving by a different road. The control theorist Rudolf Kálmán gave the word its rigorous meaning back in 1960: a system is observable if you can reconstruct its internal state from its external outputs. That is precisely what a security rating does to a company and precisely what a tracing system does to a distributed application. The modern observability movement, the high-cardinality, ask-new-questions-of-production approach that people like Charity Majors built into a category, is the same outside-in inference applied to your own code: you cannot pause a live system and read its mind, so you instrument its outputs richly enough to reconstruct what it must be doing inside.
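For readers who want the rigorous version, Kálmán's criterion can be written out for the textbook linear time-invariant case. The symbols below (state x, input u, output y, matrices A, B, C, state dimension n) are standard control-theory notation rather than anything from this essay, and the application to companies and distributed systems is an analogy, not a theorem.

```latex
\dot{x}(t) = A\,x(t) + B\,u(t), \qquad y(t) = C\,x(t), \qquad x(t) \in \mathbb{R}^{n}

\mathcal{O} =
\begin{bmatrix} C \\ CA \\ CA^{2} \\ \vdots \\ CA^{n-1} \end{bmatrix},
\qquad
(A, C) \text{ is observable} \iff \operatorname{rank}\,\mathcal{O} = n
```

When the rank falls short of n, some direction of the internal state never shows up in the outputs, no matter how long you watch. That is the formal version of a blind spot.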
Here is the load-bearing claim, the one the underwriters force into the open and engineers too often leave implicit: the observability gap is the risk gap. The part of your system you cannot see is, by definition, the part whose risk you are carrying unpriced. An unmonitored dependency, an un-instrumented code path, a failure mode with no metric attached: each one is, in underwriting terms, an uninsurable risk sitting on your books. You cannot set an SLA on a service whose latency you do not measure. You cannot budget capacity for a queue whose depth you do not track. The old management chestnut "you can't manage what you can't measure" is usually pinned on Peter Drucker, and W. Edwards Deming, for the record, called the blanket version a costly myth, since plenty of things that matter resist measurement and have to be managed anyway. The cyber underwriter's version is narrower and harder to argue with: you cannot price what you cannot observe. So the first move, always, is to make it observable before you depend on it.
The canonical illustration arrived in December 2021, when a vulnerability called Log4Shell (CVE-2021-44228) detonated through the industry. The flaw lived in Log4j, a logging library so ubiquitous it was a transitive dependency three and four levels deep in software that no one had ever consciously chosen to install. The reason it was a five-alarm fire was not that the bug was exotic; it was that almost no one could answer the question "do we run Log4j, and where?" The dependency was real, the risk was real, and it had been entirely unobserved, an unpriced liability sitting in production at thousands of companies who had no inventory that would even surface it. The remediation, industry-wide, was an observability project: build the software bill of materials, instrument the dependency graph, make the attack surface legible. It was the security-ratings move turned inward. You cannot defend what you cannot enumerate.
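The question "do we run Log4j, and where?" is, mechanically, a graph search. The sketch below walks a dependency graph and reports every chain that leads to a target package. The graph is hypothetical: only `log4j-core` is a real artifact name, the other package names are invented, and a real inventory would be generated from build manifests or an SBOM rather than typed by hand.

```python
from collections import deque

# Hypothetical dependency graph: package -> direct dependencies.
DEPENDENCIES = {
    "billing-service": ["web-framework", "report-generator"],
    "web-framework": ["http-client"],
    "report-generator": ["pdf-toolkit"],
    "pdf-toolkit": ["log4j-core"],
    "http-client": [],
    "log4j-core": [],
}


def find_paths_to(graph, root, target):
    """Breadth-first search returning every dependency chain from root to target."""
    paths = []
    queue = deque([[root]])
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == target:
            paths.append(path)
            continue
        for dep in graph.get(node, []):
            if dep not in path:  # guard against dependency cycles
                queue.append(path + [dep])
    return paths


paths = find_paths_to(DEPENDENCIES, "billing-service", "log4j-core")
for chain in paths:
    print(" -> ".join(chain))
```

In this toy graph the service never names the logging library directly; it arrives three levels down, which is exactly the situation that made the real inventory so hard to produce.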
But the underwriter's discipline has a second move, and it is the one engineers most often skip, because it requires admitting defeat about something, and the honest admission is exactly the point.
Sometimes a risk genuinely cannot be observed. The underwriter cannot see inside a sovereign state's offensive cyber program; they cannot model the correlated, systemic blast radius of a worm that crosses every network on earth in an afternoon. And when observation fails entirely, the mature underwriter does not guess, and does not quietly hope. They exclude, explicitly, in writing, with a hard boundary. That is what the war exclusion was trying (clumsily) to be, and it is what came after Merck: in August 2022, Lloyd's of London issued Market Bulletin Y5381, requiring that from March 31, 2023, every standalone cyber policy written in its market carry an explicit exclusion for state-backed cyberattacks, in addition to the war clause. The London Market Association drafted four precedent wordings to make the boundary precise. The logic was unsentimental: the systemic, state-scale risk is not observable or modelable at the level required to price it, so rather than absorb it silently and pray, the market drew a line around it and said: this, we do not cover. An explicit exclusion is not a failure of nerve. It is the refusal to carry an unpriced catastrophe on the books while pretending otherwise.
Engineers have invented this exact move and rarely recognize it as the same thing. When your service calls a third party whose internals you cannot see (a black-box vendor, an upstream API, a dependency you do not control), you face the underwriter's unobservable risk. And the disciplined response is not to assume it will be fine. It is to draw a boundary: the circuit breaker that Michael Nygard popularized in Release It! in 2007 and that Netflix hardened into Hystrix around 2012; the timeout; the bulkhead that isolates one pool of resources from another; the rate limit; the explicit fallback. Every one of these is a sublimit. A circuit breaker is you saying, in code, "I cannot observe what is happening inside this dependency, so I am capping my exposure to it at N failures before I stop trusting it." That is precisely a war exclusion: a hard boundary around a risk you have honestly conceded you cannot price.
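A minimal sketch of the "N failures before I stop trusting it" idea follows. It is a bare-bones illustration of the pattern, not Hystrix's implementation or Nygard's exact design; the class name, the parameter names, and the failing `flaky_vendor_call` are all hypothetical.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when the breaker refuses to call the dependency."""


class CircuitBreaker:
    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures  # the "sublimit": N consecutive failures
        self.reset_after = reset_after    # seconds to wait before a trial call
        self.clock = clock
        self.failures = 0
        self.opened_at = None             # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("dependency excluded until cooldown ends")
            # Cooldown elapsed: fall through and allow one trial call.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()  # trip (or re-trip) the breaker
            raise
        self.failures = 0      # any success closes the circuit again
        self.opened_at = None
        return result


def flaky_vendor_call():
    # Hypothetical stand-in for a black-box dependency that is down.
    raise ConnectionError("upstream unavailable")


breaker = CircuitBreaker(max_failures=2, reset_after=60.0)
outcomes = []
for _ in range(4):
    try:
        breaker.call(flaky_vendor_call)
    except CircuitOpenError:
        outcomes.append("short-circuited")
    except ConnectionError:
        outcomes.append("failed")
print(outcomes)
```

After the second failure the breaker stops forwarding calls at all. The caller's exposure to the dependency is capped at a number chosen in advance, which is the whole point of a sublimit.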
The 2020 SolarWinds compromise is the cautionary case for what happens when the boundary is missing. Attackers slipped a backdoor into a trusted vendor's software update, and roughly eighteen thousand organizations installed it, because the update came from inside the trust boundary, from a black box everyone had decided, silently, was fine. There was no sublimit on trust. The risk of the vendor's unobservable internals had been absorbed wholesale, unpriced, by everyone downstream. The underwriter would have called that an uncapped exposure to an uninsurable peril. The engineer should call it the same thing.
There is one more lesson, and it is the one that should change how you read your own monitoring, because it inverts an instinct everyone has.
The cleanest demonstration of it comes from the Second World War, from the statistician Abraham Wald, who worked for the U.S. Statistical Research Group. The military wanted to add armor to its bombers. It had data on where returning planes had taken the most hits (wings, fuselage, tail) and proposed reinforcing those spots. Wald's now-famous insight was that they were reading the data exactly backwards. The bullet holes they could see were, by definition, holes in planes that had survived them. The places to armor were the places with no holes in the survivors (the engines, the cockpit), because the planes hit there were the ones that never came back to be measured. The absence of damage in the data was not evidence of safety. It was evidence of who wasn't in the dataset.
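Wald's argument can be reproduced with a small simulation. The sketch below spreads hits uniformly across five sections, downs each plane according to where it was hit, and then compares the true distribution of hits with the one visible on returning aircraft. The per-section loss probabilities are invented for illustration and are not historical figures.

```python
import random

# Assumed, purely illustrative chance that one hit to a section downs the plane.
LOSS_PROBABILITY = {"engine": 0.6, "cockpit": 0.5, "fuselage": 0.1, "wings": 0.1, "tail": 0.1}
SECTIONS = list(LOSS_PROBABILITY)


def fly_missions(n_planes=50_000, hits_per_plane=3, seed=1):
    """Return hit counts by section for all planes and for returning planes only."""
    rng = random.Random(seed)
    all_hits = dict.fromkeys(SECTIONS, 0)
    surviving_hits = dict.fromkeys(SECTIONS, 0)
    for _ in range(n_planes):
        hits = [rng.choice(SECTIONS) for _ in range(hits_per_plane)]
        for section in hits:
            all_hits[section] += 1
        survived = all(rng.random() > LOSS_PROBABILITY[s] for s in hits)
        if survived:  # only returning planes are ever inspected
            for section in hits:
                surviving_hits[section] += 1
    return all_hits, surviving_hits


def share(counts, section):
    """Fraction of all recorded hits that landed on the given section."""
    return counts[section] / sum(counts.values())


all_hits, surviving_hits = fly_missions()
for section in SECTIONS:
    print(f"{section:9s} true share {share(all_hits, section):.2f}   "
          f"seen on survivors {share(surviving_hits, section):.2f}")
```

Every section is hit equally often by construction, yet the survivors' record under-reports engine and cockpit hits and over-reports the rest. The dataset is accurate about the planes in it and silent about the ones that matter.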
This is the underwriter's survivorship problem stated in its purest form, and it is the engineer's blind-spot problem too. In June 2014, a company called Code Spaces, a hosting service for source code, was breached when an attacker gained control of its AWS console and deleted not just its data but its backups. The company was gone within roughly twelve hours. It filed no insurance claim worth studying; it simply ceased to exist. In 2019, the American Medical Collection Agency's parent company filed for bankruptcy in the wake of a breach exposing tens of millions of records. These are the bombers that did not come back. The breaches we dissect in postmortems and conference talks are the ones whose victims survived to be measured. The worst ones left no data at all.
Apply Wald to your dashboards, and the conclusion is uncomfortable: the absence of incidents in a blind spot is not evidence of safety; it is evidence of blindness. That serene, never-firing alert on a service nobody has touched in two years may mean the service is rock-solid, or it may mean the metric is measuring nothing and the alert is wired to a condition that can no longer occur. The quietest part of your system is not necessarily the safest; it is the part you have the least reason to trust, precisely because you have the least information about it. A clean dashboard and a dark one look identical until the day they don't. Wald would tell you to go armor the places with no bullet holes.
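One way to act on this is to audit silence directly. The sketch below assumes you can export, for each alert, when its underlying metric last reported data and when the alert was last proven able to fire (for example in a game day or a synthetic test). The `Signal` record, its field names, the thresholds, and the sample data are all hypothetical; real monitoring systems expose this information in different ways, if at all.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Signal:
    name: str
    last_sample_at: Optional[datetime]     # last time the metric reported data
    last_fire_test_at: Optional[datetime]  # last time the alert was proven to fire


def audit(signals, now, max_sample_age=timedelta(hours=1), max_test_age=timedelta(days=90)):
    """Return (name, reason) pairs for signals whose silence cannot be trusted."""
    findings = []
    for s in signals:
        if s.last_sample_at is None or now - s.last_sample_at > max_sample_age:
            findings.append((s.name, "dark: metric is not reporting"))
        elif s.last_fire_test_at is None or now - s.last_fire_test_at > max_test_age:
            findings.append((s.name, "unproven: alert has not been shown to fire"))
    return findings


now = datetime(2024, 1, 15, 12, 0)
signals = [
    Signal("checkout-latency", now - timedelta(minutes=2), now - timedelta(days=10)),
    Signal("legacy-batch-errors", now - timedelta(days=400), None),
    Signal("queue-depth", now - timedelta(minutes=1), None),
]
findings = audit(signals, now)
for name, reason in findings:
    print(f"{name}: {reason}")
```

The audit distinguishes two kinds of quiet: a metric that has stopped reporting, and an alert that has data but has never been shown to trip. Neither would ever appear in an incident review, which is why the check has to start from the inventory rather than from the pager history.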
The whole discipline collapses into a single habit you can run over any system you own: ask the underwriter's question of every component. Can I observe this?
If the answer is yes, instrument it, price it, put an SLA on it, and treat the numbers as a live estimate that decays, the way an underwriter re-rates a renewal, not a fact you established once. If the answer is no, your first move is to make it observable before you build anything that depends on it, because the unmonitored dependency, the un-traced code path, the failure mode with no metric is a risk you are carrying whether or not you have admitted it. And if it genuinely cannot be observed (a vendor's internals, a third party's reliability, a state-scale correlated failure), then do the grown-up thing the insurance market was dragged into doing: exclude it explicitly. Put a circuit breaker on it. Set a sublimit. Draw the hard boundary in code and in your head, rather than silently absorbing a catastrophe you cannot price.
Then, last and most counterintuitively, go audit your blind spots on purpose: not your incidents, your blind spots. Walk the parts of the system that never page anyone and ask whether they are quiet because they are healthy or quiet because they are unobserved. Treat your serenest dashboards as suspects. The underwriters learned, the hard and expensive way, that the risk you cannot see is not the risk that isn't there; it is the risk that is mispriced at zero on your books and waiting. Observability was never really about debugging. It is about knowing what you are insuring, and the part of your system you cannot see is, exactly and always, the part whose risk you are carrying for free.
You cannot price an agent action you cannot observe.
An autonomous agent whose reasoning and tool-calls you cannot see is the underwriter's nightmare: an unpriced risk sitting on your books, mispriced at zero until the day it isn't. The fix is the first move from this essay, applied to agents: make it observable before you depend on it. Chain of Consciousness is that instrument: a tamper-evident record of an agent's reasoning, tools, and actions, so the part of the system you most need to trust stops being a black box you are silently insuring for free and becomes something you can actually audit and price.
pip install chain-of-consciousness · npm install chain-of-consciousness