Kelley's Covariation Model: Root-Cause Attribution You Can Run on Any Incident

A 1967 social-psychology framework is the incident decision procedure you reinvented piecemeal, and it names the one step you're built to skip.

Published June 2026 · 10 min read

It's 3:00 a.m. and your service is throwing 503s. You shipped a change at eleven, so you're three terminals deep in your own logs, re-reading the diff, squinting at the connection-pool config, building an increasingly elaborate theory about how your code broke. Forty minutes in, someone drops a line in the incident channel: "us-east-1 is degraded, it's on the status page." It had been red the entire time. You just spent forty minutes debugging your own code during a platform outage. The check that would have ended the incident before it started ("wait, is it just me?") costs about ten seconds, and you never ran it.

That is not a discipline failure, and it is not a you problem. It is a specific, documented cognitive bias, and a social psychologist named the trap before the public internet existed. In 1967, Harold Kelley published a model of how people work out why something happened. Tucked inside it are two things every on-call engineer needs: the exact decision procedure you should run on every incident, and a warning, nearly six decades early, about precisely which step of that procedure you'd be wired to skip.

What Kelley actually built

Kelley's covariation model is a theory of attribution: how a person decides what caused a behavior they observe. Its engine is the covariation principle, which the literature states almost tersely: "an effect is attributed to the one of its possible causes with which, over time, it covaries." In plain English, the cause is the thing that's reliably present when the effect is present and absent when it's absent. You don't guess at the cause; you watch what travels with the effect.
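To make "watch what travels with the effect" concrete, here is a minimal sketch that scores each candidate cause by how much more often the effect shows up when the factor is present than when it is absent. The difference of conditional proportions is one simple way to quantify covariation, not Kelley's own formula, and the observation records, factor names, and the `covariation` helper are all hypothetical.

```python
def covariation(observations, factor):
    """Return P(effect | factor present) - P(effect | factor absent).

    Returns None when the factor never varies, because then there is
    nothing to compare against.
    """
    present = [o["effect"] for o in observations if factor in o["factors"]]
    absent = [o["effect"] for o in observations if factor not in o["factors"]]
    if not present or not absent:
        return None
    return sum(present) / len(present) - sum(absent) / len(absent)


# Hypothetical observations: which factors were present each time,
# and whether the effect (say, a 503) occurred.
observations = [
    {"factors": {"deploy", "service_b"}, "effect": True},
    {"factors": {"service_b"}, "effect": True},
    {"factors": {"deploy"}, "effect": False},
    {"factors": set(), "effect": False},
]

scores = {f: covariation(observations, f) for f in ("deploy", "service_b")}
print(scores)
```

In this made-up data the deploy is present for one failure and one success, so it scores zero, while `service_b` is present for every failure and absent for every success, so it scores the maximum of 1.0.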

Kelley's contribution, building on Fritz Heider's 1958 work separating internal from external causes, was to say that a careful observer weighs three independent kinds of evidence. The three are worth defining precisely, because the whole essay turns on them:

Consistency: does the behavior happen repeatedly, over time, in the same context? Does this person always react this way to this thing, or was it a one-time event?

Distinctiveness: does the actor behave this way only toward this one thing, or toward everything? Is the reaction specific to one stimulus, or general?

Consensus: do other people behave the same way in the same situation? Is everyone reacting like this, or just this one person?

And here's the part that makes it a tool rather than a vocabulary lesson: the pattern of those three answers points to one of three causes. Kelley called them the person (something about the actor), the entity (something about the thing being acted on), and the circumstance (something about the moment, a one-off). The mappings are clean:

High consistency, low distinctiveness, low consensus → the person. They always do it, they do it to everything, and nobody else does. The cause is them.
High consistency, high distinctiveness, high consensus → the entity. It happens reliably, but only toward this one thing, and everyone reacts to this one thing the same way. The cause is the thing.
Low consistency → the circumstance. It doesn't reliably recur. It was a fluke of the moment.

The textbook example: why did Mary laugh at the comedian? If Mary laughs every time she sees this comedian (high consistency), laughs at this one but not at others (high distinctiveness), and everyone laughs at him (high consensus), you attribute the laughter to the entity: the comedian is genuinely funny. But if Mary laughs at this comedian, and also at the menu, and at the parking sign (low distinctiveness), and nobody around her is laughing (low consensus), you attribute it to the person: Mary just laughs easily. Same behavior, opposite cause, and the three axes tell you which.
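The three mappings are small enough to write down as a lookup. This is a sketch under a simplifying assumption that each axis is a clean high/low boolean; the function name `attribute` is invented for illustration, and patterns the three mappings do not cover come back as `None` rather than being forced into a locus.

```python
from typing import Optional


def attribute(consistency: bool, distinctiveness: bool, consensus: bool) -> Optional[str]:
    """Map a high/low pattern to a locus. True means "high", False means "low"."""
    if not consistency:
        return "circumstance"  # it does not reliably recur
    if not distinctiveness and not consensus:
        return "person"  # they do it to everything, and nobody else does
    if distinctiveness and consensus:
        return "entity"  # only toward this thing, and everyone reacts the same
    return None  # mixed pattern: not covered by the three mappings


# Mary laughs every time, only at this comedian, and so does everyone else.
funny_comedian = attribute(consistency=True, distinctiveness=True, consensus=True)

# Mary laughs at the menu and the parking sign too, and nobody else is laughing.
# (Consistency is assumed high here; the example does not state it.)
mary_laughs_easily = attribute(consistency=True, distinctiveness=False, consensus=False)

print(funny_comedian, mary_laughs_easily)
```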

Sit with that for a second, because you have been doing a sloppy version of it your whole career.

The same three questions, in a different costume

A production failure has exactly three possible loci, and they are Kelley's three wearing hard hats.

The person is your component: your service, your code, the thing you own and shipped. The entity is the specific input or dependency the failure attaches to: a particular payload, a downstream service, one poisoned cache key. The circumstance is the transient one-off: a GC pause, a network blip, a race that fired once and won't again.

And the three axes you should be weighing are Kelley's three, translated with almost no friction:

Consistency becomes "does it fail every time?" Reproduce it ten times. If it fails ten out of ten, that's high consistency, and you're looking at a real, structural cause. If it fails twice out of ten, that's low consistency, and Kelley already told you where that points.
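A minimal sketch of the "reproduce it ten times" check: run a reproduction step repeatedly and report the fraction of attempts that failed. The `reproduce` callable is a placeholder for whatever triggers your failure, and the 0.8 cutoff for "high" is an assumption, not a number from Kelley.

```python
from typing import Callable


def consistency(reproduce: Callable[[], bool], attempts: int = 10) -> float:
    """Return the fraction of attempts in which the failure reproduced.

    `reproduce` should return True when the failure occurred.
    """
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    failures = sum(1 for _ in range(attempts) if reproduce())
    return failures / attempts


# Stand-in for a real reproduction step: a canned sequence of outcomes
# in which the failure shows up twice in ten tries.
canned = iter([True, False, False, True, False, False, False, False, False, False])
rate = consistency(lambda: next(canned))

HIGH_CONSISTENCY = 0.8  # assumed threshold; tune it for your own system
is_high = rate >= HIGH_CONSISTENCY
print(rate, is_high)
```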

Distinctiveness becomes "does it fail only with this one input or dependency?" If your service throws errors exclusively when it calls Service B, and is perfectly healthy on every other path, that's high distinctiveness. The folk wisdom that "the service throwing the error is usually not the one at fault" (Service A returns 503s because Service B is slow and exhausting A's connection pool) is a distinctiveness reading. A fails only when touching B; the cause is the entity, B, not A. Engineers know this as lore. Kelley gives it a name and a slot in a system.
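One way to read distinctiveness off your own request records is to compute a failure rate per dependency and see whether the failures pile up on exactly one of them. This is a sketch, not a standard tool: the `(dependency, ok)` record shape, the dependency names, and both thresholds are assumptions made for illustration.

```python
from collections import defaultdict


def failure_rate_by_dependency(calls):
    """calls: iterable of (dependency_name, ok) pairs. Returns {name: failure rate}."""
    totals = defaultdict(int)
    failures = defaultdict(int)
    for dependency, ok in calls:
        totals[dependency] += 1
        if not ok:
            failures[dependency] += 1
    return {dep: failures[dep] / totals[dep] for dep in totals}


def distinctive_dependency(rates, high=0.5, low=0.05):
    """Return the single dependency that fails while every other path is healthy.

    Returns None when failures are spread around, or when there is only
    one path and so nothing to compare it with.
    """
    if len(rates) < 2:
        return None
    failing = [dep for dep, rate in rates.items() if rate >= high]
    healthy = [dep for dep, rate in rates.items() if rate <= low]
    if len(failing) == 1 and len(healthy) == len(rates) - 1:
        return failing[0]
    return None


# Hypothetical call log for Service A.
calls = (
    [("service_b", False)] * 8 + [("service_b", True)] * 2
    + [("service_c", True)] * 10
    + [("cache", True)] * 10
)
rates = failure_rate_by_dependency(calls)
print(rates, distinctive_dependency(rates))
```

A `None` result here means low distinctiveness in this sketch's terms: the failure is not tied to one path, which keeps your own component in the frame.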

Consensus becomes the cheapest, most decisive question in all of incident response: "is everyone hitting this right now, or just me?" If every team in the region is paging at once, the cause is the shared environment (the platform, the zone, the common dependency) and it is almost certainly not your code. If it's only you, while everyone else is green, then it's yours. The entire distance between "my service is broken" and "the whole region is degraded" is a single consensus check.
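If you have any shared view of error rates, the consensus check reduces to one number: what fraction of your peers are also elevated. The sketch below assumes a plain mapping of service name to current error rate; the service names, the 5% "elevated" cutoff, and the `consensus` helper are all hypothetical, and a real status page or dashboard would stand in for the dictionary.

```python
def consensus(error_rates, me, elevated=0.05):
    """Return the fraction of peers (everyone except `me`) with an elevated error rate.

    Returns None when there are no peers to compare against.
    """
    peers = {name: rate for name, rate in error_rates.items() if name != me}
    if not peers:
        return None
    affected = sum(1 for rate in peers.values() if rate >= elevated)
    return affected / len(peers)


# Hypothetical snapshot during a regional problem: most peers are red too.
regional = {"checkout": 0.31, "search": 0.27, "auth": 0.22, "billing": 0.01}
high = consensus(regional, me="checkout")

# Hypothetical snapshot where only your service is failing.
just_me = {"checkout": 0.31, "search": 0.00, "auth": 0.01, "billing": 0.01}
low = consensus(just_me, me="checkout")

print(high, low)
```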

So the RCA truth table falls right out of the social-psych one:

High consistency + low distinctiveness + low consensus → it's your component. It fails the same way every time, on every input, and nobody else is affected. Own it. Roll back, patch, fix.
High consistency + high distinctiveness + high consensus → it's the entity. It fails reliably, but only on the path that touches one dependency, and everyone who touches that dependency fails too. Stop reading your own code; the cause is the shared downstream thing.
Low consistency → it's a circumstance. You can't reproduce it. It was a transient. And (this is the part engineers hate) that is permission to stop hunting for a permanent root cause. Forcing a deep structural explanation onto a one-off blip is how you ship a complicated "fix" for a thing that was never going to happen again.

None of this is novel as behavior. You already ask "is it flaky?" (consistency), "is it just this endpoint?" (distinctiveness), and occasionally "is it just me?" (consensus). What you almost certainly don't have is a unified decision procedure: the thing that says "here are the three signals, here is how their pattern resolves to a locus, run them in this order." A grounded-theory study of professional debugging published in 2026 confirms the gap from the other direction: debugging-as-cognition is well studied, but nobody has run it through the attribution-theory lens. Engineering reinvented Kelley's three axes, badly and piecemeal, and never noticed the finished version had been sitting in a psychology journal since the Summer of Love.

The axis you're built to skip is the one that ends the incident

Here's where the 1967 model stops being a neat mapping and starts being uncomfortably clairvoyant.

Kelley's model has a famous weakness: it assumes you'll actually gather and rationally weigh all three signals. Real humans don't. They use partial, biased information, and they fail in a patterned way. The pattern has a name: the Fundamental Attribution Error. When we explain other people's behavior, we systematically overweight the person and underweight the situation. We assume the driver who cut us off is a jerk (disposition) rather than rushing to a hospital (circumstance). We blame the actor and skip the context.

The on-call engineer at 3 a.m., tunnel-visioned on their own logs while the region is on fire, is the Fundamental Attribution Error wearing a hoodie. The "it's my code" reflex is dispositional over-attribution. You reach for the person locus, your own component, because that's the cognitive default, and you under-run the consensus check because consensus is the situational signal humans are documented to neglect.

That neglect is not a vibe; it's measured. The consensus-information literature (McArthur's work in the early 1970s, and studies like Lowe and Kassin's 1977 paper on the use of consensus) found that people underuse consensus and base-rate information relative to consistency and distinctiveness. They fixate on the individual and skip "what is everyone else doing?" This is the same family of mistake as the base-rate fallacy: the population-level signal is the most diagnostic and the least consulted.

Now line the two facts up, because together they're the whole point of this essay. The consensus axis is simultaneously the most decisive (it instantly separates "your bug" from "platform outage," the single most expensive misdiagnosis in operations) and the one human cognition is documented to default-skip. The cheapest, fastest, most incident-ending check is precisely the one psychology predicted, six decades ago, that you would forget to run.

And then the literature hands you the fix, too. Wells and Harvey showed in 1977 that people can and do use consensus information when its relevance is made salient. The neglect is a default, not a defect, and it's correctable by process. Which reframes something every engineer has rolled their eyes at. The runbook step that says, in shouty capitals, "CHECK THE STATUS PAGE AND ASK #incidents BEFORE TOUCHING YOUR CODE" is not bureaucracy and it is not condescension. It is a documented cognitive de-biasing intervention. It exists to force the salience of the consensus axis precisely because, left to instinct, you will skip it. Good incident process is applied social psychology, whether or not anyone in the room knows Kelley's name.

Two honest caveats, because this is an analogy

Worth saying plainly: Kelley built this to explain intentional human social behavior. Servers do not have dispositions; a connection pool is not "being difficult." So porting the model onto mechanical failure is a deliberate analogy, not a claim that systems have inner lives. What makes the port honest is that the underlying logic, covariation, is locus-agnostic. "The cause is the factor that reliably travels with the effect" is just as true of a flaky dependency as of a sarcastic coworker. The math doesn't care whether the thing it's explaining has a soul. That's why it transfers cleanly; it was never really about people, it was about evidence.

And Kelley is not a replacement for the RCA methods you already use. The 5 Whys walks you down a causal chain; a fishbone diagram fans out the categories of possible cause. Those are good tools and they answer different questions. Kelley is sharper on one specific step those methods are vague about, the attribution step: given a failure, which of the three loci does it belong to? Run Kelley first to localize the cause to component, entity, or circumstance; then run 5 Whys within that locus to walk the chain to the bottom. They compose. Kelley tells you which wall the ladder goes against; 5 Whys climbs it.

What to run on your next incident

Here is the procedure, and the order is the entire trick.

Run consensus first. Before you open your own logs, before you re-read your diff, before you build a single theory about your code, check the population. Status page. Regional dashboards. A one-line "anyone else seeing elevated errors in us-east-1?" in the incident channel. It is the cheapest check you have and it short-circuits the most expensive failure mode in the job: hours spent debugging healthy code during someone else's outage. You are running it first specifically because your brain wants to run it last.

Then distinctiveness. If it is just you, narrow it: does the failure attach to one input, one endpoint, one downstream call? High distinctiveness points away from your component and toward that entity: the dependency, the payload, the specific path. "It only breaks when we call B" is a finding, not a footnote.

Then consistency. Try to reproduce. Ten times. High consistency means there's a real structural cause worth the deep hunt. Low consistency is Kelley's circumstance attribution, and it is permission to stop: to log it, set an alert, and not spend your weekend engineering a baroque fix for a transient that has already left the building.
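The ordering can be sketched as a function that takes the three checks as callables, so the expensive ones never run if the cheap one settles it. Everything here is illustrative: the `triage` function, the verdict strings, and the `check` helper that records which checks actually ran are assumptions, and real checks would query a status page, your traces, and a reproduction script.

```python
from typing import Callable


def triage(
    others_affected: Callable[[], bool],
    tied_to_one_input: Callable[[], bool],
    reproduces_reliably: Callable[[], bool],
) -> str:
    """Run consensus, then distinctiveness, then consistency, in that order."""
    # 1. Consensus: cheapest check, and it short-circuits everything else.
    if others_affected():
        return "shared environment"
    # 2. Distinctiveness: narrow the locus to one dependency/input or to your component.
    locus = "entity" if tied_to_one_input() else "your component"
    # 3. Consistency: a failure that will not reproduce is a transient.
    if not reproduces_reliably():
        return "circumstance"
    return locus


calls = []  # records which checks were actually run


def check(name: str, answer: bool) -> Callable[[], bool]:
    def run() -> bool:
        calls.append(name)
        return answer
    return run


verdict = triage(
    check("consensus", True),
    check("distinctiveness", False),
    check("consistency", True),
)
print(verdict, calls)
```

In the example the consensus check comes back positive, so the other two checks are never invoked; that early exit is the whole reason for running consensus first.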

Harold Kelley died in 2003, well before "blast radius" entered the SRE vocabulary. Blast radius, when you look at it squarely, is just the consensus axis with a dashboard: the same "who else is affected?" question SRE spent ten years building tooling to compute. He never saw a distributed trace, never paged at 3 a.m., never owned a service. But he wrote down, in 1967, the complete logic for finding the true cause of a thing, and, more usefully, he told you which step you'd skip and how to stop skipping it. The next time you're three terminals deep at 3 a.m., certain it's your code: that certainty is the bias. Run consensus first.

Sources

Wikipedia, "Covariation model" and "Base rate fallacy"; iResearchNet, "Kelley's Covariation Model"; Simply Psychology, "Attribution Theory" (covering Kelley 1967, Heider 1958, Weiner 1974/1985, and the Fundamental Attribution Error).
Lowe & Kassin, "On the Use of Consensus: Prediction, Attribution, and Evaluation," Personality and Social Psychology Bulletin (1977).
Datadog, "What is Root Cause Analysis?"; Middleware, "Identify Root Cause Analysis in Distributed Systems"; DEV Community, "Root Cause Analysis: The Complete Guide for SREs"; Lightrun, "Why Blast Radius Analysis Does Not End When Alerts Fire"; Arvo AI, "Root Cause Analysis for SREs: 5 Whys, Fishbone, and AI."
arXiv, "A Grounded Theory of Debugging in Professional Software Engineering Practice" (2026).

You can only run the three checks on a record you can trust.

Consensus asks "is it just this one actor, or all of them?" Distinctiveness asks "does it fail only on this one input or dependency?" Consistency asks "does it fail every time?" All three are cross-actor, cross-input, cross-time questions, and you can only answer them from a record of what actually happened. When the actors are autonomous agents, a per-agent self-report cannot establish consensus, and a fleet-wide success rate cannot tell you which agent failed on which input. Chain of Consciousness anchors every agent action to a verifiable external record, so you can actually compute consensus, distinctiveness, and consistency instead of guessing from your own logs.


pip install chain-of-consciousness · npm install chain-of-consciousness
