When two dashboards disagree and neither is broken, you haven't found a bug. You've proven they were computed from different objects.
You have seen this meeting. Two dashboards are projected on the wall, and they disagree. One says last week's error rate was 0.3%; the other says 2.1%. Someone asks the obvious question (which one is right?) and the room goes quiet, because nobody actually knows. For the next forty minutes, two data engineers dig through repositories trying to reverse-engineer how each number is computed: this one filters out health-check traffic, that one counts retries as separate failures, this one rolls up by minute and that one by request, and both were "validated" by someone who has since changed teams. The meeting ends with no decision and an action item to "align on the definition."
Here is the thing nobody in that room said out loud: neither dashboard is broken. They are both faithfully computing what they were told to compute. The contradiction isn't a bug in one of the numbers. It is proof of a structural fact: those two metrics are not derived from the same underlying object. You don't have one source of truth with a measurement error. You have two sources of truth, and they were always going to drift apart, because nothing was holding them together.
Physics ran into this exact wall in the 1800s and walked through it. The solution has a name, the partition function, and once you see it, you cannot unsee how badly most observability stacks violate the principle behind it.
In statistical mechanics, you want to know the macroscopic properties of a system made of an absurd number of particles: its energy, its entropy, its pressure, how much heat it soaks up when you warm it. The naïve approach is to go measure all of those things, each on its own terms. The approach Ludwig Boltzmann and Josiah Willard Gibbs built instead, in the late nineteenth century, was to compute a single quantity first and get everything else for free.
That quantity is the partition function, written Z. In the canonical ensemble it's a sum over every possible microscopic state of the system, each weighted by a Boltzmann factor:
Z = Σᵢ exp(−Eᵢ / k_B T)
You do not need to love the formula. You need exactly one fact about it, and it is the entire point of this essay. As the standard references put it, Z is "the central tool of statistical mechanics, connecting microscopic energy levels to macroscopic thermodynamic quantities," and "most aggregate thermodynamic variables of the system can be expressed in terms of the partition function or its derivatives." You compute Z once. Then every property you care about is a derivative of it (technically, of its logarithm; store ln Z and you've stored the master key). Writing β = 1 / (k_B T):
F = −k_B T ln Z (free energy)
⟨E⟩ = −∂ ln Z / ∂β (mean energy)
S = k_B ln Z + ⟨E⟩ / T (entropy)
P = k_B T (∂ ln Z / ∂V)_T (pressure)
One object, one logarithm, a fistful of derivatives, and the whole of thermodynamics drops out the bottom. You never measure twenty things. You measure one and you differentiate.
Now notice what that buys you, because it's the part the meeting at the top of this essay was missing. Energy, entropy, and pressure are not three independent measurements that happen to agree. They are all derived from the same ln Z, which means they cannot contradict each other. Consistency isn't something you check for afterward; it's structural, baked in by construction. And the converse is the diagnosis you needed: if two thermodynamic quantities ever come out inconsistent, you didn't make a measurement error. You computed two different Zs somewhere, and that is a bug by definition.
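To make "compute Z once, then differentiate" concrete, here is a minimal Python sketch for a hypothetical four-level toy system, in units where k_B = 1. Everything flows from one function, `ln_Z`: the mean energy is a numerical derivative of it, and free energy and entropy are built from it using F = −k_B T ln Z and S = k_B ln Z + ⟨E⟩ / T. The energy levels, step size, and function names are illustrative assumptions, not taken from any library.

```python
import math

# Hypothetical toy spectrum (illustrative only), in units where k_B = 1.
ENERGIES = [0.0, 1.0, 1.0, 2.5]


def ln_Z(beta: float) -> float:
    """The master object: ln Z = ln sum_i exp(-beta * E_i)."""
    return math.log(sum(math.exp(-beta * e) for e in ENERGIES))


def mean_energy(T: float, h: float = 1e-5) -> float:
    """<E> = -d ln Z / d beta, via a central finite difference in beta."""
    beta = 1.0 / T
    return -(ln_Z(beta + h) - ln_Z(beta - h)) / (2 * h)


def free_energy(T: float) -> float:
    """F = -k_B T ln Z."""
    return -T * ln_Z(1.0 / T)


def entropy(T: float) -> float:
    """S = k_B ln Z + <E> / T."""
    return ln_Z(1.0 / T) + mean_energy(T) / T


if __name__ == "__main__":
    T = 1.5
    E, F, S = mean_energy(T), free_energy(T), entropy(T)
    print(f"<E>={E:.6f}  F={F:.6f}  S={S:.6f}")
    # Consistency by construction: F = <E> - T S, with no reconciliation step.
    assert abs(F - (E - T * S)) < 1e-9
```

Because all three quantities come from the same `ln_Z`, the identity F = ⟨E⟩ − TS holds to floating-point precision without anyone checking it after the fact.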
Before we cross the bridge into software, one more physics fact, because it's the strongest card in the deck and it indicts a specific, ubiquitous engineering sin.
I said first derivatives of ln Z give you averages: mean energy, mean pressure. Second derivatives give you something else: fluctuations, the statistical spread of the system around those averages. And here is the beautiful part. Heat capacity, a perfectly ordinary thing you can measure with a thermometer and a heater, turns out to be the variance of the energy, scaled by 1 / (k_B T²):
C_V = (⟨E²⟩ − ⟨E⟩²) / (k_B T²)
Read that slowly. The response of the system, how it reacts when you push on it, is the same object as how much its underlying microstates wobble. The average energy and its statistical spread are not computed in two different places by two different methods. They are the first and second derivatives of the same ln Z, guaranteed to be talking about the same reality. This is the fluctuation–response relation, and it is one of the deepest little gems in physics.
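The fluctuation–response relation can be checked numerically. The sketch below assumes a hypothetical four-level toy spectrum in units where k_B = 1, and computes heat capacity two ways from one `ln_Z`: as the response d⟨E⟩/dT, and as the fluctuation Var(E) / T², using the standard identity Var(E) = ∂² ln Z / ∂β². The finite-difference step sizes and function names are arbitrary choices for illustration.

```python
import math

# Hypothetical toy spectrum (illustrative only), in units where k_B = 1.
ENERGIES = [0.0, 1.0, 1.0, 2.5]


def ln_Z(beta: float) -> float:
    """ln Z = ln sum_i exp(-beta * E_i)."""
    return math.log(sum(math.exp(-beta * e) for e in ENERGIES))


def deriv(f, x: float, h: float = 1e-4) -> float:
    """First derivative by central difference."""
    return (f(x + h) - f(x - h)) / (2 * h)


def second_deriv(f, x: float, h: float = 1e-4) -> float:
    """Second derivative by central difference."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / (h * h)


def heat_capacity_response(T: float) -> float:
    """C_V as a response: d<E>/dT, where <E>(T) = -d ln Z / d beta at beta = 1/T."""
    def mean_energy(t: float) -> float:
        return -deriv(ln_Z, 1.0 / t)
    return deriv(mean_energy, T)


def heat_capacity_fluctuation(T: float) -> float:
    """C_V as a fluctuation: Var(E) / T^2, with Var(E) = d^2 ln Z / d beta^2."""
    return second_deriv(ln_Z, 1.0 / T) / T**2


if __name__ == "__main__":
    T = 1.5
    # The two numbers agree up to finite-difference error.
    print(heat_capacity_response(T), heat_capacity_fluctuation(T))
```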
Hold that next to how your dashboards actually work. You have a pipeline that computes p99 latency. Do you have the same pipeline computing how noisy that p99 is, its confidence interval, its sample size, how much to trust it this minute? Almost certainly not. The number lives in one system and its reliability, if it's tracked at all, lives in another, and the two were built by different people at different times against different snapshots. Physics says that's backwards. In a properly built system the aggregate and its error bar and its sensitivity are all derivatives of the one master object, and they are consistent for the same reason the energy and the entropy are. A metric whose uncertainty is computed by a separate pipeline from the metric itself is exactly the arrangement Z forbids.
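As a minimal sketch of the alternative, the function below returns a p99, its sample size, and a 95% percentile-bootstrap interval, all computed in one place from the same list of latency samples. Nearest-rank percentiles and the bootstrap are choices made here for illustration; bootstrap intervals for extreme quantiles are known to be rough with small samples, so treat this as the shape of the idea rather than a tuned statistical method. The function names are hypothetical.

```python
import math
import random


def p99(samples):
    """Nearest-rank 99th percentile of a non-empty list."""
    s = sorted(samples)
    return s[max(1, math.ceil(0.99 * len(s))) - 1]


def p99_with_error_bar(samples, n_boot=1000, seed=0):
    """p99, sample size, and a 95% percentile-bootstrap interval from ONE sample list."""
    rng = random.Random(seed)
    # Resample with replacement and recompute the same p99 on each resample.
    boots = sorted(
        p99(rng.choices(samples, k=len(samples))) for _ in range(n_boot)
    )
    return {
        "p99": p99(samples),
        "n": len(samples),
        "ci95": (boots[int(0.025 * n_boot)], boots[int(0.975 * n_boot) - 1]),
    }


if __name__ == "__main__":
    # Synthetic latencies for illustration only (exponential, mean 100 ms).
    rng = random.Random(42)
    latencies_ms = [rng.expovariate(1 / 100) for _ in range(2000)]
    print(p99_with_error_bar(latencies_ms))
```

The design point is that the estimate and its error bar share a code path and a data snapshot, so they cannot quietly describe two different populations.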
So why do our metrics contradict each other? Because we do the opposite of the partition-function discipline, systematically, as a matter of architecture.
The dominant pattern is independent computation. Every metric gets its own pipeline, its own definition, its own storage, and its own slow drift away from all the others. The canonical example is the "three pillars" model of observability: metrics, logs, and traces, stored separately. The observability vendor Honeycomb, whose CTO Charity Majors has been the loudest voice on this, describes that model with admirable bluntness. It means "paying to store your data three different times in three different ways," producing "three shittier versions of the same data" that then disagree. (Honeycomb is a vendor with a horse in this race, so take "Observability 2.0" as their framing of the pattern rather than neutral gospel, but the structural critique is corroborated everywhere you look.) The analytics world has the identical disease under a different name: stacks that "juggle 5–10+ BI and analytics tools," where "each has its own version of truth" and the numbers only agree inside a single tool.
Every one of those independent pipelines is a separate Z. Metrics, logs, and traces aren't three different kinds of truth; they're three pre-computed derivatives of the same underlying events, snapshotted at different moments by different code and then allowed to drift. Of course they contradict each other. You built them to. The miracle would be if they agreed.
Here's what makes this more than a cute analogy. Software didn't need a physicist to show up and explain Gibbs. Several corners of the industry independently converged on the partition-function discipline, under their own names, because the problem forces the solution. When three unrelated communities reinvent the same structure, that structure is probably real.
Observability 2.0, in Honeycomb's framing, is the principle almost verbatim: "one source of truth, wide structured log events, from which you can derive all the other data types." Metrics, logs, and traces stop being three stored things and become three queries against one stored thing. Wide events are Z; the three pillars were three guesses at derivatives you should have computed on demand.
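Here is a minimal Python sketch of "three queries against one stored thing," assuming a hypothetical in-memory list of wide events whose field names (`trace_id`, `service`, `status`, and so on) are illustrative. The metric, the log view, and the trace view are each just a function of the same list; nothing is stored three times.

```python
from collections import defaultdict

# Hypothetical wide events: one record per unit of work, many fields each.
EVENTS = [
    {"trace_id": "t1", "span": "api", "service": "checkout", "status": 500, "duration_ms": 212, "msg": "payment timeout"},
    {"trace_id": "t1", "span": "db", "service": "payments", "status": 200, "duration_ms": 180, "msg": "query ok"},
    {"trace_id": "t2", "span": "api", "service": "checkout", "status": 200, "duration_ms": 48, "msg": "ok"},
    {"trace_id": "t3", "span": "api", "service": "search", "status": 200, "duration_ms": 95, "msg": "ok"},
]


def error_rate(events, service):
    """'Metric' view: an aggregate over the events for one service."""
    rows = [e for e in events if e["service"] == service]
    return sum(e["status"] >= 500 for e in rows) / len(rows) if rows else 0.0


def error_logs(events):
    """'Log' view: a filtered projection of the same events."""
    return [f'{e["service"]} {e["status"]} {e["msg"]}' for e in events if e["status"] >= 500]


def traces(events):
    """'Trace' view: the same events grouped by trace_id."""
    grouped = defaultdict(list)
    for e in events:
        grouped[e["trace_id"]].append(e["span"])
    return dict(grouped)


if __name__ == "__main__":
    print(error_rate(EVENTS, "checkout"))
    print(error_logs(EVENTS))
    print(traces(EVENTS))
```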
Event sourcing is the same move in the data layer: "the events are facts; if you keep them around, you can use them as a datasource… the current state is derived." You don't store the current balance of the account and mutate it; you store every transaction and derive the balance. The event log is Z. Application state is a derivative. The log is the thing that's true; everything else is computed from it.
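A minimal sketch of the same move, assuming a hypothetical account ledger: the only thing stored is an append-only list of transactions, and the balance (including the balance at any earlier point) is computed from it on demand. Names like `Transaction` and `balance_as_of` are illustrative, not from any event-sourcing framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transaction:
    """An immutable fact. Amounts in integer cents to avoid float drift."""
    account: str
    amount_cents: int  # positive = deposit, negative = withdrawal


LOG = []  # append-only list of Transaction: the only thing that is stored


def record(tx: Transaction) -> None:
    """Append a fact; nothing is ever updated in place."""
    LOG.append(tx)


def balance(account: str, events=None) -> int:
    """Current state is derived: fold the log instead of mutating a stored balance."""
    events = LOG if events is None else events
    return sum(tx.amount_cents for tx in events if tx.account == account)


def balance_as_of(account: str, n_events: int) -> int:
    """A question nobody planned for (a historical balance) is just another derivation."""
    return balance(account, LOG[:n_events])


if __name__ == "__main__":
    record(Transaction("alice", 10_000))
    record(Transaction("alice", -2_500))
    record(Transaction("bob", 4_000))
    record(Transaction("alice", 1_200))
    print(balance("alice"), balance_as_of("alice", 2))
```

Storing only the current balance would make `balance_as_of` impossible to answer after the fact; storing the log makes it a one-liner.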
The semantic layer (or "headless metrics layer") is the discipline applied to business intelligence: a single place where each metric is defined once, which "allows multiple tools to query the same definitions." Instead of five dashboards each implementing "active user" slightly differently, there is one definition that all of them are derivatives of, which is the move that ends the forty-minute meeting about which error rate is correct.
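A toy version of "define each metric once," assuming a hypothetical in-process registry rather than any particular semantic-layer product: metrics are registered under a name, a second definition with the same name is rejected, and every consumer goes through `query` instead of reimplementing the logic. The event fields and the `active_users` definition are made up for illustration.

```python
METRICS = {}  # name -> the one function that defines the metric


def metric(name):
    """Decorator that registers a metric definition, refusing duplicates."""
    def register(fn):
        if name in METRICS:
            raise ValueError(f"metric {name!r} is already defined")
        METRICS[name] = fn
        return fn
    return register


@metric("active_users")
def active_users(events):
    """The one definition: distinct users with at least one non-heartbeat event."""
    return len({e["user_id"] for e in events if e["action"] != "heartbeat"})


def query(name, events):
    """Every tool asks the registry; none carries its own copy of the logic."""
    return METRICS[name](events)


# Two hypothetical consumers deriving from the same definition.
def ops_dashboard(events):
    return {"active_users": query("active_users", events)}


def weekly_report(events):
    return f"Active users this week: {query('active_users', events)}"
```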
The OLAP cube, the materialized view, the "One Big Table": all variations on "a metric is an aggregate measure expression evaluated in a dimensional context," derived from a single fact table rather than recomputed independently in a dozen places.
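One way to read that definition in code, as a sketch over a hypothetical in-memory fact table: a measure is a function from rows to a number, and "evaluating it in a dimensional context" means grouping the one fact table by the chosen dimensions and applying the measure to each group. Column names and figures are invented for illustration.

```python
from collections import defaultdict

# Hypothetical fact table: one row per order line (figures invented).
FACTS = [
    {"region": "EU", "product": "pro", "revenue": 120.0, "units": 2},
    {"region": "EU", "product": "basic", "revenue": 30.0, "units": 3},
    {"region": "US", "product": "pro", "revenue": 180.0, "units": 3},
    {"region": "US", "product": "basic", "revenue": 20.0, "units": 2},
]


def revenue(rows):
    """Measure expression: total revenue."""
    return sum(r["revenue"] for r in rows)


def avg_price(rows):
    """Measure expression: revenue per unit, always recomputed from rows."""
    return sum(r["revenue"] for r in rows) / sum(r["units"] for r in rows)


def evaluate(measure, facts, dims):
    """Evaluate `measure` once per combination of the dimension values in `dims`."""
    groups = defaultdict(list)
    for row in facts:
        groups[tuple(row[d] for d in dims)].append(row)
    return {key: measure(rows) for key, rows in groups.items()}


if __name__ == "__main__":
    print(evaluate(revenue, FACTS, ()))           # grand total
    print(evaluate(revenue, FACTS, ("region",)))  # by region
    print(evaluate(avg_price, FACTS, ("region", "product")))
```

Note that `avg_price` is re-evaluated from rows at every grain rather than averaged from finer-grained averages, which is exactly the kind of quiet inconsistency that independently precomputed rollups tend to introduce.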
Four communities, four vocabularies, one idea: store the master object, derive the rest. That is the partition function with a backlog and a Jira board.
This isn't only a way to feel smug about a meeting. It gives you two concrete, usable tools.
First: treat contradiction as a diagnosis, not noise. When two of your dashboards disagree, you have just empirically proven that they are not derivatives of the same object. Stop arguing about which number is "right"; that's the wrong question, because in a multiple-Z world both are locally correct and globally meaningless. The disagreement is a pointer straight at the architectural defect: you have more than one source of truth. The fix is never "reconcile the two numbers." The fix is "make them both derivatives of one object so they can't disagree again." In physics, inconsistent thermodynamic quantities are an immediate, unambiguous signal that you computed two Zs. Let your metric contradictions carry the same weight: they're not embarrassing data-quality hiccups, they're the system telling you exactly where the second Z is hiding.
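To make "contradiction as a diagnosis" concrete, here is a sketch that replays the opening scenario with two hypothetical, independently written error-rate definitions: one drops health checks and counts every retry, the other keeps health checks and counts only each request's final attempt. Both run on the same events. Instead of arguing about which rate is right, `disagreements` lists the events the two definitions treat differently, which points straight at the definitional split. Field names and events are illustrative assumptions.

```python
# Hypothetical request events; field names are illustrative.
EVENTS = [
    {"id": 1, "health_check": True, "final_attempt": True, "status": 200},
    {"id": 2, "health_check": False, "final_attempt": False, "status": 503},  # failed, then retried
    {"id": 3, "health_check": False, "final_attempt": True, "status": 200},   # the retry
    {"id": 4, "health_check": False, "final_attempt": True, "status": 500},
    {"id": 5, "health_check": True, "final_attempt": True, "status": 200},
]


def classify_a(e):
    """Dashboard A: drops health checks, counts every attempt."""
    if e["health_check"]:
        return "excluded"
    return "error" if e["status"] >= 500 else "ok"


def classify_b(e):
    """Dashboard B: keeps health checks, counts only a request's final attempt."""
    if not e["final_attempt"]:
        return "excluded"
    return "error" if e["status"] >= 500 else "ok"


def error_rate(classify, events):
    """Errors divided by counted (non-excluded) events under one definition."""
    counted = [label for label in map(classify, events) if label != "excluded"]
    return counted.count("error") / len(counted) if counted else 0.0


def disagreements(events):
    """Events the two definitions treat differently: where the second Z lives."""
    return [(e["id"], classify_a(e), classify_b(e))
            for e in events if classify_a(e) != classify_b(e)]


if __name__ == "__main__":
    print(error_rate(classify_a, EVENTS), error_rate(classify_b, EVENTS))
    for row in disagreements(EVENTS):
        print(row)
```

Both rates are "correct" for their own definition; the diff says why they differ, and the fix is to keep one definition and derive both dashboards from it.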
Second: make "new question, not new pipeline" the test of whether you have a Z at all. This is the cleanest litmus test in the whole essay. When a new question comes in ("what's the p99 for enterprise customers on the mobile client in EU regions during deploys?"), what does answering it require? If it requires building a new pipeline, defining a new metric, and wiring up new storage, you don't have a partition function; you have a pile of pre-computed answers to questions someone guessed in advance, and you're about to manufacture yet another drift-prone Z. But if you can answer it as a new query against the existing wide-event store, a new derivative of the object you already have, then you've got a real Z, and you can answer questions nobody thought to ask when you built it. The free energy of a system lets a physicist derive a quantity no one had in mind when they computed Z. A genuine source of truth does the same for you: it answers the questions of the future, because the future is just more derivatives.
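As a sketch of "new question, not new pipeline," assume a hypothetical list of wide events that already carry fields such as `tier`, `client`, `region`, and `during_deploy`. The enterprise/mobile/EU/deploy question then becomes one more predicate over the stored events. The field names, region naming scheme, and nearest-rank p99 are illustrative choices.

```python
import math


def p99(values):
    """Nearest-rank 99th percentile; None when no events match."""
    s = sorted(values)
    if not s:
        return None
    return s[max(1, math.ceil(0.99 * len(s))) - 1]


def p99_where(events, predicate):
    """Answer a new question as a new filter over the existing events."""
    return p99([e["duration_ms"] for e in events if predicate(e)])


# Hypothetical wide events; real ones would carry many more fields.
EVENTS = [
    {"tier": "enterprise", "client": "mobile", "region": "eu-west-1", "during_deploy": True, "duration_ms": 340},
    {"tier": "enterprise", "client": "mobile", "region": "eu-central-1", "during_deploy": True, "duration_ms": 120},
    {"tier": "enterprise", "client": "web", "region": "eu-west-1", "during_deploy": True, "duration_ms": 900},
    {"tier": "free", "client": "mobile", "region": "us-east-1", "during_deploy": False, "duration_ms": 80},
]

if __name__ == "__main__":
    answer = p99_where(
        EVENTS,
        lambda e: e["tier"] == "enterprise"
        and e["client"] == "mobile"
        and e["region"].startswith("eu-")
        and e["during_deploy"],
    )
    print(answer)
```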
So the practical reframe is this. The single source of truth is not a database, or a dashboard, or a "metrics platform" you bought. It's a specific object (the raw event log, the canonical state, the wide structured events, whatever your equivalent of Z is) that all of your numbers are secretly functions of, whether you've named it or not. Your job is to find that object, store it as the thing that's true, and derive every metric, every dashboard, every alert as a transformation of it. The moment you compute a number independently of that object (a quick side pipeline, a one-off rollup, a second tool with its own definition), you have minted a second Z, and second Zs are where every contradiction you will ever debug is born.
There's a small, pleasing footnote here for anyone who's worked with softmax. That function divides by a normalizing constant and then throws it away as bookkeeping, and that discarded denominator is the partition function. The boring normalizer everyone scrolls past is, in statistical mechanics, the single most information-rich object in the system: the one quantity from which all of thermodynamics can be recovered. Most observability stacks make the same mistake softmax does. They keep the pretty derived numbers and throw away the master object underneath. Keep the Z. Derive the rest. Boltzmann had this sorted out before the lightbulb was a consumer product.
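For the softmax footnote, here is a small sketch that keeps the normalizer instead of discarding it: the function returns the probabilities together with ln Z, computed with the usual log-sum-exp shift for numerical stability. The logits play the role of −E_i in the Boltzmann factor and `temperature` plays the role of k_B T; that mapping and the function name are illustrative. One derivative-of-Z fact survives the translation: at temperature 1, the gradient of ln Z with respect to the logits is exactly the softmax output.

```python
import math


def softmax_with_log_z(logits, temperature=1.0):
    """Return (probabilities, ln Z) rather than throwing the normalizer away.

    Z = sum_i exp(x_i / temperature); the max-shift keeps exp() from overflowing.
    """
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    log_z = m + math.log(sum(math.exp(s - m) for s in scaled))
    probs = [math.exp(s - log_z) for s in scaled]
    return probs, log_z


if __name__ == "__main__":
    probs, log_z = softmax_with_log_z([2.0, 1.0, 0.1])
    # d ln Z / d x_0 by finite difference recovers probs[0] (temperature = 1).
    h = 1e-6
    up = softmax_with_log_z([2.0 + h, 1.0, 0.1])[1]
    down = softmax_with_log_z([2.0 - h, 1.0, 0.1])[1]
    print(probs, log_z, (up - down) / (2 * h))
```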
For an agent fleet, the master object is the record of what each agent actually did.
Your fleet's success rate, its cost, its trust score, and every per-agent dashboard are all derivatives of one thing: what each agent actually did, step by step. If those numbers are computed from per-agent self-reports, each report is a second Z, and they will drift apart exactly the way the two error-rate dashboards did. Chain of Consciousness anchors every agent action to a single verifiable external record, so every metric you derive is a derivative of the same Z, and two of them can't quietly contradict each other.
pip install chain-of-consciousness · npm install chain-of-consciousness