The physicist Ole Peters has spent more than a decade running a simulation that should be required reading for anyone who has ever signed an SLA, paid a premium, hedged a bet, or trained a reinforcement-learning agent. He starts 10,000 simulated individuals with $100 each and lets them play a single, fair coin-toss game. Heads: your wealth grows by 50 percent. Tails: your wealth shrinks by 40 percent. Each flip has an expected multiplicative factor of 0.5 × 1.50 + 0.5 × 0.60 = 1.05. Five percent expected growth per flip. After 100 flips, expected wealth: $13,150.

After 100 flips, the average ending wealth across the 10,000 simulated players lands roughly where expected-value math predicts — on the order of $13,000 per player on the ensemble side of the ledger.

The median ending wealth is roughly fifty cents.

The mean is rescued by a tiny number of extraordinarily lucky outliers. The typical player is wiped out. This is not a paradox, a glitch, or a clever framing. It is the mathematical consequence of a system where time-averages and ensemble-averages diverge. And once you can see that divergence, you start to see it everywhere — in your codebase, your hiring decisions, your equity package, your AI training pipeline, and the boring overhead line items that have just become the most important things you do.
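The divergence is easy to reproduce. A minimal sketch of Peters' game in Python — the exact figures depend on the seed, but the mean/median gap does not:

```python
import random
import statistics

def play(players=10_000, flips=100, seed=7):
    """Peters' coin-toss game: start at $100; heads x1.5, tails x0.6."""
    rng = random.Random(seed)
    final = []
    for _ in range(players):
        wealth = 100.0
        for _ in range(flips):
            wealth *= 1.5 if rng.random() < 0.5 else 0.6
        final.append(wealth)
    return final

wealth = play()
print(f"mean:   ${statistics.mean(wealth):,.2f}")    # rescued by a few outliers
print(f"median: ${statistics.median(wealth):,.2f}")  # ruin for the typical player
```

The median sits under a dollar on essentially every run; the mean is whatever the luckiest handful of trajectories make it.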

Welcome to the insurance problem.

A 250-Year-Old Paradox

For two and a half centuries, economists have struggled with a basic question: why do rational people buy insurance?

Insurance has, by construction, negative expected value to the buyer. The premium has to cover the insurer’s expected payout plus profit and overhead, so the math always says you give up more than you statistically get back. Standard expected-utility theory has to reach for “risk aversion” — a psychological preference for certainty — to explain the behavior of literally everyone with a homeowner’s policy.

Daniel Bernoulli proposed the original fix in 1738 in his Specimen Theoriae Novae de Mensura Sortis: people don’t value money linearly, so a concave utility function (logarithmic utility, say) makes even actuarially unfair contracts rational. Von Neumann and Morgenstern formalized this whole machinery in Theory of Games and Economic Behavior (1944), and the apparatus has run economics ever since. When humans deviated from its predictions, Kahneman and Tversky catalogued the deviations as “biases” in their 1979 Econometrica paper on prospect theory.

The deeper problem, which Ole Peters laid out in Nature Physics in 2019, is this: expected value is the wrong number to optimize when outcomes compound multiplicatively over time. Insurance isn’t compensation for an irrational fear of variance. Insurance is mathematically optimal — provably so — in any system that is multiplicative, path-dependent, and has an absorbing state.

Almost every economically important system has those properties. Almost every system you build has those properties. And nobody warned you.

Time vs. Ensemble

Ludwig Boltzmann coined the word “ergodic” in the late nineteenth century — from Greek ergon (work) and hodos (path) — for systems where the time-average of one trajectory equals the ensemble-average across many trajectories at one moment. In statistical mechanics, the ergodic hypothesis lets us replace impossible-to-observe long-run averages with computable spatial ones. It works beautifully for ideal gases.

Boltzmann himself wrote in the margin of his foundational paper: “still dubious and not proven.”

Economics imported the ergodic hypothesis as an axiom — never stress-tested for the systems to which it was applied — and then built the apparatus of expected utility on top of it. Peters’ provocation, in the 2019 Nature Physics article and the wider Ergodicity Economics programme he founded with Alexander Adamou, is that the ergodic axiom fails for almost everything that economically matters.

Wealth compounds. Career trajectories compound. Firm growth compounds. Customer cohorts compound. Bug counts in unmaintained codebases compound. When you face the next decision, you face it from your new position, not from a fresh draw. The trajectory is sequential — and ruin is absorbing. You cannot play the game from $0.

In Peters’ coin-toss simulation, every player flips both heads and tails roughly equally. But a 40 percent loss requires a 67 percent gain to recover, and the up-flip only offers 50 percent. The asymmetry compounds. The geometric growth rate is √(1.50 × 0.60) ≈ 0.949 — your wealth shrinks about 5 percent per flip on the path that any individual actually walks. Expected value grows; typical fate decays.
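The two averages in that paragraph are a two-line computation — the arithmetic mean of the multipliers versus their geometric mean:

```python
import math

up, down = 1.5, 0.6

ensemble = 0.5 * up + 0.5 * down  # arithmetic mean: 1.05, +5% per flip
time_avg = math.sqrt(up * down)   # geometric mean: ~0.949, -5% per flip

print(ensemble, time_avg)
```

Both numbers describe the same coin. Only the second one describes your wallet.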

The ensemble lies. The time-path tells the truth.

The Puzzle Dissolves

Peters and Adamou published their solution to the cooperation puzzle in Philosophical Transactions of the Royal Society A in 2022. The result is mathematically clean and conceptually startling: pooling risk converts a multiplicative process (where bad draws can wipe you out) into something closer to an additive one (where the cost is a known, smaller, bounded premium each period). The contract’s ensemble-average value to the buyer is negative — that’s how insurers profit. But the time-average growth rate of an individual who buys insurance is higher than the time-average growth rate of an individual who skips it.

In other words: insurance is not the irrational tax on uncertainty that 250 years of expected-utility theory said it was. Insurance is individually optimal in non-ergodic dynamics. Cooperation through risk-pooling is rational, not altruistic. This is why mutual aid, fraternal benefit societies, Lloyd’s coffee-house syndicates in late-17th-century London, and crop-failure village collectives emerge in every society that hits a certain threshold of interdependence and mortality. It is why David Meder and colleagues in Copenhagen, in their 2021 PLOS Computational Biology paper, were able to show experimentally that humans switch from linear evaluation to logarithmic evaluation depending on whether the dynamics they face are additive or multiplicative — exactly as ergodicity economics predicts.

The “irrational” preference for insurance is what time-optimal looks like.

Where This Lives in Your Stack

This is the moment the insurance problem stops being an economics curiosity and starts being load-bearing for anyone running a system that has to survive over time.

Production reliability. Your SLO budget is insurance. The 99.9 percent target isn’t really the customer’s request — it’s a contract you’ve written with future-you. The 0.1 percent headroom buys you the right to ship faster, knowing that occasional outages won’t compound into the absorbing state of churn-induced collapse. What looks like wasteful overhead — multi-region failover, redundant databases, dead-code paths kept alive for graceful degradation, the whole apparatus of paranoia — is the premium. Skip it and your time-average uptime collapses even when your last-30-days ensemble-average reliability still looks fine. Most outage post-mortems are forensic reconstructions of a team that mistook their last-30-days rollup for their long-run trajectory.
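The premium is quantifiable with the error-budget arithmetic SRE teams already use. A minimal sketch, with the 30-day window as an assumption:

```python
def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Minutes of downtime an availability SLO permits per window."""
    return (1 - slo) * window_days * 24 * 60

print(error_budget_minutes(0.999))   # 43.2 minutes per 30 days
print(error_budget_minutes(0.9999))  # 4.32 minutes per 30 days
```

The budget is the premium you have agreed to pay; spending it on deliberate risk (fast shipping) instead of unplanned outages is the whole point of writing it down.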

Hiring and team composition. A 10x engineer who burns out at month six has a high ensemble-average impact and a low time-average contribution to the team. A team’s robustness is a function of its diversity-weighted insurance against any individual’s bad week, bad sprint, bad year. This is the math underneath bus-factor, cross-training, pair programming, and the deeply unsexy practice of writing runbooks that aren’t about the heroic moments. The pool reduces the variance of the trajectory you actually walk.

AI and reinforcement learning. This one is sharp enough to bleed. Standard RL optimizes expected return — the ensemble average across infinitely many trajectories. Dominik Baumann and colleagues, in their 2025 paper in Transactions on Machine Learning Research (“Reinforcement learning with non-ergodic reward increments: robustness via ergodicity transformations”), demonstrated what happens when reward processes are non-ergodic: the standard objective produces policies that occasionally yield exceptional rewards but almost surely lead to catastrophic outcomes for the individual agent. In their Reacher experiment, only “ergodic REINFORCE” — which learns a transformation from data that turns the non-ergodic reward time series into an ergodic one before optimizing — learned a successful policy at all; standard REINFORCE collapsed to minimal reward, and only the ergodic version generalized to new link lengths and masses.

If your training metric is mean trajectory return on a non-ergodic reward, you may be optimizing for outliers and silently losing typical cases. The mean is the wrong target.
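A toy version of the failure mode — not Baumann et al.'s method, and with illustrative policies of my own invention: score two policies on a multiplicative reward, once by raw return and once after a log transform, the simplest ergodicity transformation for multiplicative dynamics.

```python
import math
import random

def final_return(multipliers, steps=100, rng=None):
    """One trajectory's final return under multiplicative reward dynamics."""
    r = 1.0
    for _ in range(steps):
        r *= rng.choice(multipliers)
    return r

rng = random.Random(0)
risky = (1.5, 0.6)  # expected factor 1.05/step, geometric mean ~0.949/step
safe = (1.02,)      # deterministic 2% growth per step

risky_runs = [final_return(risky, rng=rng) for _ in range(2_000)]
safe_runs = [final_return(safe, rng=rng) for _ in range(2_000)]

# Ensemble objective: mean return — pulled upward by the risky policy's outliers.
mean_risky = sum(risky_runs) / len(risky_runs)

# Time-average objective: mean log return — what one agent compounds at.
log_risky = sum(math.log(r) for r in risky_runs) / len(risky_runs)
log_safe = sum(math.log(r) for r in safe_runs) / len(safe_runs)

print(mean_risky, log_risky, log_safe)  # risky log-rate is negative; safe is positive
```

The expected-return lens flatters the risky policy; the log lens reveals that its typical trajectory decays toward zero while the modest deterministic policy actually grows.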

Equity, comp, and your own career. Almost everyone reading this has stock-based compensation that is multiplicative, path-dependent, and has an absorbing barrier (the company dies, the equity is worthless). Mehra and Prescott named the equity premium puzzle in their 1985 paper in Journal of Monetary Economics: stocks have historically returned about 6 percent more than bonds annually in the US, but standard expected-utility models can only justify a premium of less than 1 percent with reasonable risk aversion. Peters and Adamou’s 2020 working paper (arXiv:2011.05458) showed that the gap dissolves the moment you compute the time-average growth rate instead of the expected return. Diversification — across asset classes, employers, skill stacks, identity bases — is insurance against the non-ergodic nature of any single trajectory.

Why Teams Beat Individuals

There is a second puzzle that the same lens dissolves, and it explains why the well-functioning team is one of the most underrated artifacts of human engineering.

In standard evolutionary biology, cooperation is hard to explain. The textbook answers — kin selection, reciprocal altruism, group selection — work, but they feel like patches. Why do non-related strangers pool resources? Why does any organism share when sharing reduces its expected fitness?

Peters and Adamou’s 2022 result is that cooperation has no apparent benefit when you judge it by ensemble-average growth. But the time-average growth rate of cooperating agents is always higher than non-cooperators in non-ergodic dynamics. A 2024 paper (arXiv:2403.12095, “Cooperation in a Non-Ergodic World on a Network”) extended the result to network-restricted cooperation: even when agents can only pool with neighbors, cooperators outpace defectors on the time-path.

Two engineers who pool code review reduce the variance of their individual trajectories — even if the ensemble-average bug rate stays the same. Open source projects with healthy maintainer rotation outperform single-maintainer projects on long-run survival, even if month-by-month their ensemble velocity looks identical. Pair programming, on-call rotation, runbook maintenance, the boring infrastructure of mutual coverage — all of it is the same trick as insurance. The pool reduces variance. The reduced variance compounds, over time, into a higher growth rate than any individual could achieve alone.
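The pooling trick can be sketched directly — toy dynamics, not Peters and Adamou's full model: estimate the per-period log growth rate when N agents each face an independent ×1.5/×0.6 flip and then equalize wealth every period.

```python
import math
import random

def pooled_growth_rate(pool_size, periods=20_000, seed=1):
    """Time-average log growth per period for a pool that shares equally."""
    rng = random.Random(seed)
    total_log = 0.0
    for _ in range(periods):
        # Each member flips independently; the pool then splits the total evenly.
        factor = sum(1.5 if rng.random() < 0.5 else 0.6
                     for _ in range(pool_size)) / pool_size
        total_log += math.log(factor)
    return total_log / periods

solo = pooled_growth_rate(1)   # ~ -0.05: the lone trajectory decays
team = pooled_growth_rate(10)  # ~ +0.04: ten poolers grow
print(solo, team)
```

Same coin, same odds, same expected value per agent. The only change is the sharing step, and it flips the sign of the growth rate the members actually experience.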

Why your team works is the same reason insurance works. It is the same shape underneath.

Where the Framing Breaks

This is the section where I steel-man myself, because the framework is most useful when you know its edges.

The strongest critique, summarized by Doctor, Wakker, and Wang in their 2020 Nature Physics response, is that ergodicity economics often reduces to “maximize expected logarithmic utility” — which expected utility theory already accommodates. For single decisions where you already know the right utility function, the two frameworks produce identical predictions. The difference shows up in novel domains where you don’t yet know what to optimize, and in problems where the multiplicative dynamics are obvious but the right preference parameters aren’t.

Most engineering domains are novel domains. Which is why the framework matters here.

For short-horizon decisions where terminal wealth matters more than long-run growth — your annual budget, an exit timeline, a fixed nine-month build — the time-average framing can mislead you. Paul Samuelson’s 1971 critique in PNAS hasn’t been fully resolved for finite-horizon optimization, and it remains the most technically rigorous objection. The framework’s strongest predictions are for repeated, multi-period decisions with no fixed termination.

Empirical work also suggests most investors are 2–3x more risk-averse than the Kelly criterion implies — the reason “fractional Kelly” (betting half or a third of optimal) is the standard practitioner adjustment. Different people have different time horizons, debt loads, and dependents. Ergodicity gives you the mathematical floor on rational caution; it doesn’t tell you the right amount of caution for your specific situation.
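The Kelly arithmetic referenced here, with fractional scaling as the practitioner hedge — a sketch, with illustrative parameters:

```python
def kelly_fraction(p_win: float, net_odds: float) -> float:
    """Kelly-optimal bankroll fraction for win probability p at net odds b:1."""
    return (net_odds * p_win - (1 - p_win)) / net_odds

full = kelly_fraction(0.6, 1.0)  # 0.20: bet 20% of bankroll per round
half = 0.5 * full                # 0.10: "half Kelly", the common adjustment
print(full, half)
```

Full Kelly maximizes time-average growth but with brutal drawdowns; fractional Kelly gives up a little growth rate for a large reduction in variance — exactly the insurance trade again.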

The framework is most useful as a diagnostic lens. When someone shows you an ensemble-average analysis of a multiplicative, path-dependent process with an absorbing barrier, you now know exactly where to look for the lie.

Five Conversions for Monday Morning

  1. Audit any KPI that’s an average. When the metric you’re tracking is “average X across all users / requests / runs / agents,” ask what the median is. If the gap is large, you’re optimizing for outliers. p99 latency exists for this reason. Apply the same skepticism to feature adoption, customer LTV, agent reward, and team velocity. The mean is hiding the trajectory.
  2. Treat redundancy as time-average optimization, not waste. Multi-region failover isn’t insurance against the typical day — it’s insurance against the absorbing barrier. The premium is small; the alternative is non-recoverable. The CFO who calls it overhead is doing ensemble-average accounting on a non-ergodic system.
  3. Watch for absorbing barriers in your own decisions. A 40 percent wealth loss requires a 67 percent gain to recover. A reputational hit, a security breach, a deleted production database, a burned-out lead engineer — all are partial absorbing states. The cost-benefit math that ignores them is the cost-benefit math that ruins typical individual trajectories while the ensemble looks fine.
  4. In RL or agent training, check for non-ergodic reward. If your reward distribution has heavy tails, occasional catastrophic states, or path-dependent trajectories, mean trajectory return can mislead you. Baumann’s ergodicity transformation is one approach; survival-conditioned objectives and worst-case thresholds are others. The point is to optimize for what one agent experiences over time, not for the average across many agents at one moment.
  5. Defend cooperation overhead as growth-rate insurance, not collegiality. Pair programming, code review, on-call rotation, runbook maintenance, the time spent training the next person — frame it in the language of variance reduction and time-average growth, and the math becomes obvious. The pool is the trick. The team is the trick.
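The first conversion is scriptable. A hypothetical helper (the name and the toy data are my own) that flags ensemble/typical divergence in any metric sample:

```python
import statistics

def mean_median_ratio(samples):
    """Ratio of ensemble average to typical value; large means outlier-driven."""
    return statistics.mean(samples) / statistics.median(samples)

# Toy heavy-tailed metric: 98 ordinary users, 2 whales.
values = [1.0] * 98 + [1_000.0, 2_000.0]
print(mean_median_ratio(values))  # ~31: the mean is not the trajectory
```

A ratio near 1 means the average describes a typical case; a large ratio means the KPI is a story about outliers wearing the costume of a story about users.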

The deepest version of this is something you already know in your bones if you’ve shipped systems for any length of time. The average team, the average codebase, the average career, the average company — none of these are you. They’re an ensemble. You are a trajectory. The math that governs your trajectory is not the math of the average.

The expected value tells you what would happen across a thousand parallel universes. The time-average tells you what happens in this one.

The insurance is for this one.


Sources: Peters, “The ergodicity problem in economics,” Nature Physics 15 (2019); Peters & Adamou, Philosophical Transactions of the Royal Society A, 2022; Bernoulli 1738; Von Neumann & Morgenstern 1944; Kahneman & Tversky, Econometrica, 1979; Mehra & Prescott, Journal of Monetary Economics, 1985; Samuelson, PNAS, 1971; Meder et al., PLOS Computational Biology, 2021; Baumann et al., Transactions on Machine Learning Research, 2025; Peters & Adamou working paper, arXiv:2011.05458 (2020); “Cooperation in a Non-Ergodic World on a Network,” arXiv:2403.12095 (2024); Doctor, Wakker & Wang, Nature Physics response, 2020.
