Evolvability

The winning trait isn't current fitness; it's the capacity to change fast. Fitness wins the quarter; evolvability wins the decade.

Published June 2026 · 10 min read

On the 24th of February, 1988, the biologist Richard Lenski put twelve genetically identical populations of E. coli into twelve identical flasks of glucose broth and started what is now the longest-running evolution experiment in history. The bacteria have been dividing ever since, more than 75,000 generations, roughly the equivalent of a million-plus years of human evolution, all under his lab's careful watch. Twelve lines, same ancestor, same food, same world. By any reasonable theory of "current fitness," they should have stayed more or less interchangeable: twelve solutions to the same fixed problem, converging on the same answer.

For about thirty thousand generations, they roughly did. And then one of them did something extraordinary.

Around generation 31,500, a single population, line Ara-3, evolved the ability to eat citrate, the compound Lenski had been using merely as a chelating agent in the broth. This is a big deal, because the inability to metabolize citrate aerobically is practically a defining trait of E. coli; it's on the diagnostic checklist. One lineage broke the rule and unlocked a whole new food source the other eleven were swimming in and couldn't touch. The obvious explanation is luck, a rare mutation that happened to land in Ara-3. But when Zachary Blount and Lenski did the painstaking work of replaying the tape from frozen samples, they found something far more interesting. The Cit+ innovation wasn't a single lucky hit. It was contingent on earlier "potentiating" mutations, changes that produced no citrate-eating on their own but quietly rebuilt the lineage's genetic architecture so that the final mutation could work. Only Ara-3's later genotypes could get there. Its earlier ones, replayed again and again, could not.

Here is the punchline that should reorganize how you think about competition: those twelve lines were equally fit for thirty thousand generations, and they had wildly unequal futures, and the thing that differed was not their fitness. It was their evolvability: their capacity to become something new. The other eleven are still, after 75,000 generations, exactly as good at eating glucose and exactly as helpless with citrate. One of them rebuilt itself into a lineage that could invent. The unit of long-run success was never how fit you are today. It's how fast you can change.

What evolvability actually is

In 1996, the evolutionary biologists Günter Wagner and Lee Altenberg gave this idea its rigorous form in a paper called "Complex Adaptations and the Evolution of Evolvability." Their argument: for adaptation to happen at all, a lineage needs evolvability, "the ability of random variations to sometimes produce improvement," and that ability is not a given. It depends entirely on how genetic variation maps onto phenotypic variation. Two organisms can be equally fit right now and have radically different abilities to generate useful change, because their underlying architecture translates mutations into outcomes differently.

This is the meta-level insight that quietly overturns naive Darwinism. We picture natural selection as a contest of current fitness: the fastest gazelle, the strongest lion, the most optimized organism wins. But selection acts on the present while the future keeps moving, and the lineages that persist across deep time are not the ones that were most fit at any single moment; they're the ones that could keep re-fitting as the moment changed. Fitness is a snapshot. Evolvability is the capacity to take the next snapshot, and the one after that. And it lives not in the phenotype you can measure today but in the architecture underneath it.

Modular genomes vs. tangled ones

So what makes one architecture evolvable and another a dead end? The axis is modularity versus pleiotropy.

Pleiotropy is when one gene affects many traits at once. A genome saturated with pleiotropy is one where everything is entangled with everything, and that sounds efficient until the environment shifts and you need to change one thing. You can't. Every mutation that improves trait A also wrecks traits B, C, and D, because the same genes underlie all of them. Selection's hands are tied; the lineage is perfectly fit for the world it's in and structurally incapable of tracking the world it's moving toward. Wagner and Altenberg's definition of a modular architecture is precisely the opposite: a genotype-phenotype map with few pleiotropic effects between characters serving different functions, where the entanglements that remain are kept within a functional unit, not smeared across all of them. A modular genome can vary one function without disturbing the others. It can experiment cheaply.
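
To make the distinction concrete, here is a toy sketch of a genotype-phenotype map in Python. The gene names, trait names, and the grouping of traits into functions are all invented for illustration; none of them come from Wagner and Altenberg's paper. The only idea borrowed from the definition above is the test itself: a gene counts as pleiotropic across functions when its effects land in more than one functional unit.

```python
# Toy genotype-phenotype maps. Every gene, trait, and function name below
# is hypothetical, chosen only to illustrate the modular/pleiotropic axis.

# Which functional unit each trait serves (an assumption for this sketch).
FUNCTION_OF = {
    "beak_depth": "feeding",
    "beak_width": "feeding",
    "wing_span": "flight",
    "feather_len": "flight",
}


def cross_function_genes(gp_map: dict[str, set[str]]) -> list[str]:
    """Return the genes whose effects span more than one functional unit."""
    return sorted(
        gene
        for gene, traits in gp_map.items()
        if len({FUNCTION_OF[t] for t in traits}) > 1
    )


# Modular: each gene's effects stay inside one function.
modular = {
    "g1": {"beak_depth", "beak_width"},
    "g2": {"wing_span", "feather_len"},
}

# Tangled: the same number of genes, but effects are smeared across functions.
tangled = {
    "g1": {"beak_depth", "wing_span"},
    "g2": {"beak_width", "feather_len", "wing_span"},
}

print(cross_function_genes(modular))  # no gene crosses a function boundary
print(cross_function_genes(tangled))  # every gene does
```

In the modular map a mutation in g1 can reshape feeding without touching flight. In the tangled map there is no gene you can change that leaves the other function alone, which is the "can't move a single card" condition in miniature.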

The engine that builds modularity is one of evolution's most elegant tricks: gene duplication. When a gene is accidentally copied, the spare is freed from its job: the original still does the work, so the copy can mutate, drift, and explore without breaking anything. The vertebrate globin family is the textbook case: hemoglobin (which carries oxygen in your blood), myoglobin (which stores it in your muscle), and their cousins all descend from duplications of a single ancestral oxygen-binding gene. Duplication let one function split into a division of labor (carrier here, store there), each part free to specialize because each had its own module to evolve in. A duplicated, modular genome is a re-combinable toolkit. A tangled, pleiotropic one is a house of cards where you can't move a single card.

You measure fitness. You don't measure this.

Every codebase, every architecture, every organization sits on exactly this axis, and here is the uncomfortable part: we obsess over fitness and almost never measure evolvability. We track performance, feature completeness, uptime, this quarter's revenue, the velocity chart, all snapshots of current fitness. We have dashboards for them. We have almost nothing that answers the question that actually predicts the decade: how cheaply can this system generate and absorb change?

And the two foundational texts of software architecture are, it turns out, describing modularity and pleiotropy in exactly the biologist's terms. In 1972, David Parnas published "On the Criteria To Be Used in Decomposing Systems Into Modules," and its core idea, information hiding, is the engineering definition of a modular genome: hide inside each module the design decisions most likely to change, so that when one of them changes, you alter one module and nothing else. That is "vary one trait without breaking the others," written for software a quarter-century before evolvability had a name. It is the loosely-coupled, re-combinable codebase: the duplicated genome that can specialize a part without shattering the whole.
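
A minimal sketch of what that looks like in practice, in Python. Everything here is hypothetical: the OrderStore interface, both implementations, and the checkout function are invented for illustration and are not taken from Parnas's paper. The design decision assumed to be "most likely to change" is how orders are stored, so that decision is the one kept out of sight.

```python
from typing import Protocol


class OrderStore(Protocol):
    """The contract callers depend on (hypothetical, for illustration)."""

    def save(self, order_id: str, total_cents: int) -> None: ...

    def total(self, order_id: str) -> int: ...


class InMemoryOrderStore:
    """Hides one design decision: orders live in a dict."""

    def __init__(self) -> None:
        self._orders: dict[str, int] = {}

    def save(self, order_id: str, total_cents: int) -> None:
        self._orders[order_id] = total_cents

    def total(self, order_id: str) -> int:
        return self._orders[order_id]


class AppendLogOrderStore:
    """Same contract, different hidden decision: an append-only log."""

    def __init__(self) -> None:
        self._log: list[tuple[str, int]] = []

    def save(self, order_id: str, total_cents: int) -> None:
        self._log.append((order_id, total_cents))

    def total(self, order_id: str) -> int:
        # The most recent entry for an order wins, so scan from the end.
        for logged_id, cents in reversed(self._log):
            if logged_id == order_id:
                return cents
        raise KeyError(order_id)


def checkout(store: OrderStore, order_id: str, total_cents: int) -> int:
    """Caller code: it knows the interface, not the storage decision."""
    store.save(order_id, total_cents)
    return store.total(order_id)


print(checkout(InMemoryOrderStore(), "a1", 1250))
print(checkout(AppendLogOrderStore(), "a1", 1250))
```

Swapping the dict for the log changes one module and nothing else; checkout never learns that anything happened. If checkout had reached into the dict directly, the same swap would have rippled into every caller.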

And its opposite has a name too. In 1997 Brian Foote and Joseph Yoder described the Big Ball of Mud, "a haphazardly structured, sprawling, sloppy, duct-tape-and-baling-wire, spaghetti-code jungle." Read their diagnosis of why it can't change and you are reading the definition of pleiotropy: "information is shared promiscuously among distant elements of the system." That is a genome where every gene affects everything. The tightly-coupled codebase is the pleiotropic genome: perfectly capable of shipping features today, and structurally unable to vary one part without rippling damage through ten others. It is fit, and it is doomed the moment its environment moves, because it cannot re-fit.

The proof at company scale

The clearest real-world demonstration is organizational. Around 2002, Amazon issued a now-famous internal mandate: every team would expose its data and functionality only through hard service interfaces, and teams would communicate only through those interfaces. No shared databases, no back-door reads into another team's guts, no exceptions. Strip the legend away and it is Parnas's information hiding applied to an entire company: each team became a module, its internal decisions hidden behind a contract, free to change as long as the interface held.

What that bought Amazon was not a better 2002. It was evolvability. A modular, interface-bounded organization can generate and absorb change (new services, new lines of business) at a cost a tangled monolith-org cannot match, because a change in one team no longer ripples through all the others. AWS, a business almost no one foresaw in 2002, grew directly out of that modular substrate: once everything was already a clean service behind an interface, exposing those services to the outside world was a recombination, not a rebuild. A competitor who was more fit than Amazon in 2002 (more revenue, more features, a slicker product) but architecturally pleiotropic could not track the decade that followed. Fitness won that competitor the quarter. Evolvability won Amazon the decade.

Why this is invisible until it's fatal

The cruelest property of evolvability is that it generates no current-quarter signal. The pleiotropic, tangled system looks great right now. It ships features, it hits the metrics, it satisfies every fitness measurement you have, often more impressively than the modular alternative, because all the energy that a modular system spends on clean interfaces and loose coupling, the tangled one spends on raw output. So every dashboard you own rewards the architecture that will kill you. You cannot distinguish fit-and-evolvable from fit-and-doomed by looking at this quarter. The two are indistinguishable right up to the moment the environment moves (a new platform, a new competitor, a new regulation, a new scale), and then one of them smoothly tracks the change while the other discovers it cannot vary a single trait without shattering the rest. The bill for low evolvability is real, large, and always paid later, which is exactly why it never wins the argument in the planning meeting. The over-specialized lineage is magnificent in its stable flask and helpless the day the medium changes.

The honest exception

This is not a blanket sermon for modularity, because evolvability has a cost, and in the wrong environment that cost is waste. In a genuinely stable environment, fitness is the right thing to optimize: the specialist beats the generalist in a niche that doesn't move, and the bamboo-eating panda is exquisitely, profitably adapted right up until the bamboo is in trouble. In software, premature modularity is a real and common failure: the speculative abstraction layer, the microservices split that buys flexibility you never use while you pay its coordination tax every day, the "what if we need to swap the database" interface for a database you will never swap. YAGNI ("you aren't gonna need it") is the engineering name for over-investing in evolvability you have no environment to spend.

So the actual discipline is reading which environment you're in. In a stable one, optimize fitness and don't apologize. In a changing one, optimize evolvability even at a fitness cost. The catch, the reason the lean should usually tilt toward evolvability, is that most software environments are not stable. Your platform shifts, your scale shifts, your competitors shift, the very expectations of users shift, and they do it on a timescale shorter than the lifespan of your codebase. You are, almost certainly, in a changing environment. You are an E. coli line in a flask whose medium will not stay the same.

What to do on Monday

Start measuring the thing you've been ignoring. You already measure fitness obsessively; add a few cheap proxies for evolvability and watch them like you watch latency:

Blast radius. For a typical change, how many modules must you touch? That number is your pleiotropy index. If a one-line behavior change forces edits across eight files in five subsystems, your genome is tangled, and no amount of current velocity is buying you a future.
Lead time for change. The DORA research has been telling us for years that how fast you can safely ship a change is among the strongest predictors of long-run performance. That's not a productivity metric. It's an evolvability metric.
Stranger's reach. How long before a new engineer can safely change a part of the system they've never seen? Modularity is what makes a part comprehensible in isolation; if every change requires knowing everything, you have a pleiotropic genome.
Coupling failures. How often does a change in module A break module B? Each such surprise is information shared too promiscuously among distant elements, mud accumulating.
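
Two of these proxies can be roughed out from data you already have. The sketch below is one possible approach, not a standard tool: it assumes each change (a commit or a merged pull request) is available as a list of file paths, and it assumes a file's module is its top-level directory. Both assumptions, the function names, and the sample paths are hypothetical. Note also that co-change counts are only a stand-in for coupling failures: they show which modules keep getting edited together, not which edits actually broke something.

```python
from collections import Counter
from pathlib import PurePosixPath
from statistics import median


def module_of(path: str) -> str:
    """Assumption: a file's module is its top-level directory."""
    parts = PurePosixPath(path).parts
    return parts[0] if len(parts) > 1 else "(root)"


def blast_radius(changes: list[list[str]]) -> float:
    """Median number of distinct modules touched per change."""
    return median(len({module_of(p) for p in files}) for files in changes)


def co_change_pairs(changes: list[list[str]]) -> Counter:
    """Count how often each pair of modules is edited in the same change."""
    pairs: Counter = Counter()
    for files in changes:
        modules = sorted({module_of(p) for p in files})
        for i, first in enumerate(modules):
            for second in modules[i + 1:]:
                pairs[(first, second)] += 1
    return pairs


# Hypothetical history: each inner list is the files touched by one change.
changes = [
    ["billing/invoice.py"],
    ["billing/invoice.py", "billing/tax.py"],
    ["billing/tax.py", "search/index.py", "auth/session.py"],
    ["search/index.py", "auth/session.py"],
]

print(blast_radius(changes))                    # 1.5 for this sample
print(co_change_pairs(changes).most_common(1))  # auth and search co-change twice
```

The absolute numbers matter less than the trend. A median that creeps upward quarter over quarter, or a pair of supposedly independent modules that keeps showing up at the top of the co-change list, is the tangle forming while the velocity chart still looks fine.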

Then budget for evolvability the way evolution does, by investing in the duplication-and-divergence, the clean interfaces, the information hiding, that produce no feature this quarter and determine everything next year. The work will never win against a feature in a fitness comparison, because it isn't a fitness improvement; it's an evolvability improvement, and you have to value it on a different axis or you will always, rationally, defer it into a Big Ball of Mud.

The deepest reframe is to stop asking "is my system winning right now?" and start asking "if the world changes next year, and it will, can my system change with it, cheaply, without breaking?" The competitor who is slightly less fit than you today but twice as evolvable is not your equal. They are your successor, because they will out-adapt you to the world neither of you has seen yet. Fitness wins the quarter. Evolvability wins the decade. Lenski's flasks have been making that case, one generation at a time, since 1988, and eleven of the twelve are still waiting for a future the twelfth already built.

Sources:
Richard Lenski's Long-Term Evolution Experiment (12 E. coli populations, begun 24 Feb 1988; 75,000+ generations); the Cit+ citrate-metabolism innovation in one population (~generation 31,500), shown to be contingent on earlier potentiating mutations (i.e., evolvability), not a lone lucky hit; the other 11 lines never evolved it (Wikipedia, "E. coli long-term evolution experiment"; Z. Blount et al., 2008, 2012; Wikipedia, "Zachary Blount").
G. Wagner & L. Altenberg, "Perspective: Complex Adaptations and the Evolution of Evolvability," Evolution 50(3):967–976 (1996): evolvability; modularity (few pleiotropic effects between characters of different functions) improves it; pleiotropy constrains it.
Gene duplication and the globin gene family (hemoglobin/myoglobin/cytoglobin from ancestral-gene and whole-genome duplications, a division of labor): Gene Duplication and Evolutionary Innovations in Hemoglobin-Oxygen Transport (PMC); Storz et al., "Functional diversification of vertebrate globins."
D. Parnas, "On the Criteria To Be Used in Decomposing Systems Into Modules," Communications of the ACM (Dec 1972): information hiding.
B. Foote & J. Yoder, "Big Ball of Mud" (1997): the tangled antipattern, "information is shared promiscuously among distant elements of the system" (pleiotropy in code).
Amazon's ~2002 internal service-interface mandate (teams communicate only through hard interfaces), the modular substrate from which AWS emerged, widely reported (e.g., Steve Yegge's 2011 "platforms" account).
DORA / Accelerate research: lead time for change as a key predictor of long-run software-delivery performance (an evolvability proxy).

Build the modular trust substrate now; the decade recombines out of it.

An agent that tops this quarter's benchmark is fit. Whether it still earns trust when the platform, the scale, and the rivals all move is evolvability, and that lives in the architecture, not the score. The Agent Trust Stack is that architecture as separable modules behind clean interfaces: portable identity, reputation built from verified outcomes, and a tamper-evident record of what the agent actually did, each able to change without breaking the others. It is Parnas's information hiding for trust, the modular substrate the next capability recombines out of instead of rebuilds.


pip install agent-trust-stack · npm install chain-of-consciousness agent-rating-protocol
