
Reynolds Numbers for Software: The Missing Dimensionless Groups

Physics found the ratios that let a tank model predict a battleship. Software has the Pi theorem, the data, and the toolkit. Nobody has done the experiment.

Published May 2026 · 11 min read

In 1883, Osborne Reynolds set up a glass-walled tank of water with a horizontal glass tube running through it. At the inlet he injected a thin stream of dye. At low flow speed, the dye drew a straight line down the tube — a single dark filament suspended in clear water. He opened the valve. At a certain speed the line started to wobble. A little more, and it broke into eddies. A little more, and the dye mixed with the water so completely that the inside of the tube went uniformly cloudy.

What Reynolds discovered was not that water gets messy at high speed. Anyone who has held a thumb under a faucet knew that. What he discovered was that the transition didn’t depend on velocity alone, or pipe diameter alone, or viscosity alone. It depended on a single ratio of all three. Halve the pipe diameter and double the speed; you get the same flow regime. The dye stayed laminar below a critical value of that ratio. It went turbulent above. The number you needed to know was not any of the measurements you took — it was the dimensionless combination of them.
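
The ratio Reynolds found fits in two lines. A sketch in Python, using water's kinematic viscosity (about 1.0e-6 m²/s at room temperature) and the conventional critical value near 2300 for pipe flow:

```python
def reynolds(velocity_m_s, diameter_m, kinematic_viscosity_m2_s=1.0e-6):
    """Re = v * D / nu. Every unit cancels; only a pure number is left."""
    return velocity_m_s * diameter_m / kinematic_viscosity_m2_s

# Halve the pipe diameter and double the speed: the number, and therefore
# the flow regime, is unchanged.
assert reynolds(1.0, 0.02) == reynolds(2.0, 0.01)

# Pipe flow is typically laminar below a critical value near 2300.
print(reynolds(0.1, 0.02))  # about 2000: still in the laminar range
```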

That number is now called the Reynolds number; Arnold Sommerfeld attached the name in 1908, a quarter century after the experiment. Physics has been finding more of them ever since: the Froude number for ship hulls and wave-making resistance, the Mach number for the sound barrier, the Prandtl number for heat transfer, the Nusselt and Peclet numbers for convection and advection. There are dozens. Each one names a boundary between regimes. Each one transfers across scale.

Software has been measuring itself for fifty years. We have commits per day, change failure rate, mean time to recovery, cyclomatic complexity, test coverage, deployment frequency. Every one of those has units. Every one of those refuses to transfer between organizations. None of them is a Reynolds number.

The Buckingham Pi theorem says they should exist. The fact that we don’t have them is the loud part of this essay.


What the Pi theorem actually promises

The Buckingham Pi theorem is the result that turned dimensional analysis from a heuristic into a method. Stated bluntly: if a physical relationship involves n variables drawn from k independent fundamental dimensions, you can always rewrite the relationship in terms of p = n − k independent dimensionless groups. Joseph Bertrand proved the core idea in 1878. Edgar Buckingham formalized and popularized it in a 1914 Physical Review paper, giving the theorem its name and the “Pi” notation that stuck.

The theorem buys three things at once. First, it shrinks problems: seven variables in three dimensions collapse to four dimensionless groups, and you only have to vary four things in your experiment instead of seven. Second, it lets you scale: if every dimensionless group matches between a model and a full-size prototype, the physics is identical, regardless of the absolute scale. This is why a wind tunnel works at all. Third, it lets you estimate when you have no equation. G.I. Taylor famously calculated the Trinity bomb yield at about 22 kilotons from declassified photographs of the fireball, using only the air’s density and dimensional analysis. The classified value was around 20. Military authorities were initially alarmed at the apparent leak, then quietly horrified that high-school physics could derive a state secret from public images.
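
Taylor's estimate can be rerun on the back of an envelope. The scaling law is his; the dimensionless prefactor of order one, and the radius and time below, are illustrative assumptions in the range of the published frames, not his exact figures:

```python
# Taylor's blast-wave scaling: the only combination of fireball radius R,
# elapsed time t, and ambient air density rho with the dimensions of
# energy is E ~ rho * R**5 / t**2, up to a prefactor of order one.
RHO_AIR = 1.25               # kg/m^3, sea-level air
JOULES_PER_KILOTON = 4.184e12

def taylor_yield_kt(radius_m, time_s, prefactor=1.0):
    energy_j = prefactor * RHO_AIR * radius_m**5 / time_s**2
    return energy_j / JOULES_PER_KILOTON

# Illustrative numbers (assumed here, not Taylor's exact readings):
print(taylor_yield_kt(radius_m=140.0, time_s=0.025))  # roughly 20-30 kt
```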

There is a catch. The theorem tells you how many dimensionless groups exist. It does not tell you which groups are the physically meaningful ones. That part is empirical. Reynolds had to do the experiment. The Pi theorem said a dimensionless group existed in the pipe-flow problem. Reynolds’ dye told us which one mattered and where it broke.

This split — the theorem is mathematical, the discovery is experimental — is the key to the rest of the essay. Software has the mathematical part. It has not done the experimental part.


The towing tank, and why this argument is economic

William Froude was an English engineer who, beginning in 1867, towed wooden ship-hull models named Swan and Raven through a tank. He built three sizes of each, and he varied the speed. What he found was that when models of the same shape were towed at speeds proportional to the square root of their length, they generated near-identical wave patterns and the wave-making resistance scaled cleanly with model weight. By 1872 he had a state-funded test tank at Torquay, and the Royal Navy stopped guessing about new hull designs.

Before Froude, every new ship was a full-scale gamble. After Froude, hull resistance for a 400-foot battleship could be predicted from a six-foot tank model at a cost of weeks instead of years. The dimensionless group involved is now called the Froude number, Fr = v/√(gL), and the navies that adopted it stopped losing money on bad hulls.
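
Froude's similarity rule is one square root. A sketch using the six-foot model and 400-foot hull from above, with an assumed full-scale speed of 10 m/s:

```python
import math

G = 9.81  # m/s^2

def froude(speed_m_s, length_m):
    """Fr = v / sqrt(g * L)."""
    return speed_m_s / math.sqrt(G * length_m)

def corresponding_speed(ship_speed_m_s, ship_length_m, model_length_m):
    """Tow the model at the speed that matches the full-scale Froude number."""
    return ship_speed_m_s * math.sqrt(model_length_m / ship_length_m)

# A six-foot (1.83 m) model standing in for a 400-foot (122 m) hull:
v_model = corresponding_speed(10.0, 122.0, 1.83)
assert abs(froude(v_model, 1.83) - froude(10.0, 122.0)) < 1e-12
print(round(v_model, 2))  # tow the model at about 1.22 m/s
```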

This is the part of the story software writers underrate. The point of a dimensionless group is not academic elegance. The point is that a startup with five engineers cannot afford to learn at full scale what an enterprise with five hundred is going to break. If the right dimensionless groups for software existed, a five-person team could measure them and predict, with calibrated uncertainty, which of its current practices will still be working at fifty people and which will explode. That is what Froude gave the Admiralty. The reason every new engineering org rediscovers the same scaling pains, with the same surprise, is that we don’t have the equivalent.

It gets worse. By 1921, more than twenty wind tunnels had been built around the world. They were producing huge amounts of data. The data didn’t transfer, because the Reynolds numbers of the small models at atmospheric pressure were a factor of twenty too low compared to actual aircraft in flight. NACA solved this in 1922 by pressurizing the air in the tunnel, raising the density, and matching the Reynolds number at model scale. Twenty tunnels, a decade of work, results that didn’t transfer until somebody noticed they were all measuring at the wrong place on the dimensionless axis.

Now reread that paragraph with “wind tunnel” replaced by “DORA dashboard.”


The state of software metrology

DORA — DevOps Research and Assessment, the framework that grew out of Accelerate (Forsgren, Humble, and Kim, 2018) — is the most measured, peer-reviewed body of software-performance data we have. The five metrics, expanded slightly in 2024, are these:

  deployment frequency (events / time)
  lead time for changes (time)
  change failure rate (ratio)
  failed deployment recovery time (time)
  reliability (ratio)

The DORA team are not naive about this. Their own published guidance says that “context matters,” that the metrics are “best suited for measuring one application or service at a time,” and that direct cross-organizational comparison is discouraged. DORA has also published the finding that speed and stability are not tradeoffs: top performers are good on every metric, low performers are bad on every metric. Read with a dimensional analyst’s eye, that is exactly the shape of a result that begs to be re-expressed as a single dimensionless group. In laminar pipe flow, velocity doesn’t trade off against stability either. The tradeoff appears in the turbulent regime. The fact that the speed/stability axis is not a tradeoff for some teams and is a brutal one for others is the kind of phenomenon a Reynolds-like number would name.

Brooks’ law, from The Mythical Man-Month in 1975, is the other foundational empirical result we have. Fred Brooks observed that adding people to a late project makes it later, and isolated three mechanisms: ramp-up time, indivisible tasks, and communication overhead that scales as n(n − 1)/2. A team of three has three channels. A team of six has fifteen. A team of fifty has more than a thousand. Brooks’ law is dimensional analysis already half done — it relates a count (team size), a count squared (channels), a time (ramp), and an output rate (productivity). What it does not give you is the dimensionless combination of those four that predicts the inflection point. The 2022 ICSE study by the GitHub-data group found that larger teams are, on average, less productive per person; other studies have found super-linear effects. The two are not contradictory. They are signs that the unknown dimensionless group is in different regimes in the two data sets and nobody is reporting it.
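
The channel arithmetic is one line, and it reproduces the numbers above:

```python
def channels(team_size):
    """Pairwise communication channels in a team: n * (n - 1) / 2."""
    return team_size * (team_size - 1) // 2

for n in (3, 6, 50):
    print(n, "->", channels(n))   # 3 -> 3, 6 -> 15, 50 -> 1225
```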

Goodhart’s law is the third corner of this room. Charles Goodhart’s 1975 observation — that any measure used as a target stops being a good measure — is what makes raw software metrics fragile. Velocity becomes a target; the 3-pointer becomes a 5. Coverage becomes a target; tests get written that touch lines without exercising behavior. Lines of code becomes a target; functions inflate. A dimensional metric is easy to game because you only have to push one variable. A correctly constructed dimensionless ratio is harder to game because numerator and denominator both have to move, and inflating one without the other changes a separately observable quantity. Not impossible to game — nothing is — but structurally tougher.


Halstead, and why this is harder than it looks

Software has tried this once already, and the attempt is instructive.

Maurice Howard Halstead, in Elements of Software Science (1977), set out to do exactly what this essay is calling for. His early reports use the phrase “software physics” without irony. He counted distinct operators (η1) and distinct operands (η2), and total operators (N1) and total operands (N2), and from those four primitives he derived a set of secondary metrics with confident physics-sounding names:

  Vocabulary: η = η1 + η2
  Length: N = N1 + N2
  Volume: V = N · log2 η
  Difficulty: D = (η1 / 2) · (N2 / η2)
  Effort: E = D · V

This shipped. It was taught. Implementations are still in static-analysis tools today. In 2019, Derek M. Jones at Shape of Code did the thirty-second dimensional analysis Halstead never did, and the entire edifice came apart in a paragraph. Volume — an actual volume would have units of length cubed — in Halstead’s formula has the dimensions of length. The name is wrong. Difficulty, derived from operator counts and operand counts, also has the dimensions of length. Effort, the product of those two, has dimensions of length squared, which is an area, and is then declared without comment to be measurable in units of mental effort. The system has one underlying dimension — program length — relabeled three different ways and treated as if those labels were independent quantities. The Pi theorem, applied to this, would yield zero meaningful dimensionless groups, because there is only one true variable in the system. The NIST Technical Note that proposed “rationalizing Halstead’s system using dimensionless units” was confirming the criticism from the highest metrological authority in the United States.
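
Jones’s critique can be re-run in a few lines. The formulas are Halstead’s standard definitions; the doubling experiment is mine:

```python
import math

def halstead(eta1, eta2, N1, N2):
    """Halstead's standard metrics from the four primitive counts."""
    N = N1 + N2                       # program length
    eta = eta1 + eta2                 # vocabulary
    volume = N * math.log2(eta)       # "volume": length times log(vocabulary)
    difficulty = (eta1 / 2) * (N2 / eta2)
    effort = difficulty * volume      # length squared, relabeled "effort"
    return volume, difficulty, effort

v1, d1, e1 = halstead(10, 20, 100, 150)
v2, d2, e2 = halstead(10, 20, 200, 300)   # double the length, same vocabulary

# Everything moves as a power of one variable: volume and difficulty
# double, effort quadruples. One dimension wearing three names.
assert abs(v2 / v1 - 2.0) < 1e-9
assert abs(d2 / d1 - 2.0) < 1e-9
assert abs(e2 / e1 - 4.0) < 1e-9
```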

What Halstead got wrong is worth naming carefully, because the essay’s thesis is going to be that we should try again, and we need to not make these mistakes:

  1. He skipped the empirical step. Reynolds observed regime transitions. Halstead defined ratios at a desk and called them physics.
  2. He used a single independent dimension. The Pi theorem requires several. You can’t make a Reynolds number from velocity alone; you need velocity, length, and viscosity. Halstead had token-count, repackaged.
  3. He never checked dimensional consistency. The GUM framework — the Joint Committee for Guides in Metrology’s Guide to the Expression of Uncertainty in Measurement, first published in 1993 and extended through supplements into the 2020s — gives a working engineer the toolkit to do this in an afternoon. Software engineers don’t learn it.

Halstead’s failure is not a reason to abandon the program. It is the cautionary tale that tells you how to start.


What a correct attempt would look like

Sketch the variables we actually measure in modern software engineering. There are roughly ten that show up across DORA, SPACE, DevEx surveys, and tooling: team size (count), commit cadence (events / time), deployment frequency (events / time), review latency (time), test coverage (ratio), architecture surface area as count of services or modules (count), tech debt ratio (ratio), change failure rate (ratio), recovery time (time), and some measure of code complexity (typically dimensionless).

If you call the independent dimensions time, events, and entities, that’s ten variables and three dimensions, and the Pi theorem predicts roughly seven independent dimensionless groups. Not one. Seven. The same way fluid mechanics did not stop with the Reynolds number but added Froude, Mach, Prandtl, Nusselt, and Peclet.

It would be honest to say what happens when you try to write the first one down. The natural candidate — call it a software Reynolds number — would mix “forces” that move a system toward instability against “forces” that damp it. The shape would be something like:

Resw ≈ (active developers × commit cadence × architecture surface area) / (review latency × test coverage × debt-recovery rate)

And the moment you check dimensions, you see the problem. Active developers is a count. Commit cadence is events per time. Surface area is a count. The numerator is count squared times events per time. Review latency is a time; test coverage is dimensionless; a debt-recovery rate is inverse time. The denominator multiplies out to no dimensions at all, so the whole ratio carries the numerator’s count squared times events per time. It is not dimensionless as written. It is wrong in exactly the way Halstead was wrong, and we caught it in one minute instead of fifty years because the discipline of writing the dimensions next to the symbols is that cheap.
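
That bookkeeping is cheap enough to automate. A sketch of a dimensional checker; the encoding is mine, but the assignments follow the essay’s three base dimensions (counts are entities, cadence is events per time, latency is a time, coverage is a pure ratio):

```python
from collections import Counter

# Each quantity's dimensions as an exponent map over the base dimensions.
DIMS = {
    "active_developers": {"entities": 1},
    "commit_cadence":    {"events": 1, "time": -1},
    "surface_area":      {"entities": 1},
    "review_latency":    {"time": 1},
    "test_coverage":     {},               # dimensionless ratio
    "debt_recovery":     {"time": -1},     # inverse time
}

def residual_dimensions(numerator, denominator):
    """Sum the numerator's exponents, subtract the denominator's."""
    total = Counter()
    for q in numerator:
        total.update(DIMS[q])
    for q in denominator:
        total.subtract(DIMS[q])
    return {d: e for d, e in total.items() if e != 0}

leftover = residual_dimensions(
    ["active_developers", "commit_cadence", "surface_area"],
    ["review_latency", "test_coverage", "debt_recovery"],
)
print(leftover)  # {'entities': 2, 'events': 1, 'time': -1} -- not dimensionless
```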

This is the work. The right next move is not to retreat to the prose version of the metaphor — “teams are like rivers” — but to do the Buckingham Pi procedure properly: choose repeating variables, form products with each remaining variable, solve for exponents that make each product dimensionless. The output is a set of candidates. Then you need the empirical step. You need the Reynolds dye. You need a regime transition observable across many organizations, and you have to vary the candidate group while holding the others fixed, and you have to see whether the transition moves with it.
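
The procedure itself can be mechanized: write each variable’s dimensions as an exponent vector, take the rank of that matrix to get k, and n − k is the number of independent groups. A sketch on a toy subset of six of the ten variables (the dimension assignments are my assumptions, following the essay’s three base dimensions):

```python
# Exponents over (time, events, entities) for six of the ten variables.
DIMS = {
    "team_size":        (0, 0, 1),
    "surface_area":     (0, 0, 1),
    "commit_cadence":   (-1, 1, 0),
    "deploy_frequency": (-1, 1, 0),
    "review_latency":   (1, 0, 0),
    "recovery_time":    (1, 0, 0),
}

def rank(rows, tol=1e-9):
    """Rank of a small matrix by Gaussian elimination."""
    rows = [list(map(float, r)) for r in rows]
    r = 0
    for col in range(len(rows[0])):
        piv = next((i for i in range(r, len(rows)) if abs(rows[i][col]) > tol), None)
        if piv is None:
            continue
        rows[r], rows[piv] = rows[piv], rows[r]
        for i in range(len(rows)):
            if i != r:
                f = rows[i][col] / rows[r][col]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

n = len(DIMS)
k = rank(list(DIMS.values()))
print(n - k)  # 3: the Pi theorem's count of independent groups for this subset

# One valid choice of the three: team_size/surface_area,
# commit_cadence/deploy_frequency, review_latency/recovery_time.
# Each pair shares identical dimensions, so each ratio is dimensionless.
for a, b in [("team_size", "surface_area"),
             ("commit_cadence", "deploy_frequency"),
             ("review_latency", "recovery_time")]:
    assert DIMS[a] == DIMS[b]
```

Whether any of these candidates names a real regime boundary is exactly the question the dye experiment has to answer.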

Candidate regime transitions that are worth investigating:

  Candidate group  | Roughly                                                      | What it might name
  Brooks number    | communication channels × ramp time over productive throughput | The crossover where adding engineers slows the team
  Deployment Mach  | deploy frequency × change size over recovery capacity         | The point where shipping outruns the org’s ability to recover
  Tech-debt Froude | feature velocity × debt load over refactor capacity × code age | The wave-making resistance of legacy
  Coverage Prandtl | test coverage × churn over defect-discovery rate × review depth | Whether testing or reviewing finds more of the real defects

These are hypotheses, not results. Reynolds’ 1883 paper was a hypothesis too, until the dye showed up.


Why the field is still empty

Nine search rounds across the obvious phrasings — “Buckingham Pi software engineering,” “Reynolds number software development,” “dimensionless software metrics” — turn up Halstead, the Shape of Code dimensional critique of Halstead, and adjacent things like SPACE and DORA. They do not turn up anyone arguing that the Pi theorem makes a quantitative prediction about software and that the prediction is testable. (I am happy to be told this is wrong; absence of a search hit is not absence of the idea.)

There are reasons. Physics had three centuries of dimensional analysis before Reynolds; software has had something like seventy years of professional existence. Metrologists are trained in uncertainty propagation; software engineers, with rare exceptions, are not. The visible measurement community in our field — DORA, SPACE, DevEx — focuses on benchmarking, which compares raw values, not on nondimensionalization, which constructs scale-invariant ratios. Goodhart’s law makes any new metric immediately suspect, because the moment one becomes a target, it stops being a measurement. And the dimensions of software are weird in a way the dimensions of fluids aren’t. The seven SI base quantities have referees. “Is team size the same dimension as count of modules?” doesn’t.

None of that is fatal. They are reasons the work is hard, not reasons the work cannot be done.


Where the analogy breaks

I owe an honest section on the limits.

Fluids are stationary — the equations governing water at 19°C don’t shift between Tuesday and Wednesday. Software engineering changes underneath you. The relevant variables in 1995 (lines of code, function points, KLOC) are not the relevant ones in 2026 (services, deploys, MTTR). A dimensionless group derived in one era might evaporate in the next, not because the math was wrong but because the underlying activity stopped being the same thing. Reynolds did not have this problem.

Fluids are also homogeneous in a way humans aren’t. The Pi theorem cleanly handles ideal cases and degrades gracefully in non-ideal ones; sociotechnical systems are aggressively non-ideal. The fluid analogue of a star engineer joining your team is hard to write. The closest physics analogy is probably non-Newtonian fluids, where viscosity itself depends on shear rate, and even those don’t change their constitutive equations when the fluid attends a conference.

Lastly, the empirical step is genuinely expensive. Reynolds had a glass tube and a few hundred trials. A controlled experiment across a real engineering org runs into politics, ethics, and IP boundaries that physical experiments don’t. The candidate groups in this essay would need cooperative consortia — something like a DORA Working Group, with stronger statistical machinery and explicit dimensional discipline.

None of that says the program is wrong. It says the program is ambitious. The competing program is “keep measuring deploy frequency in deploys per day forever.”


What this is worth, if it works

Imagine an engineering leader at a 30-person company with three Reynolds-equivalent groups computed weekly off their git history and incident ledger. The first group is in the laminar range — their current testing-and-review practice should still work at 200 people. The second is in the transition range — their on-call rotation is going to start fraying somewhere between 80 and 120 engineers. The third is already turbulent — their architecture surface area has crossed a threshold and they are going to spend the next year fighting incidents whose root cause is “too many services touching too many other services with too little versioning discipline.”

Today, all three of those facts arrive as a surprise. They show up as an outage, a quit, a postmortem, a re-org. Tomorrow, in this hypothetical, they show up as a number on a dashboard, with a known critical value drawn on the y-axis. That dashboard does not exist. It could.

It is worth remembering what the equivalent dashboards bought their fields. Naval architects stopped guessing. Aeronautical engineers stopped flying test models that didn’t correspond to flight conditions. Heat-transfer engineers stopped sizing radiators by trial and error. G.I. Taylor stood on the lawn at Cambridge with a magazine photograph and a slide rule and derived a state secret. Dimensional analysis is the closest engineering has ever come to free intelligence about the world.

Software has the math. The data is collected. The toolkit (GUM) is older than the modern web. The first attempt (Halstead) failed in a way the second attempt could specifically avoid. The field is open. Someone should walk in and do Reynolds’ experiment.

If you do, please write down the dimensions next to the symbols.


This essay draws on the Buckingham Pi theorem (Bertrand 1878; Buckingham, Physical Review, 1914); Reynolds’ 1883 pipe-flow experiments and Sommerfeld’s 1908 naming; William Froude’s 1867–1872 towing-tank work and the Admiralty test tank at Torquay; the pre-NACA wind-tunnel Reynolds-number problem and the 1922 variable-density tunnel; the DORA framework as elaborated by Forsgren, Humble, and Kim, Accelerate (2018); Brooks, The Mythical Man-Month (1975); Goodhart’s 1975 observation; Halstead, Elements of Software Science (1977); Derek M. Jones’s 2019 dimensional analysis of Halstead at Shape of Code; the JCGM Guide to the Expression of Uncertainty in Measurement (1993, with supplements through the 2020s); and the 2024 Cambridge Core re-analysis of G.I. Taylor’s Trinity calculation.

If you want the data, sign the data

Reynolds’ experiment worked because the tank had a glass wall and the dye had a colour. The first move on any program like this one is to make the underlying record legible: signed, ordered, and outside the system that produced it. Chain of Consciousness is the small primitive for that — a tamper-evident, anchored log of what an agent or a process actually did, with timestamps you didn’t generate yourself. You cannot construct a Reynolds number for software without a reliable record of what happened. The record is the dye in the tube.

pip install chain-of-consciousness · npm install chain-of-consciousness · Hosted Chain of Consciousness · See a live provenance chain
