What a 1973 archaeologist with one pencil figured out about your tech-debt backlog — and why teams keep trying to solve a graph problem by sorting a list.
One evening in February 1973, in Winchester, England, an archaeologist named Edward Cecil Harris sat down with the field notes of a 1960s excavation he could not make sense of. The site had generated the kind of record that was normal for its time: two-dimensional physical sections, profiles drawn on graph paper, with depth of soil on the page and time flowing downward by assumption. Read the drawings carefully and the site still refused to resolve. Which wall was built before which floor? Which pit cut through which midden? He had the drawings. He could not get from the drawings to the story.
By morning he had invented the Harris Matrix.
What he did that evening was not fieldwork, and it was not a better drawing. It was a refusal — the refusal to let the answer live inside the two-dimensional profile at all. He threw away the section and drew, instead, a graph: one node per stratigraphic unit, one edge for every “this sits above that” contact, and only for the immediate contacts. Any wider ordering would emerge on its own. What looked like a drawing problem had always been a graph problem. No one before him had made the move.
That is the kind of move I want for software debt.
By 1979 the method had a book, Principles of Archaeological Stratigraphy, and by the mid-1980s it had become the UK’s recording standard through the Museum of London’s single-context planning method. The machinery is embarrassingly simple. Harris laid out four laws: superposition, original horizontality, original continuity, and stratigraphic succession.
Law four is the one that matters. It is the same insight that makes Hasse diagrams work in order theory: if you have ordered pairs A<B and B<C, you do not need to draw A<C. It falls out of the graph for free. An excavation that once looked hopeless — thousands of context units in a city-centre site — becomes tractable because you only record neighbouring relationships, and the full ordering computes itself.
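To make "the full ordering computes itself" concrete, here is a minimal sketch in Python. The function name `implied_pairs` and the three-unit site are made up for illustration, and the code assumes the recorded contacts contain no cycles.

```python
from collections import defaultdict


def implied_pairs(immediate_contacts):
    """Return every (upper, lower) pair implied by the immediate contacts."""
    below = defaultdict(set)
    for upper, lower in immediate_contacts:
        below[upper].add(lower)

    pairs = set()
    for start in list(below):
        # Walk downward from `start`, collecting everything reachable.
        seen, stack = set(), [start]
        while stack:
            for lower in below.get(stack.pop(), ()):
                if lower not in seen:
                    seen.add(lower)
                    stack.append(lower)
        pairs.update((start, lower) for lower in seen)
    return pairs


# Hypothetical site: only neighbouring contacts are recorded.
contacts = [("floor", "wall"), ("wall", "pit")]
pairs = implied_pairs(contacts)
print(sorted(pairs))
```

Nobody recorded that the floor sits above the pit, yet the pair appears in the result. That is the Hasse-diagram point in executable form.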
There is a second Harris insight that is harder for software readers to hear. The principle at the heart of his recording method is that surfaces, not deposits, are the load-bearing unit: the moment one layer meets another is what tells you the story. Soil persists; you can put it in a bag and bag-number it. An interface is transient. It exists only until the trowel goes through it. If nobody records what it looked like before it was destroyed, that piece of the story is gone.
Hold that thought. It will come back.
Software’s version of this problem is about twenty years younger and about fifty years behind on method.
The phrase technical debt was coined by Ward Cunningham in 1992, in his OOPSLA experience report on the WyCash portfolio system, after reading Lakoff and Johnson’s Metaphors We Live By. The argument was financial: shipping first-time code is like going into debt — a little debt speeds development so long as it is paid back promptly with a rewrite. Interest accrues in the form of compounding friction. Miss enough payments and eventually all your effort goes to servicing the debt and none to building.
Martin Fowler upgraded the frame in 2009 with the Technical Debt Quadrant — a 2×2 of (deliberate vs. inadvertent) × (prudent vs. reckless). It was a lovely diagnostic. It said: this category of debt is the kind a competent team takes on knowingly; that category is the kind you accidentally ship because you did not know any better. Prudent deliberate debt is often wise. Reckless inadvertent debt is how companies die.
What Fowler’s quadrant does not do — what no mainstream debt framework does — is tell you the order in which to pay the debt down. The quadrant describes each item in isolation. Two items of prudent-deliberate debt look identical on the diagram even when one is blocking the other. You still need to know: if I take the afternoon to rewrite the legacy auth middleware, will that unblock the permissions refactor I’ve been avoiding for two release cycles? Does the permissions refactor in turn unblock the multi-tenant work the sales team keeps asking about?
Every engineering team I have ever watched has answered that question by scrolling through a flat list in Jira. A priority score is a number. A number is one-dimensional. Dependencies between debt items are a graph. Teams keep trying to solve a graph problem by sorting a list.
The tooling that claims to help mostly does not. Debtmap, an open-source analyzer that has been gaining attention since 2024, calls itself a “tiered prioritization” tool and surfaces architectural issues above testing gaps — a real improvement over ranked severity, but still a ranking. CodeScene does behavioural code analysis, weighting hotspots by developer activity from git history. NDepend draws handsome dependency graphs of code and stops short of linking those graphs to the debt list itself. None of them render debt as what it actually is: a directed acyclic graph where an edge from A to B means “you have to deal with A before B becomes tractable.”
The gap is the shape of the data structure, and no amount of ranking fixes it.
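One way to hold that shape in code, as a sketch rather than a prescription: a small container that stores only "A before B" edges and refuses any edge that would make the graph cyclic. The class `DebtGraph` and its method names are hypothetical; the cycle check leans on `graphlib` from the Python standard library (3.9 and later).

```python
from graphlib import CycleError, TopologicalSorter


class DebtGraph:
    """Debt items plus 'deal with A before B' edges, kept acyclic."""

    def __init__(self):
        # item -> set of items that must be handled first
        self.blockers = {}

    def add_item(self, item):
        self.blockers.setdefault(item, set())

    def add_edge(self, first, then):
        """Record that `first` must be dealt with before `then`."""
        self.add_item(first)
        self.add_item(then)
        self.blockers[then].add(first)
        try:
            # prepare() raises CycleError if the graph contains a cycle.
            TopologicalSorter(self.blockers).prepare()
        except CycleError:
            self.blockers[then].discard(first)
            raise ValueError(f"{first} -> {then} would create a cycle") from None


graph = DebtGraph()
graph.add_edge("auth-middleware", "permissions")
graph.add_edge("permissions", "multi-tenant")
print(graph.blockers)
```

Rejecting cycles at insert time is a design choice: if two items each claim to block the other, the unit definitions are probably wrong, and it is cheaper to find that out while drawing the edge than while planning the sprint.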
Here is what the correspondence looks like when you put archaeology and software side by side rather than inside each other:
| Archaeology | Software |
|---|---|
| A stratigraphic unit (a layer, a cut, a fill) | A piece of technical debt |
| “This layer sits on top of that one” | “This piece of debt sits on a cruftier piece of debt underneath it” |
| A cut — a later feature that sliced through older material | A refactor that modernised part of a system and left the rest stranded |
| Correlation of two fragments that were once one deposit | Two modules that were once one file, split during a rushed migration |
| A surface (transient, must be recorded in the moment) | The decision moment — why this debt was taken on |
| Pre-1973 section drawings | The flat Jira backlog ranked by priority score |
| The Harris Matrix DAG | A tech-debt DAG where edges mean “fix A before B” |
| Law of Stratigraphic Succession | Only immediate dependencies matter; transitive ones compute themselves |
Each row does specific work. Read down the right-hand column and an engineering team has, for free, the vocabulary they have been reaching for.
Take a shape of the kind most teams have. Imagine the team still owns a handwritten auth middleware written in a hurry when the company had six employees. Above it, grafted on over four years, is a permissions system that depends on quirks of the middleware (“users are always in exactly one org, because that’s how the old middleware parsed the JWT”). Above that sits the multi-tenant feature sales keeps asking about — which cannot ship because permissions are single-tenant-shaped, which in turn are the shape they are because of the auth middleware below. Three debt items. Ranked by business value, multi-tenant is on top. Ranked by Fowler’s quadrant, all three might be “prudent deliberate” and tied. Drawn as a Harris Matrix, the ordering is unambiguous: the auth middleware is the lowest stratum, and nothing above it is fully tractable until it is handled.
Starting at the top layer — the “highest-value” one by priority score — is the archaeological equivalent of trenching downward through three centuries of wall to get to a coin you can see glinting through a crack. You will find the coin. You will also destroy the record of everything above it.
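The three-item example can be sketched in a few lines. The business-value scores below are invented for illustration, and the item names are just labels; the ordering comes from `graphlib.TopologicalSorter` in the Python standard library.

```python
from graphlib import TopologicalSorter

# Made-up priority scores: higher means "sales wants it more".
business_value = {"multi-tenant": 9, "permissions": 5, "auth-middleware": 2}

# item -> the items sitting directly beneath it (its immediate blockers)
blocked_by = {
    "auth-middleware": set(),
    "permissions": {"auth-middleware"},
    "multi-tenant": {"permissions"},
}

# The flat-list answer: sort by score, highest first.
by_score = sorted(business_value, key=business_value.get, reverse=True)

# The matrix answer: lowest stratum first.
by_strata = list(TopologicalSorter(blocked_by).static_order())

print("by score: ", by_score)
print("by strata:", by_strata)
```

The two lists come out in opposite order. The score is not wrong about what the business wants; it is simply silent about what has to happen first.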
I should say, because it would be dishonest not to: the observation that software stratifies like an archaeological site is not original to this essay. In 2018, Andrew Reinhard of the Centre for Digital Heritage at the University of York published “Adapting the Harris Matrix for Software Stratigraphy” in Advances in Archaeological Practice (6:2, 157–172). He used No Man’s Sky — the 2016 Hello Games release that patched aggressively post-launch — as his test case and argued, persuasively, that software obeys all four of Harris’s laws. If you’re already thinking “this analogy has been drawn,” you are right, and Reinhard drew it eight years ago.
What Reinhard did is backwards-looking. His frame is archaeology of the software artefact: given a released build, reconstruct the version history the way you’d reconstruct a buried settlement. He was documenting code that had already shipped — frozen strata.
The territory that is left, the territory this essay is actually staking, is forward-looking. Not: reconstruct what was done. But: decide what to do next. Reinhard’s move is to treat No Man’s Sky as a site. The move I’m proposing is to treat your current codebase, this week, as a live dig where you are the one with the trowel, and the question isn’t “what happened here?” but “what do I cut through next without destroying the context for the cut after that?”
There is also a nice return trade worth naming. Git is, for any team that uses it, already a near-perfect stratigraphic record: every commit is a dated, attributed cut, with the surface (the diff, the message, the PR description) captured at the instant of deposition. Archaeologists would kill for this data on their sites. The matrix view over git history is almost free to compute; what’s missing for software isn’t recording discipline, it’s the habit of asking graph questions of the record that already exists. Software handed archaeology a lesson in how to record perfectly. It hasn’t yet used its own record.
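As a rough illustration of how little work that is: `git log --format="%H %P"` prints each commit hash followed by its parent hashes, which is already a list of immediate contacts. The sketch below parses text in that shape. The helper name `parse_parent_edges` and the shortened hashes are hypothetical, and the sample is hard-coded rather than read from a real repository.

```python
def parse_parent_edges(log_text):
    """Map each commit hash to the list of its parent hashes."""
    parents = {}
    for line in log_text.splitlines():
        fields = line.split()
        if fields:
            parents[fields[0]] = fields[1:]
    return parents


# Stand-in for the output of: git log --format="%H %P"
sample_log = "m4 f2 f3\nf3 r1\nf2 r1\nr1\n"

parents = parse_parent_edges(sample_log)

# Each (later, earlier) pair is one "this sits above that" contact.
contacts = [(child, parent) for child, ps in parents.items() for parent in ps]
roots = [commit for commit, ps in parents.items() if not ps]
merges = [commit for commit, ps in parents.items() if len(ps) > 1]

print(contacts, roots, merges)
```

This gives the stratigraphy of commits, not of debt items. Linking the two is the part that still takes a person asking the graph question.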
I want to be careful not to do the thing where a clever mapping is asserted and never pressure-tested. Three ways this one breaks, in order of severity.
It breaks worst on who did the depositing. Archaeological strata are deposited by unrelated actors over centuries, with no shared institutional memory. Software debt is deposited by the same team, often the same engineer, usually within living memory. That cuts both ways. You have access to witnesses — Slack threads, PR descriptions, the person who wrote the auth middleware still answers their DMs — where an archaeologist does not. Which means the “surfaces are transient” insight has even more force in software: the interface between versions can be recorded, cheaply, at the moment it is created, and a team that does so has information an archaeologist would dream of. Teams that don’t are voluntarily throwing away data that would cost nothing to preserve.
It breaks next on reversibility. Harris’s matrix is strictly monotonic: once a layer is disturbed by a later cut, the original continuity is gone. Software is not so strict. You can, in principle, restore a lost abstraction by extracting it back out of the call sites. In practice, not often, because the cost grows with every commit that depends on the lost shape, but often enough that the monotonicity claim is rhetorical rather than literal. The matrix is a good model for the debt graph as it usually is, not a law of nature.
It breaks least, though it is still worth naming, on granularity. Archaeological units have natural boundaries: you stop excavating when the soil changes. Debt items don’t. One engineer’s “the auth middleware” is three items to another and one to a third. The matrix is only as good as the unit definitions you bring to it, and bad unit definitions produce a matrix that looks rigorous and isn’t. Archaeologists spent decades arguing about context definitions before the method stabilised. Software teams will probably have to do the same.
None of these breaks kill the analogy. They sharpen where to apply it.
Pull your debt list. Ignore the priority score for now. For each item, ask one question: what other item on this list, if I paid it down, would make this one materially easier to handle? Draw an arrow. You are looking for immediate blockers only; transitive ones you do not need to think about, because by Harris’s fourth law they compute themselves.
Half the list will have no edges in either direction — these are independent. Sort those by whatever priority score you like. A smaller group will form chains, and a smaller group still will form genuine forks. The chains tell you where the sequencing is pre-determined; the forks tell you where you actually have a choice; the independent items tell you what you can hand to whoever has a spare afternoon and a half-working build.
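The sorting into independent items, chains, and forks can be done mechanically once the arrows exist. The sketch below is one possible reading, with definitions that are my assumptions rather than anything Harris specified: a connected group is a chain if every item in it has at most one arrow in and one arrow out, and a fork otherwise. The item names are invented.

```python
from collections import defaultdict


def classify(items, edges):
    """Group items into independent / chain / fork components."""
    neighbours = defaultdict(set)
    in_deg, out_deg = defaultdict(int), defaultdict(int)
    for first, then in set(edges):  # set() so duplicate arrows count once
        neighbours[first].add(then)
        neighbours[then].add(first)
        out_deg[first] += 1
        in_deg[then] += 1

    groups = {"independent": [], "chain": [], "fork": []}
    seen = set()
    for item in items:
        if item in seen:
            continue
        # Collect everything connected to this item, ignoring direction.
        component, stack = [], [item]
        seen.add(item)
        while stack:
            node = stack.pop()
            component.append(node)
            for other in neighbours[node]:
                if other not in seen:
                    seen.add(other)
                    stack.append(other)
        if len(component) == 1:
            kind = "independent"
        elif all(in_deg[n] <= 1 and out_deg[n] <= 1 for n in component):
            kind = "chain"
        else:
            kind = "fork"
        groups[kind].append(sorted(component))
    return groups


items = ["auth-middleware", "permissions", "multi-tenant",
         "logging", "metrics", "alerts", "flaky-test"]
edges = [("auth-middleware", "permissions"), ("permissions", "multi-tenant"),
         ("logging", "metrics"), ("logging", "alerts")]
groups = classify(items, edges)
print(groups)
```

In this made-up backlog the auth stack is a chain, the logging work is a fork, and the flaky test is the thing to hand to whoever has a spare afternoon.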
Then, and only then, ask the usefulness question. Not “what is the highest-priority debt” — that is a priority-score question, and a priority score is one-dimensional where the actual landscape is a graph. Ask instead: “of the items with nothing beneath them — the bottom stratum, the load-bearing layer — which would most unstick the things piled on top?” That is the question the Harris Matrix was invented to answer, and it answers cleanly.
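One crude way to put a number on "most unstick", offered as a sketch under assumptions: for each item with nothing beneath it, count how many items sit above it, directly or transitively. The function name `unblock_counts` and the sample edges are hypothetical, and counting items is only a proxy; a real team might weight by effort or value instead.

```python
def unblock_counts(edges):
    """For each bottom-stratum item, count everything piled on top of it."""
    above = {}           # item -> items it directly unblocks
    has_blocker = set()  # items with something beneath them
    for first, then in edges:
        above.setdefault(first, set()).add(then)
        above.setdefault(then, set())
        has_blocker.add(then)

    counts = {}
    for root in above:
        if root in has_blocker:
            continue  # not in the bottom stratum
        seen, stack = set(), [root]
        while stack:
            for nxt in above[stack.pop()]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        counts[root] = len(seen)
    # Highest count first.
    return sorted(counts.items(), key=lambda kv: kv[1], reverse=True)


edges = [("auth-middleware", "permissions"), ("permissions", "multi-tenant"),
         ("auth-middleware", "audit-log"), ("logging", "metrics")]
ranking = unblock_counts(edges)
print(ranking)
```

Only immediate arrows went in; the transitive count came out. The priority score can still break ties among bottom-stratum items, which is where a one-dimensional number belongs.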
You will probably find, as archaeologists did in the 1970s, that most of what you thought was pressing is sitting on top of one or two items nobody had named as debt at all. The foundation is almost always older, lower, and more boring than the feature work above it. The matrix does not make that fact politically easier inside your organisation. It makes it impossible to keep pretending it isn’t true.
The thing to notice about that evening in Winchester is how little equipment was involved. One archaeologist. One pencil. One evening. No new tool, no new theory — just a refusal to flatten time into a section drawing, and a graph drawn in its place.
Software has been managing debt in a flat list for the thirty-four years since Cunningham named it. In that time we have built dependency graphs for everything else: package managers, build systems, module imports, type hierarchies, data lineage, CI/CD pipelines. We know how to draw DAGs. We just haven’t drawn this one.
There is no reason the Winchester moment for technical debt requires a tool, a vendor, a framework, or anyone’s permission. It requires a team willing to spend an afternoon asking, for each piece of debt on their list, what is underneath it. That is a small ask for a useful answer.
The matrix has been waiting. It is not a novel idea. It is just, like any surface Harris ever recorded, there only as long as somebody bothers to draw it.
A debt list isn’t a list. It’s a graph somebody hasn’t drawn yet.
The Harris Matrix move — record only immediate dependencies, let the rest compute itself — is the same move Agent Rating Protocol makes for trust. Every signed agent record names only the agents it directly depends on; the wider trust DAG falls out for free, the same way Harris’s fourth law makes the full stratigraphic ordering fall out of pairwise contacts. You can verify any agent’s upstream stratigraphy without anybody flattening it into a leaderboard score.