Ptolemy's map looks like a catalog of mistakes until you correct for the globe he was using. The same move tells a real bug from a convention you forgot you chose.
Every time you "get oriented," you're invoking a dead convention. To orient once meant, literally, to face the Orient, the east, because that is where medieval European maps put the top of the world, toward the earthly Paradise. North-up, the arrangement you treat as simply correct, is a later habit, not a law of nature. There is no true "up" on a sphere. Which way you point the map is a choice, and for most of history people chose differently than you do, and were not wrong to.
I start there because it's the whole essay in one word. The most common mistake people make about old maps is the same mistake they make about old systems, old code, and old decisions: they grade the artifact against a frame it never used, then call the mismatch an error. The map isn't upside down. You are holding a different map than the one you think you're holding.
Around the second century CE, Claudius Ptolemy compiled the Geography, a list of roughly 8,000 places with latitude and longitude. Lay his coordinates over a modern map and they look like a catalog of mistakes. The Mediterranean is stretched grotesquely long. Asia sprawls far too far to the east. The known world spans a full 180 degrees of longitude when the real figure is closer to 120. The lazy verdict writes itself: the ancients couldn't measure.
The lazy verdict is wrong, and how it's wrong is the point. Ptolemy's errors are not random noise. They are systematic: predictable, directional, almost clean. The distortion grows the farther east you go: a place is off by around 20 degrees in France, 25 to 30 in Greece, 35 to 40 by the Red Sea. Random sloppiness doesn't do that. A consistent frame-offset does.
Here is what actually happened. Ptolemy had reasonable raw data: distances compiled from travel itineraries, road and sea legs, plus a handful of astronomical fixes. The problem wasn't the data. It was that he mapped good distances onto a globe of the wrong size. Inheriting an estimate of the Earth's circumference from Marinus of Tyre, a figure roughly a third too small, well below the famously accurate value Eratosthenes had calculated centuries earlier, Ptolemy was forced to wrap a sensible amount of real-world distance around a planet that was too little to hold it. So the distances spilled outward, stretching the known world across more degrees than it should occupy. Add a different prime meridian (his ran through the Fortunate Isles, today's Canaries, not Greenwich) and an inconsistent stadion, the unit of distance whose length varied between sources, and you get exactly the distortion we see. One historian of cartography put it bluntly: the problem "is not the poor determination of positions in Ptolemy's time; it is Ptolemy's attempt to map the available distances onto a sphere of the wrong size."
Now do the thing that dissolves the illusion: correct for his frame. Adopt his Earth-size, prime meridian, and stade, and the coordinates snap into a coherence that is genuinely impressive for a man working seventeen centuries before satellites. Judged as GPS, Ptolemy is a failure. Judged in his own coordinate system, he is a rigorous, internally consistent achievement. The wrongness lived in the reader's frame, not in the data. That sentence is the load-bearing beam of everything that follows.
If Ptolemy shows a frame error of units, the Hereford Mappa Mundi shows a frame error of genre, and it's the more dangerous one, because it hides better.
The Hereford map, made around 1300 and the largest medieval world map to survive, is by navigational standards a catastrophe. It's a circle with Jerusalem planted at the dead center, east at the top, the three known continents carved out by a T of water, and some 500 drawings scattered across it: biblical scenes, monstrous races, animals real and imagined, the journey of salvation history from Creation to the Last Judgment. Try to sail by it and you will drown. As a route-finder it is worse than useless.
It was never a route-finder. The Hereford map is a visual encyclopedia: a theological and historical summa you read the way you read a cathedral, not a chart you navigate. Score it on navigation and it fails completely; score it on the thing it was actually built to be, a structured map of meaning, and it is extraordinarily rich. Same object, opposite grades, and the only variable is which genre's yardstick you pick up.
And now the fact that turns this from a nice point into an airtight one. The very same medieval culture that produced the "primitive" Hereford map was, at the same time, producing portolan charts: coastal navigation maps of startling accuracy, all rhumb lines and faithful harbor outlines, used by real sailors to make real landfall. When that culture wanted accuracy, it achieved accuracy. The mappa mundi's "failure" was therefore not a failure of skill but a deliberate choice of genre. Calling it primitive is wrong twice over: wrong about what it was for, and wrong about what its makers could do. They had the accurate map. They chose, for this artifact, to make a different kind.
Historians of cartography have a name for the mistake of missing all this. The great History of Cartography project, launched in the late 1970s by J.B. Harley and David Woodward, warns against what amounts to a characteristic anachronism: assessing early maps "by a modern yardstick of accuracy," a habit that doesn't merely mis-score them but quietly excises them from the canon, disqualifies them from being studied as the sophisticated things they are. (Harley's stronger philosophical claims, that accuracy is just one convention among many, are genuinely contested, and you don't need them. The narrow warning, don't grade an artifact by a genre it never claimed, is broadly accepted and sturdy enough to build on.) It's the cartographer's local instance of a general fallacy historians call presentism: judging the past by the standards of the present and mistaking the resulting culture-clash for incompetence.
Lift this out of cartography and it lands, one-to-one, on how engineers evaluate systems. The anachronisms even rhyme:
Reading Ptolemy as GPS is scoring a research prototype by production SLAs: condemning a spike for lacking the monitoring, error handling, and uptime it was never trying to have. Calling the Hereford map a failed navigation chart is calling a proof-of-concept a failed product, a genre error wearing the mask of a quality judgment. Judging an old map by modern accuracy is reviewing a 2015 architecture by 2026 norms: sneering at a service for not being event-driven, or for using a pattern that hadn't been invented when its authors solved their problem with the tools they had. And north-up versus east-up is your paradigm's metrics applied to another paradigm's artifact: the functional programmer recoiling at the object-oriented codebase, each measuring the other against a convention they've forgotten is a convention.
The most expensive version is everyday code archaeology. You open a module nobody has touched in six years, find a decision that looks insane by today's lights, and reach for the word "wrong." Maybe it is. But the seasoned move is the cartographer's move: before you condemn it, reconstruct the frame it was built in. What were its constraints: library versions, latency budget, team size, the one thing in production that would have broken if they'd done it the "right" way? What was its prime meridian, the reference point everyone on that team shared and no one wrote down? An astonishing share of the "bugs" you find in unfamiliar systems are not bugs. They are frame-mismatches: the code answering a question you've stopped asking, in units you've stopped using.
This is Chesterton's fence rebuilt out of latitude lines. The fence across the road looks pointless precisely because you can't see the frame in which someone took the trouble to build it.
Here is where a weaker version of this argument falls off a cliff, so here is a guardrail. The lesson is not "never judge," and it is not "every artifact is excellent on its own terms if you squint." That's relativism, and it's an excuse factory: it turns every unfalsifiable mess into a misunderstood masterpiece. A prototype absolutely should be judged by production criteria the moment it claims to be production. The fallacy is not evaluation itself. It's importing criteria from a genre the artifact never entered.
And the beautiful thing, the reason this is a discipline and not a dodge, is that adopting the artifact's frame is also how you find its real faults. Correct Ptolemy for his Earth-size and prime meridian and most of his "errors" evaporate. The ones that don't evaporate are now genuine errors, correctly located, no longer hidden in the fog of the frame-mismatch. You could not even see them before, because they were drowned out by the systematic offset. Reconstructing the frame isn't the thing you do instead of evaluating. It's the precondition for evaluating accurately: the move that lets you tell a real defect from a convention you forgot you chose.
We have a famous modern monument to skipping that step. In 1999, NASA lost the Mars Climate Orbiter, a $125 million spacecraft, because one team supplied a value in pound-force-seconds while the software expected newton-seconds. Nobody was incompetent. The numbers were right in their own frame. The catastrophe lived entirely in the unexamined gap between two coordinate systems, exactly as Ptolemy's "errors" live in the gap between his stade and ours. Units are a frame. Skip the reconciliation and you don't just misjudge the artifact, you fly it into a planet.
Strip it to something you can run in a review. Before you call a system, a model, a codebase, or a past decision "wrong" or "primitive," ask two questions.
First: what frame was this built in? Reconstruct its units, goals, constraints, and the unspoken prime meridian it shared. What was it actually trying to be, a product, a probe, a teaching example, a stopgap that outlived its season? You cannot grade an answer until you know its question.
Second: does the "error" survive the frame correction? Translate the artifact into its own coordinate system and look again. If the wrongness dissolves, it was never in the artifact: it was in your yardstick, and you've just saved yourself from "fixing" something that was right. If it survives, congratulations: you've found a real fault, now precisely framed and genuinely worth your time.
There's a final cost worth naming, because it's the one that compounds. Anachronism doesn't just grade unfairly, it deletes. Harley's sharp observation was that judging old maps by modern accuracy didn't merely under-rate them; it removed them from the canon, so people stopped studying what those maps actually knew. The engineering version is quieter and more expensive. A team that can only see prototypes through production criteria doesn't just under-rate the prototype, it never builds the thing the prototype was exploring. A culture that reads all legacy code as "tech debt to be deleted" throws away the hard-won context encoded in its strangest decisions. The anachronism trap doesn't only produce wrong scores. It produces lost knowledge, which is worse, because you don't notice it's gone.
Ptolemy wasn't bad at GPS. He wasn't doing GPS. The old code is not a failed draft of your code. It is a map of a territory you may not have finished surveying. So before you grade it, do the unglamorous thing the cartographers learned the hard way: adopt its frame first. The point is not to excuse the artifact. Its own coordinate system is the only vantage from which you can finally tell a genuine error from a convention you forgot you chose.
Before you score it, adopt its frame.
If "don't grade an artifact by a frame it never used" is the discipline, scoring agents needs the same care: a rating means nothing without the frame it was earned in, and a bare number you can average and shop is a Ptolemy coordinate read as GPS. The Agent Rating Protocol makes the frame explicit, reputation as a rank of relative track record earned from what an agent actually did, scale type stated, so you compare maps drawn to the same convention instead of reading one as another.
pip install agent-rating-protocol · npm install agent-rating-protocol
vibeagentmaking.com → · See it in action