Die-Link Networks: Fingerprinting Shared Origin From the Tool That Stamped It

A piece of Olympic malware was caught wearing a forged fingerprint, unmasked by a two-hundred-year-old method borrowed from people who study ancient coins. The same move secures your build pipeline.

Published June 2026 · 9 min read

In February 2018, as the opening ceremony of the Winter Olympics lit up the sky over Pyeongchang, a piece of malware was quietly tearing through the Games' IT backend, taking down Wi-Fi, the official app, the press center, the ticketing site. It was later named Olympic Destroyer, and when investigators pulled the binary apart, they found a fingerprint buried in it that pointed squarely at a known culprit: the Lazarus Group, North Korea's state hacking unit. The fingerprint sat in one of the most obscure corners of a Windows executable, a place almost nobody looks. It looked like the attackers had slipped up.

They hadn't. An analyst at Kaspersky noticed the fingerprint was lying.

The marker in question was the PE Rich Header, an undocumented little structure that Microsoft's linker silently stamps into binaries built with Visual Studio, recording the product IDs, build numbers, and use-counts of each compiler and linker component in the toolchain. The Olympic Destroyer header had been copied wholesale from a real Lazarus sample to frame North Korea. But the forgery didn't fit. The planted header claimed the code had been built with Visual Studio 6.0, a toolchain from 1998, while the actual binary had been compiled with something far more modern, around Visual Studio 2010. A die cut in 1998 cannot strike a design first engraved in 2010. The fake fingerprint betrayed itself, and not through any database lookup: it was internally impossible, claiming an origin outside the universe of real tools.
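The Rich Header's layout is not officially documented, but it has been reverse-engineered publicly (the Virus Bulletin paper in the sources walks through it): a "DanS" marker, three padding words, then pairs of (component ID, use count), all XOR-masked with a key stored right after the literal "Rich". Below is a minimal Python sketch of a reader, assuming a well-formed header; `parse_rich_header` is a hypothetical helper, not a library API, and it does not attempt to map product IDs to Visual Studio versions.

```python
import struct

DANS = 0x536E6144  # the ASCII bytes "DanS" read as a little-endian DWORD


def parse_rich_header(data: bytes):
    """Decode a PE Rich Header.

    Returns (xor_key, [(product_id, build_number, use_count), ...]),
    or None if no well-formed header is found.
    """
    # The Rich Header sits in the DOS stub, before the "PE\0\0" signature
    # whose file offset (e_lfanew) is stored at 0x3C.
    if len(data) < 0x40 or data[:2] != b"MZ":
        return None
    e_lfanew = struct.unpack_from("<I", data, 0x3C)[0]
    region = data[:e_lfanew]

    # The header ends with the literal "Rich" followed by the XOR key.
    end = region.rfind(b"Rich")
    if end < 4 or end + 8 > len(region):
        return None
    key = struct.unpack_from("<I", region, end + 4)[0]

    # Walk backwards one DWORD at a time until the decoded value is "DanS".
    pos = end - 4
    while pos >= 0:
        if struct.unpack_from("<I", region, pos)[0] ^ key == DANS:
            break
        pos -= 4
    else:
        return None

    # Skip "DanS" and the three padding DWORDs, then decode (comp_id, count) pairs.
    # comp_id packs the product ID in its high 16 bits and the build number in its low 16.
    entries = []
    for off in range(pos + 16, end, 8):
        comp_id, count = struct.unpack_from("<II", region, off)
        comp_id ^= key
        entries.append((comp_id >> 16, comp_id & 0xFFFF, count ^ key))
    return key, entries
```

On a real executable you would pass the file's raw bytes, for example `parse_rich_header(open(path, "rb").read())`. Comparing decoded entry lists across samples is the kind of match that made the planted header point at Lazarus in the first place.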

Here is the thing worth knowing: the method that caught it is roughly two hundred years old, and it comes from people who study ancient coins.

What a coin remembers about the tool that made it

An ancient or medieval coin was struck by hand, between two engraved dies: a lower obverse die set in an anvil, and an upper reverse die driven down by a hammer. Each die was cut by hand, so each carried its own idiosyncrasies: a slightly crooked letter, a quirk in the portrait, a tiny flaw in the field. And each die had a finite working life. Bronze and iron crack and wear under thousands of hammer-blows; the flaws progress, so a die in its later life carries cracks a fresh die doesn't. Numismatists exploit all of this. Because every die leaves its own marks on every coin it strikes, you can look at a surviving specimen and decide, often with real confidence, which specific die struck it.

Do that across a whole hoard of coins and something powerful emerges. You record which obverse dies were paired with which reverse dies, knowing that dies got swapped in and out as they failed mid-production. The pairings form a die-link network: a graph where coins cluster by shared dies, and dies connect through the coins they jointly produced. That graph hands you three things you could not get from any single coin.

The first is production history. Because a long-lived obverse die might be paired first with reverse die A, then with B once A cracks, then with C, the chain of pairings, read alongside the progressive wear on each die, recovers the sequence of striking: which die came online when, and in what order the mint actually worked.
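The pairing bookkeeping itself is simple graph work. Here is a minimal sketch, assuming each specimen has already been assigned an obverse and a reverse die: union-find merges dies that shared a coin into linked groups, and a per-obverse set records which reverses each obverse met. Ordering those reverses in time needs die-state evidence (crack progression), which this sketch does not model; the function name and die labels are hypothetical.

```python
from collections import defaultdict


def die_link_network(coins):
    """coins: iterable of (obverse_die, reverse_die) pairs, one per specimen.

    Returns (groups, pairings): groups is a list of sets of linked dies,
    pairings maps each obverse die to the reverse dies it was paired with.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    pairings = defaultdict(set)
    for obv, rev in coins:
        # Tag each die with its side so obverse 1 and reverse 1 stay distinct nodes.
        union(("O", obv), ("R", rev))
        pairings[obv].add(rev)

    groups = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node)
    return list(groups.values()), dict(pairings)
```

Each returned group is one die-link cluster: every die in it is connected to every other through a chain of shared coins.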

The second is scale. You can estimate the original number of dies, including the ones you've never found, from how often your sample repeats a die. This is the work of Warren Esty, whose formulas (2006 and 2011) estimate total die counts from how many dies in the sample turn up on only a single coin, relative to the number of coins examined. It's the same statistical instinct as estimating how many species live in a forest from how many you've spotted exactly once: a capture-recapture move, and a cousin of the Good-Turing frequency estimates that came out of Bletchley Park.

The flagship of this work, the Roman Republican Die Project at the American Numismatic Society, was built on Richard Schaefer's archive of more than 300,000 specimen images (images, note, not dies; the number of distinct dies is far smaller), linked to Michael Crawford's 1974 catalog Roman Republican Coinage and published with die-link graphs and downloadable pairing data. And the estimates can be ground-truthed: for some issues, the mint marked each die with a sequential control number, so the highest number observed sets a floor on the true die count and, in a well-sampled issue, comes close to it, a way to check Esty's estimate against reality. (It's an estimate with assumptions rather than a guarantee on every issue, though where you can test it, it holds up.)
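The core of the estimate fits in a few lines. The sketch below uses the basic Good-Turing coverage estimator: coverage is one minus the fraction of coins whose die appears only once, and the die total is the number of observed dies divided by coverage. Esty's published formulas refine this with corrections not reproduced here, so treat it as an illustration of the idea rather than his exact method; `estimate_dies` is a hypothetical name.

```python
from collections import Counter


def estimate_dies(die_ids):
    """Good-Turing coverage estimate of how many dies an issue used.

    die_ids: one die identifier per coin in the sample.
    Returns (observed_dies, coverage, estimated_total_dies).
    """
    counts = Counter(die_ids)
    n = sum(counts.values())                        # coins in the sample
    d = len(counts)                                 # distinct dies observed
    d1 = sum(1 for c in counts.values() if c == 1)  # dies seen on exactly one coin
    if d1 == n:
        raise ValueError("sample too thin: every die is a singleton (or the sample is empty)")
    coverage = 1 - d1 / n  # estimated share of the issue struck by dies already observed
    return d, coverage, d / coverage
```

For a sample of six coins from dies `a, a, b, b, c, d`, two of the four dies are singletons, coverage comes out at two-thirds, and the estimate is about six dies. Because it needs only one signature per sampled item, the same function runs unchanged on build sources, templates, or dropper hashes.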

The third thing the network gives you is the one that caught Olympic Destroyer: forgery detection. A genuine coin must come from a documented die. A specimen that fits no known die, or whose "die" betrays the tell-tale softness of a cast rather than a strike, sits outside the die universe, and that, not a stylistic hunch, is the rigorous basis for calling it fake. Modern computer-vision pipelines now do the die-matching automatically, collapsing studies that took years into weeks, and they target exactly this: provenance and forgery.
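A toy version of the die-matching decision follows, assuming each coin face has already been turned into a numeric feature vector. The vectors, die IDs, and distance threshold are hypothetical, and real pipelines use far richer image features; the point is the shape of the decision: assign the specimen to the nearest die on record, or report that it fits none.

```python
import math


def assign_die(specimen, known_dies, max_distance):
    """Match one coin face to a die on record, or report that it fits none.

    specimen: feature vector for the coin face (hypothetical image embedding).
    known_dies: {die_id: reference feature vector for that die}.
    Returns the nearest die_id, or None if even the nearest is too far away.
    """
    best_id, best_dist = None, math.inf
    for die_id, reference in known_dies.items():
        dist = math.dist(specimen, reference)  # Euclidean distance (Python 3.8+)
        if dist < best_dist:
            best_id, best_dist = die_id, dist
    return best_id if best_dist <= max_distance else None
```

A `None` result is the interesting one: it is the "sits outside the die universe" case that calls for a closer look.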

The same move, in software

Now look back at the binary with a numismatist's eye, and the bridge is almost embarrassingly direct.

A compiled artifact inherits the marks of the tool that stamped it, whether its makers wanted it to or not. The compiler family, version, and optimization level leave a recoverable signature, what researchers call compiler provenance, and there are whole systems (BinComp and others) built to read it. Build paths get embedded and quietly leak usernames and directory trees. Timestamps leak build order. And the Rich Header, the very thing at the center of the Olympic Destroyer story, is the richest of all, a near-perfect die mark of the build environment, precisely because Microsoft never documented it as an identifier. Nobody designed it to be a fingerprint. That's exactly why it's such a good one: the tool can't help signing its work. A die crack is not something the engraver added on purpose. It is an involuntary mark that nonetheless identifies origin. So is the Rich Header.

Cluster artifacts by these signatures and you've built a die-link network for software. Binary-fingerprinting and malware-family clustering do exactly this: group samples by shared toolchain and code artifacts to recover families, lineage, and likely shared origin. Two binaries carrying the same involuntary build-signature probably came from the same place, the same way two coins sharing an obverse die came from the same mint table.
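A minimal sketch of that grouping, assuming the Rich Header entries have already been decoded into `(product_id, build, count)` tuples and paired with the PE linker version as `(major, minor)`. Exact hashing is deliberately strict and matches only identical build signatures; production malware-clustering systems use fuzzier similarity. Function and sample names are hypothetical.

```python
import hashlib
from collections import defaultdict


def signature(rich_entries, linker_version):
    """Hash a toolchain fingerprint: the Rich Header entries plus the linker version."""
    h = hashlib.sha256()
    for product_id, build, count in sorted(rich_entries):  # order-independent
        h.update(f"{product_id}:{build}:{count};".encode())
    h.update(f"linker={linker_version[0]}.{linker_version[1]}".encode())
    return h.hexdigest()


def cluster_by_signature(samples):
    """samples: {name: (rich_entries, linker_version)}.

    Returns sorted lists of sample names that share an identical signature.
    """
    groups = defaultdict(list)
    for name, (entries, linker) in samples.items():
        groups[signature(entries, linker)].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]
```

Two samples land in the same cluster only when every stamped component and the linker version agree, the software analogue of sharing an obverse die.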

And the defensive endgame is the coin-forgery move, formalized. The SLSA framework (Supply-chain Levels for Software Artifacts, grown out of Google's internal "Binary Authorization for Borg" and contributed to the OpenSSF in 2021) defines signed build provenance: the builder's identity, the build instructions, the parameters, the environment, the digests of every dependency, all cryptographically attested. That is nothing less than a documented die universe. Once you have it, the question for any artifact claiming to be yours becomes the numismatist's question: does this fit a die we have on record? Anything that can't be tied to your attested builds is the software equivalent of a coin from no known die: an injection, a forgery, a thing that should not exist.
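Once attestations exist, the membership test itself is almost trivial. Here is a sketch, assuming the provenance statements follow the in-toto Statement layout SLSA uses (`subject` entries carrying a `sha256` digest, a `predicateType` under `https://slsa.dev/provenance/`) and that their signatures have already been verified by a separate tool; skipping that verification would make the check worthless. `attested_digests` and `fits_known_die` are hypothetical helpers.

```python
import hashlib


def attested_digests(statements):
    """Collect sha256 digests named as subjects in already-verified provenance statements."""
    digests = set()
    for st in statements:
        if st.get("predicateType", "").startswith("https://slsa.dev/provenance/"):
            for subject in st.get("subject", []):
                sha = subject.get("digest", {}).get("sha256")
                if sha:
                    digests.add(sha.lower())
    return digests


def fits_known_die(artifact: bytes, statements) -> bool:
    """Membership test: is this artifact's digest covered by one of our attested builds?"""
    return hashlib.sha256(artifact).hexdigest() in attested_digests(statements)
```

Note what the function does not ask: who built the artifact if not you. It only answers whether the artifact is inside your documented universe.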

Two ideas worth stealing

The first transferable idea is the network: cluster things by the signature of the tool that stamped them, and you recover a production history and expose anything outside it. This generalizes far past malware. Template metadata fingerprints the document generator; sensor noise fingerprints the camera; toolmarks fingerprint the machine. Wherever a manufacturing tool touches its output, it leaves a die mark, and shared marks mean shared origin.

The second is subtler and more durable, and it's the one I'd tattoo on the wall of any security team: forgery is caught by inconsistency, not by a lookup. Olympic Destroyer's fake header wasn't unmasked because someone recognized it. It was unmasked because it claimed a toolchain that could not have built the code it was attached to. The robust question is never the seductive one, "who made this?", because that question is hard, and worse, the answer is forgeable. The whole point of the false flag was to make "who made this?" return the wrong name. The robust question is "does this fit the universe of things that could legitimately exist?" A numismatist doesn't need to know which forger cast a fake denarius to know it's fake; they need to know it fits no real die. You don't need to attribute an artifact to reject it; you need a documented universe and a consistency check.
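One way to make "consistency, not lookup" concrete, as an illustrative sketch rather than a reconstruction of Kaspersky's analysis: let each piece of evidence state the range of toolchain versions it allows, and intersect the ranges. The evidence names and version numbers are hypothetical, loosely echoing a header that claims a Visual Studio 6-era toolchain while other fields point to a 2010-era one. An empty intersection means no real toolchain could have produced the artifact, and no attribution database is involved.

```python
def check_consistency(evidence):
    """Intersect the toolchain-version ranges each piece of evidence allows.

    evidence: {name: (min_version, max_version)}, inclusive bounds.
    Returns (ok, (lo, hi), conflict), where conflict names two pieces of
    evidence that cannot both be true, or None when the story holds together.
    """
    lo = max(low for low, _ in evidence.values())
    hi = min(high for _, high in evidence.values())
    if lo <= hi:
        return True, (lo, hi), None
    # The evidence demanding the newest toolchain vs. the evidence capping it oldest.
    needs_newest = max(evidence, key=lambda name: evidence[name][0])
    caps_oldest = min(evidence, key=lambda name: evidence[name][1])
    return False, (lo, hi), (needs_newest, caps_oldest)
```

A forger who plants one fingerprint has to get every other range to overlap with it too, which is exactly where a copied header tends to fail.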

That distinction has teeth, because it tells you where to spend your effort. Attribution is the forgeable, contested, often-unwinnable game: the Olympic Destroyer header was built to win it for the wrong side, and confident attribution of that attack stayed contested long after the forgery was exposed. Consistency against your own documented universe is the winnable one. You control your attested die set. You can say, with cryptographic confidence, "this binary did not come from my pipeline" without ever solving "and therefore the GRU did it." Defense beats attribution because defense only requires you to know your own dies.

There's a beautiful corner case the coins teach, too. Two genuine coins struck from the same die are never identical: each blank, each hammer-blow, leaves microscopic differences. So when two specimens are impossibly identical, that sameness itself betrays a cast or transfer forgery: real production has variance; copying does not. The software mirror is exact. Two "independently built" binaries that are byte-identical in the places that should vary (timestamps, randomized layout, nonces, build IDs) haven't been independently built. They've been copied or replayed. (The same fact, flipped, is why reproducible builds are such a powerful integrity tool: there, you deliberately strip the involuntary variance so that legitimate independent builds can be byte-identical, and any difference becomes the signal.)
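A sketch of the "impossibly identical" check, assuming you have already extracted a few fields that a genuinely independent build should vary (here a hypothetical `timestamp` and `build_id`). It flags builds that agree on all of them. For reproducible builds, where those fields are deliberately pinned (for example via `SOURCE_DATE_EPOCH`), you would instead compare whole-artifact hashes and treat any difference as the signal.

```python
from collections import defaultdict

# Hypothetical fields that independent builds are expected to vary.
VOLATILE_FIELDS = ("timestamp", "build_id")


def suspiciously_identical(builds):
    """builds: {name: {field: value}} for builds claimed to be independent.

    Returns sorted groups of build names that agree on every volatile field.
    """
    seen = defaultdict(list)
    for name, meta in builds.items():
        key = tuple(meta.get(field) for field in VOLATILE_FIELDS)
        if None not in key:  # only compare builds that actually carry every field
            seen[key].append(name)
    return [sorted(names) for names in seen.values() if len(names) > 1]
```

Any group this returns is a pair of "independent" builds that share the variance real production would have given them separately.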

I'll keep the population-estimation thread short, because it deserves its own essay: the singleton-to-repeat statistic that estimates unseen dies is a capture-recapture you can run on any sampled population of tool-stamped artifacts: how many build sources, templates, or campaigns are really out there, including the ones we haven't sampled yet? The same math applies whether you're counting Roman dies, forest species, or distinct droppers in a malware campaign. Worth knowing it's there; not the point today.

The practical version

If you ship software, the lesson resolves into something you can act on this quarter, and it has two halves.

Build your documented die universe. Adopt provenance attestation, SLSA-style signed build metadata, so that legitimacy becomes a membership test rather than a vibe. The goal isn't to catch attackers in the act; it's to make everything outside your attested builds self-evidently suspect, the way a coin from no recorded die is self-evidently suspect. You don't have to be clever about the adversary if your universe is documented enough that they can't fit inside it.

And read the marks you're already leaving. Your binaries, documents, and build logs are stamped with involuntary fingerprints right now: Rich Headers, build paths, timestamps, compiler quirks. Defenders can read them to cluster and trace; attackers have to scrub or forge them to hide, and forging is exactly where they slip, because a planted fingerprint has to be consistent with everything else, and consistency across an entire fabricated provenance is brutally hard to fake. That inconsistency is your friend. It's the crack in the die that the forger forgot to copy.
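Reading one of those marks can be as crude as a byte scan. In a PE file the build path usually lives in a CodeView debug record (an "RSDS" signature, a GUID, an age, then a null-terminated path to the .pdb). This sketch, assuming the path is plain ASCII, just searches the raw bytes for Windows paths ending in `.pdb` rather than parsing the debug directory properly, and pulls out a username when the path runs through `\Users\`. `leaked_build_paths` is a hypothetical helper.

```python
import re

# ASCII Windows paths ending in .pdb, as typically embedded in CodeView debug records.
PDB_PATH = re.compile(rb"[A-Za-z]:\\[\x20-\x7e]{1,255}?\.pdb", re.IGNORECASE)
# A "\Users\<name>\" segment inside such a path.
USER_DIR = re.compile(r"\\Users\\([^\\]+)\\", re.IGNORECASE)


def leaked_build_paths(data: bytes):
    """Return (path, username_or_None) for each PDB path found in the raw bytes."""
    results = []
    for match in PDB_PATH.finditer(data):
        path = match.group().decode("ascii")
        user = USER_DIR.search(path)
        results.append((path, user.group(1) if user else None))
    return results
```

Run it over your own release artifacts first: anything it finds is a mark you are shipping whether you meant to or not.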

The deepest version of the idea is older than computers and will outlast them: provenance is written in the marks the stamping tool can't help leaving. Group by those marks and a production history becomes legible; check new objects against the universe of real tools and every forgery outside it lights up. The Romans didn't know they were leaving a forensic trail in the cracks of their coin dies. Microsoft didn't intend the Rich Header as an identifier. It almost never matters what the toolmaker intended. The tool signs its work anyway, and if you know how to read the signature, the fakes have nowhere to hide.

Sources

Roman Republican Die Project (RRDP), American Numismatic Society (numismatics.org)
numishare, "Esty's die calculations and frequency visualizations added to CRRO" (numishare.blogspot.com)
"The Roman Republican Die Project (RRDP): Methods and Preliminary Findings" (PDF)
"Automatic Die Studies for Ancient Numismatics" (arXiv preprint)
"From Coin to Data: Object Detection in Digital Numismatics" (arXiv preprint)
Securelist (Kaspersky), "The devil's in the Rich header" (securelist.com)
Virus Bulletin, "Rich Headers: leveraging this mysterious artifact of the PE format" (PDF)
Darknet Diaries, "Olympic Destroyer" (episode transcript)
SLSA, Provenance specification (slsa.dev)
SLSA overview, ActiveState (activestate.com)
"BinComp: A stratified approach to compiler provenance attribution" (ScienceDirect)
"A Survey of Binary Code Fingerprinting Approaches" (ACM Computing Surveys)

The tool signs its work. Make yours sign in a way you can check.

If "provenance is written in the marks the stamping tool can't help leaving" is the right instinct, the next step is making that signature deliberate and verifiable instead of accidental. Chain of Consciousness gives an agent's output a documented die universe: a signed, append-only record of what produced each step, so legitimacy becomes a membership test and anything outside it is self-evidently suspect, the way a coin from no recorded die is.

pip install chain-of-consciousness · npm install chain-of-consciousness
Hosted CoC → · vibeagentmaking.com · See it in action
