
“From an Old European Collection”: AI Training Data Is at Numismatics’ 1970

Two technically-true sentences perform the same evasion. Numismatics spent fifty years learning to flag the first one. The AI industry is currently writing the second on every model card.

April 2026 · 11 min read

A particular kind of sentence still appears in coin auction catalogs: variations on “from an old European collection,” often paired with “acquired before 1970.” It looks like provenance: it names a continent, gestures at a decade, hints at a story. Anyone who buys ancient coins for a living knows what the sentence actually means. It is the field’s recognized signal that no provenance exists. The dealer either does not know, cannot say, or would rather not say. The 1970 date sits squarely before a regulatory tripwire; the “old European” gesture flatters the buyer with the implication of pre-war refinement. The phrase is technically true. It is also the trade’s polite way of saying nothing at all.

Numismatics has spent half a century learning to distrust this sentence. The provenance taxonomy in active use among specialists today separates hearsay provenance (vague attributions backed only by “the good faith of the dealer”), named provenance (linked to a specific collection or person), and documented provenance (linked to specific auction records, invoices, or export licenses). The Cultural Property News overview of the field flatly classifies “from an old Swiss collection” as hearsay — i.e., not provenance at all (Cultural Property News, “Rediscovering Old Provenances for Ancient Coins,” 2024).
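The three tiers can be read as a decision rule over the evidence attached to a claim. The sketch below is illustrative only: the `ProvenanceClaim` type and its field names are hypothetical, not drawn from any catalog standard, and the rule is the simplest reading of the taxonomy described above.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class Tier(Enum):
    HEARSAY = "hearsay"
    NAMED = "named"
    DOCUMENTED = "documented"


@dataclass
class ProvenanceClaim:
    # All field names are hypothetical, chosen for this sketch only.
    text: str                          # the sentence as printed
    named_owner: str | None = None     # a specific collection or person
    documents: list[str] = field(default_factory=list)  # auction records, invoices, export licenses


def classify(claim: ProvenanceClaim) -> Tier:
    """Return the strongest tier the attached evidence supports."""
    if claim.documents:
        return Tier.DOCUMENTED
    if claim.named_owner:
        return Tier.NAMED
    # No name and no paper trail: only the speaker's good faith remains.
    return Tier.HEARSAY
```

Under this rule, `classify(ProvenanceClaim("from an old European collection"))` and `classify(ProvenanceClaim("trained on publicly available text"))` both return `Tier.HEARSAY`, because neither sentence carries a name or a document.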

Now read the standard AI training disclosure: “trained on publicly available text.” It is technically true. It is also, structurally, “from an old European collection.”

What 1970 actually was

The 1970 UNESCO Convention on the Means of Prohibiting and Preventing the Illicit Import, Export, and Transfer of Ownership of Cultural Property was adopted on November 14, 1970. Almost every major museum in the Western world now refuses material that left its country of origin after that date without full documentation (J. Paul Getty Museum acquisition policy; Smarthistory).

The date is widely misunderstood. 1970 was not chosen because looting started in 1970. The looting was already a generational catastrophe. It was chosen because the prior fifty years had so degraded the trade that mandatory documentation was the only credible discipline that remained. A field that had spent decades not asking the question reached the point where it could not afford to keep not asking. The convention drew a line not because the world had become cleaner but because the pretense had become unsustainable.

What followed is the part the AI industry should be reading carefully. Major market countries did not rush to ratify. The United States waited thirteen years and implemented the Convention through the Cultural Property Implementation Act of 1983, which “took a much more lenient approach” than the Convention itself (Bowdoin College, Antiquities Law Primer). France waited until 1997, the United Kingdom until 2002, Switzerland until 2003. During that entire window (call it the decades of denial) the trade went on essentially unregulated, while everyone in the room agreed in principle that something should be done. Voluntary dealer codes followed in the 1990s. The first real consequences came in the 2000s, when Italy started prosecuting: the Met had to return the Euphronios Krater in 2008, the Getty had to return the Aphrodite of Morgantina in 2011, and the prices for documented material started to detach from the prices for undocumented material in measurable ways.

The field did not self-regulate through transparency. It hardened only when courts began to make absence of provenance expensive.


The structural parallel is exact

The two phrases — “from an old European collection” and “trained on publicly available text” — perform the same function. Both are technically true. Both are the recognized industry signal that no real provenance exists. Both allow the speaker to avoid lying while concealing the information that would actually matter (which specific objects? obtained how? with what permissions?). Both are intelligible to insiders as evasion and to outsiders as substance.

The numbers are not gentle.

The MIT Data Provenance Initiative audited 1,800+ widely-used text datasets in 2024 and found license miscategorization errors exceeded 50%, while license-information omission rates exceeded 70% (MIT Sloan, “Bringing Transparency to the Data Used to Train Artificial Intelligence,” 2024). The LAION-5B dataset — one of the most widely used for text-to-image models — was found to contain “thousands of Child Sexual Abuse Material images” before the dataset was withdrawn from HuggingFace (Longpre et al., arXiv 2404.12691, 2024). That is the AI equivalent of a museum opening its prized acquisition crate to discover the object was looted from an active conflict zone the prior month.

The most damning structural detail is the deliberate ignorance. As Oxford Academic’s Journal of Intellectual Property Law & Practice documented in 2025, “many companies involved in the development of AI do not even keep internal records of their training data because of fears that this could be used as evidence of copyright infringement.” Numismatic dealers spent decades doing the same thing for the same reason. Documentation that would have made the supply chain illegal would also have made it provable, so documentation did not happen. Ignorance is not an accident in either field. It is a strategy.


The four-model question is the interesting one

Here is the part the existing AI policy literature mostly misses. The numismatic field did not arrive at one regulatory model after 1970. It arrived at four, each with about fifty years of failure-mode evidence.

| Model | Antiquities exemplar | AI equivalent | Failure mode |
| --- | --- | --- | --- |
| Regulated reporting | UK Portable Antiquities Scheme (1997) | Voluntary model cards, data cards, C2PA | Depends on goodwill; doesn’t prevent unreported export |
| Authorization-required | France, Article L. 542-1 (€1,500 / €7,500) | EU AI Act, Article 53 (€15M fines) | Structure exists; enforcement currently unstaffed |
| Prohibition | Italy / Greece state ownership | Strict copyright with no training exception | Drives market underground; smuggling incentive |
| Market-tolerant private property | US CPIA (1983) | US fair-use framework post-Bartz/Kadrey | Becomes the destination market for unprovenanced material |

The Portable Antiquities Scheme, run by the British Museum since 1997, allows private metal-detector finds while incentivizing voluntary recording through compensation and partnership with archaeologists. Reporting rates went up dramatically. The failure mode is that the scheme depends on goodwill and does not prevent export of unreported finds; opponents argue that legitimizing collection drives looting in source countries that lack equivalent infrastructure. The AI equivalent is voluntary model cards, data cards, and the C2PA provenance ecosystem — the “we’ll self-report what we used” approach.

The French authorization regime requires prefectural authorization for metal detecting under Article L. 542-1 of the Heritage Code; significant finds must be reported to the Ministry of Culture; penalties run to €1,500 for unauthorized detection and €7,500 if digging occurs. The failure mode is enforcement: per Connexion France’s 2024 reporting, finds are “generally not declared” in practice. The AI equivalent is the EU AI Act regime. Article 53 (effective August 2, 2025) requires general-purpose AI providers to publish a “sufficiently detailed summary about the content used for training” using a mandatory European Commission template, with fines up to €15 million. The structure exists. Whether the structure bites depends on enforcement, which is currently unstaffed.

Italy and Greece represent the prohibition model: the state automatically owns archaeological finds; finders may receive a percentage as reward but have no property right; export of significant material requires prior ministerial authorization. The success has been the strong legal framework for repatriation that drove the Getty and Met returns. The failure mode is that strict ownership drives the market underground and creates perverse incentives to smuggle rather than declare; archaeological tourism collapses where it would otherwise have flourished. The AI equivalent is strict copyright enforcement with no training exception — a possible outcome of aggressive litigation in jurisdictions that lack a fair-use doctrine.

The United States implemented the UNESCO Convention through the most permissive available reading. Private ownership of antiquities remains broadly legal; market self-regulation through dealer associations is the primary discipline. The failure mode is precise and well-documented: the United States became the destination market for unprovenanced material because its standards were the weakest in the developed world. The AI equivalent is the US fair-use framework, which Bartz v. Anthropic (June 23, 2025) and Kadrey v. Meta (June 25, 2025) suggest is hardening but which remains structurally more permissive than the EU regime.

The fifty-year evidence base says: the strict-enforcement jurisdictions set the floor, the market-tolerant jurisdictions converge upward over decades, and the convergence is driven by court rulings rather than voluntary codes. Beltrametti and Marrone’s 2016 study in the Journal of Law and Economics (Vol. 59, No. 4) used disaggregated auction data over twenty years to show that following punitive court rulings, the share of provenanced items at auction increased and the price premium paid for provenanced items also increased. Self-regulation through transparency alone did not move prices. Self-regulation through transparency plus court enforcement did.


Where AI is right now in the same arc

The Bartz v. Anthropic ruling of June 23, 2025 is the AI industry’s Aphrodite-of-Morgantina moment. Judge Alsup found Claude’s training “spectacularly transformative” and a fair use, but established a separate “clean hands” requirement: training material must be lawfully acquired, and the use of pirated works from “shadow libraries” constitutes a separate, non-fair-use violation. This is the move the antiquities field made when courts started distinguishing “object is in the museum’s collection” from “the museum’s acquisition was lawful.” Two days later, Kadrey v. Meta (June 25, 2025) granted summary judgment for Meta on fair use but introduced an “indirect substitution” theory that opens the door to future liability for crowding out the works the model trained on. In November 2025, the UK High Court in Getty Images v. Stability AI ruled that model weights are intangible “articles” under copyright law, expanding the surface area of liability. Thomson Reuters v. ROSS Intelligence (February 2025) had already established that the “training is transformative” defense fails when the AI directly substitutes for the rightsholder’s specific niche.

The legislative side is tracking the same timeline. The EU AI Act’s training-data summary obligation went live in August 2025 with the mandatory European Commission template. California AB 2013 took effect on January 1, 2026, requiring high-level summaries of training data sources, IP status, and personal information usage. The UK was required to publish its report on copyright and AI by March 18, 2026, with potential proposals for new “technical measures and standards.”

Plot it on the numismatic calendar and the alignment is unsettling. The EU’s mandatory template is the 1970 Convention itself: the binding instrument that establishes the documentation cut-off. California AB 2013 plays the role of the United States CPIA, which arrived thirteen years after the Convention and was written more permissively than the model regime. The UK’s March 2026 report begins what was, in numismatics, a thirty-year journey to ratification. Bartz’s “clean hands” requirement is the Getty trial: the moment courts start distinguishing the legality of the acquisition from the resulting work product. The current moment is approximately 1983, the CPIA implementation year. Full convergence in numismatics took roughly thirty more years. AI does not get thirty years, but it will probably get the same phases in compressed form: an enforcement gap that closes when courts make absence of provenance expensive, then the late hardening that follows.

The 2023 UK Treasure Act expansion — which added a new “significance-based” classification for objects more than 200 years old, the first such class in the Act’s history — is the late-hardening signal. Even the most market-friendly regulated system tightened. The AI equivalent is probably already drafting in Brussels.


Where the analogy breaks

Three honest disanalogies, because cross-domain claims that pretend to be exact deserve the audits they get.

Antiquities are physical; training data is informational. A Roman coin in Geneva exists in one place at one time; a copy of a New York Times archive exists in every model that ingested it. The market-clearing logic differs: antiquities provenance disputes are about who gets to keep the object, while AI training disputes are about who gets paid for the use. This changes the remedy (repatriation versus licensing) but not the paradigm. Both fields converge on the same documentation requirement because both face the same problem: an opaque supply chain that the market will not voluntarily clean up.

The numismatic find-spot is irreproducible. The AI training source often is. If a coin has no excavation record, the archaeological context is gone forever. If a model has no manifest, statistical methods can frequently reconstruct enough to identify training-set membership; a substantial portion of the Times v. OpenAI litigation hinges on exactly that capability. The AI domain has a verification path the antiquities domain does not. This makes the AI documentation problem easier to solve in principle — and harder to evade by accident.

The case against disclosure mandates is real, not a strawman. The strongest version of the AI training-data argument is not “transparency obstructs innovation,” but “premature transparency at the wrong granularity will produce disclosure-as-theater while the underlying data practices do not change.” The FDA precedent documents this exact failure mode: between 1998 and 2008, the FDA secured only two court injunctions for misleading food labels despite finding 48% of products misrepresented vitamin content in 1994 (cited in arXiv 2601.18127, “The Limits of AI Data Transparency Policy: Three Disclosure Fallacies,” 2026). The numismatic equivalent is the thirty-year run of voluntary dealer codes that did not change behavior until courts started prosecuting. Disclosure without enforcement is the failure mode the EU AI Act will inherit if its enforcement infrastructure does not catch up to the €15 million fine schedule on paper. The right response is not to abandon disclosure; it is to expect that disclosure alone will not work, and to build the enforcement layer in advance rather than after a decade of evasion.


What to read into the next two years

If you build, evaluate, or invest in AI systems, the numismatic record gives you a usable checklist for separating compliance theater from compliance.

Ask whether the training-data summary names categories or names content. The EU template’s “top 10% of domain names used” can be satisfied by listing reddit.com, wikipedia.org, and github.com — which is about as informative as “old European collection.” Watch which providers go beyond the template. The ones that do are pricing in the next phase of regulation; the ones that meet only the floor are pricing in regulatory drift. Beltrametti and Marrone’s work showed that this gap is where the price premium accumulates.
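To see how much a domain-level summary discards, here is a minimal sketch. It assumes that “top 10%” means the most frequent tenth of distinct domains ranked by URL count; that is an interpretation made for illustration, not the Commission template’s own definition, and `top_domains` is a hypothetical helper.

```python
from __future__ import annotations

from collections import Counter
from urllib.parse import urlparse


def top_domains(urls: list[str], percent: int = 10) -> list[str]:
    """Return the most frequent `percent` of distinct domains, ranked by URL count."""
    counts = Counter(urlparse(u).netloc.lower() for u in urls)
    # Integer ceiling of len(counts) * percent / 100, with a floor of one domain.
    keep = max(1, (len(counts) * percent + 99) // 100)
    return [domain for domain, _ in counts.most_common(keep)]
```

Whatever the input, the output is a short list of hostnames. Which pages were taken, under what terms, and how they were obtained never appears in it, and that is the gap the question above is probing.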

Ask whether internal records exist. The Oxford JIPLP finding — that companies deliberately do not keep training-data records because the records would be evidentiary — has a tell. Providers that publish dataset hashes, content-membership APIs, or licensing audit logs are building the equivalent of a documented chain of custody. Providers that publish narrative descriptions only are building the equivalent of “from an old European collection.”
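A minimal sketch of what “dataset hashes” plus a content-membership lookup could look like, using only the Python standard library. The `Entry` type, the function names, and the manifest layout are assumptions made for illustration; no provider’s actual format is implied.

```python
from __future__ import annotations

import hashlib
from typing import NamedTuple


class Entry(NamedTuple):
    # Hypothetical record: where the item came from and under what terms.
    source: str
    license: str


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def build_manifest(items: list[tuple[str, str, bytes]]) -> dict[str, Entry]:
    """Map each item's content hash to its acquisition record."""
    return {sha256_hex(body): Entry(source, license) for source, license, body in items}


def lookup(manifest: dict[str, Entry], candidate: bytes) -> Entry | None:
    """Return the acquisition record if this exact content is in the manifest."""
    return manifest.get(sha256_hex(candidate))
```

The lookup is exact-match only: a single changed byte produces a different hash, so a real membership service would need normalization or fuzzy matching on top. Even this crude version answers “which specific objects?” in a way a narrative description cannot.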

Ask which jurisdiction is being designed for. Court enforcement, not voluntary disclosure, was what changed the antiquities market. Bartz, Kadrey, Getty, Thomson Reuters and the upcoming UK report are the inflection. A model whose training pipeline is structured to pass a Bartz “lawfully acquired” audit is a different artifact from a model whose training pipeline was structured to pass a 2023 voluntary code, even when the externally visible documentation looks similar.

And ask the inverse question: would your organization be comfortable if its training pipeline were described in the same neutral language an auction house uses on a coin? “Trained on a corpus assembled from publicly available text, web crawls, and licensed materials, acquired prior to 2024.” Read it again. It is the sentence the antiquities market spent fifty years learning to flag. It is the sentence the AI industry is currently writing on every model card.

The interesting question is not whether AI training data has to live through the post-1970 arc. It is which of the four regulatory models the convergence settles on, and how much court time it takes to get there. The numismatic literature has already mapped each model’s failure modes for a generation. The AI industry could, in principle, arrive at the destination without re-running the experiments.

What it will probably do instead is exactly what the antiquities trade did: refuse to believe the convergence is happening until a courtroom makes it expensive, then converge in a hurry, then act as if everyone always agreed.


Sources: 1970 UNESCO Convention on the Means of Prohibiting and Preventing the Illicit Import, Export, and Transfer of Ownership of Cultural Property (adopted November 14, 1970). Cultural Property Implementation Act of 1983 (US). Bowdoin College, Antiquities Law Primer. Cultural Property News, “Rediscovering Old Provenances for Ancient Coins,” 2024. UK Portable Antiquities Scheme (British Museum, 1997–). UK Treasure Act 2023 expansion. France, Heritage Code Article L. 542-1; Connexion France, 2024. MIT Data Provenance Initiative; MIT Sloan, “Bringing Transparency to the Data Used to Train Artificial Intelligence,” 2024. Longpre et al., arXiv 2404.12691, 2024. Oxford Academic, Journal of Intellectual Property Law & Practice, 2025. Beltrametti & Marrone, Journal of Law and Economics, Vol. 59, No. 4, 2016. Bartz v. Anthropic (N.D. Cal., June 23, 2025). Kadrey v. Meta (N.D. Cal., June 25, 2025). Getty Images v. Stability AI (UK High Court, November 2025). Thomson Reuters v. ROSS Intelligence (February 2025). EU AI Act, Article 53 (effective August 2, 2025). California AB 2013 (effective January 1, 2026). UK report on copyright and AI, deadline March 18, 2026. arXiv 2601.18127, “The Limits of AI Data Transparency Policy: Three Disclosure Fallacies,” 2026. J. Paul Getty Museum acquisition policy; Smarthistory. Aphrodite of Morgantina return (Getty, 2011); Euphronios Krater return (Met, 2008).

Provenance is the documentation cut-off. Here is the chain.

If “trained on publicly available text” is the AI equivalent of “from an old European collection,” the corrective is the same one numismatics arrived at: a hash-linked, append-only chain of custody that travels with the artifact and can be verified by anyone holding the spec. Chain of Consciousness publishes that chain — every action a signed entry, every claim resolvable to its evidence, every step verifiable without trusting the producer. It is the “documented provenance” tier the numismatic taxonomy has used for fifty years, applied to agent-generated content.
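For readers who want the mechanics, a hash-linked, append-only log can be sketched in a few lines of standard-library Python. This is a generic illustration of the technique, not the Chain of Consciousness package’s API or file format. The entry layout and function names are assumptions, and signing is omitted because it needs an asymmetric-key library.

```python
from __future__ import annotations

import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(prev_hash: str, payload: dict) -> str:
    # Canonical JSON so the same payload always hashes to the same value.
    body = json.dumps({"prev": prev_hash, "payload": payload},
                      sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append(chain: list[dict], payload: dict) -> None:
    """Add an entry whose hash commits to the payload and to the previous entry."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    chain.append({"prev": prev_hash, "payload": payload,
                  "hash": entry_hash(prev_hash, payload)})


def verify(chain: list[dict]) -> bool:
    """Recompute every link; any edited, reordered, or dropped entry fails."""
    prev_hash = GENESIS
    for entry in chain:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != entry_hash(entry["prev"], entry["payload"]):
            return False
        prev_hash = entry["hash"]
    return True
```

Because each hash covers the previous one, rewriting an early entry invalidates everything after it. Anyone holding the log can run `verify` without asking the producer for anything.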

Install: pip install chain-of-consciousness or npm install chain-of-consciousness
