Hamburg, autumn 1929. Aby Warburg is in the last weeks of his life. He is working in the Kulturwissenschaftliche Bibliothek Warburg — the personal library he funded by trading his banking inheritance to his brother Max in exchange for “every book he ever wanted.” The room contains 63 wooden panels, each covered in black hessian cloth. Pinned to those panels: roughly 971 images — photographic reproductions of artworks, manuscript pages, newspaper clippings, postage stamps, advertisements, maps, and diagrams — ranging from Babylonian liver models to Weimar-era press photography.

The clusters are not chronological. They are not taxonomic. They are not topical. The arrangement is the thesis.

Warburg dies on October 26 of that year. He had planned somewhere between 79 and 200 panels; only 63 exist when he stops. The original wood-and-cloth objects do not survive — only black-and-white archival photographs, 18 × 24 cm each, in the Warburg Institute’s London holdings. Roberto Ohrt and Axel Heil reconstructed the Atlas from those photographs and the original source images for the 2020 exhibition Aby Warburg: Bilderatlas Mnemosyne — The Original at Berlin’s Haus der Kulturen der Welt; it travelled to the Warburg Institute in London the following year. Cornell University Library hosts a browsable digital edition with enhanced scans of ten panels, and the Warburg Institute publishes all 63 online.

Every time you write collection.query(query_embeddings=[q], n_results=5), you are doing the thing Warburg spent the last five years of his life trying to undo.

A Denkinstrument, not a database

Warburg called the Atlas a Denkinstrument — a thinking tool — and the distinction matters. He was not building a storage mechanism. He was building an interface that produced thought when its arrangement was contemplated.

Three terms carry the architecture. First, Pathosformeln (“pathos formulas”) — recurring expressive gestures that persist across centuries and media. The arched-back maenad, the dying figure, the windblown nymph in motion: each is a gesture that recurs as a kinematic signature, independent of what the gesture happens to mean in any given panel. Second, Nachleben (“afterlife”) — the survival, mutation, and unexpected return of ancient motifs in later periods. A 2nd-century Roman sarcophagus relief reappears in a 15th-century Florentine fresco; a Hellenistic coin type resurfaces in a medieval manuscript illumination. The connection is neither thematic nor stylistic. It is morphological recurrence across temporal gaps that no taxonomy would have predicted. Third, Gute Nachbarschaft (“good neighborliness”) — the principle that spatial adjacency generates unexpected intellectual connections. The interface itself is the argument. Meaning emerges from juxtaposition, not from classification.

Warburg added a fourth term: Zwischenräume, “the gaps between” the images, “in which thought happens.” That phrase is the part most often missed in summaries. The gap is not noise. The gap is information.

Now consider what a vector store does.

What your vector store actually does

The standard agent-memory recipe is to embed each chunk of text into a high-dimensional vector — typically 768 or 1536 dimensions — store the vectors in an index, and retrieve at query time by cosine similarity. The premise is that semantically similar chunks land in nearby neighborhoods.
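
The whole recipe fits in a few lines. A minimal sketch with Chroma's Python client (the collection name and documents are illustrative; Chroma's default embedding model stands in for whatever you would use in production):

```python
import chromadb

client = chromadb.Client()
collection = client.get_or_create_collection("agent_memory")  # hypothetical name

# Store: each chunk becomes one point in the embedding index.
collection.add(
    ids=["m1", "m2"],
    documents=[
        "Rate limiter exhausted its token bucket during the launch spike.",
        "Connection pool saturated; requests queued until timeout.",
    ],
)

# Retrieve: nearest neighbors by distance in embedding space.
results = collection.query(query_texts=["why did requests time out?"], n_results=5)
```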

The premise is shakier than the popularity of the technique suggests. In 2024, Harald Steck, Chaitanya Ekanadham, and Nathan Kallus (Netflix and Cornell) published "Is Cosine-Similarity of Embeddings Really About Similarity?" The paper demonstrates that, in regularized matrix-factorization models, the choice of regularization introduces an arbitrary per-dimension rescaling of the learned embeddings, one that leaves the model's predictions untouched while distorting cosine similarities in unpredictable ways. In one full-rank construction the authors give, the item-item cosine-similarity matrix can be driven all the way to the identity: every item ends up similar only to itself. The conclusion, as reported in a Shaped.ai summary I am working from rather than the original PDF, is that cosine similarity can yield "arbitrary results, potentially rendering the metric unreliable and opaque."
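
A toy numpy sketch of the paper's central observation, not its actual construction: a per-dimension rescaling of the factors leaves every prediction of the model untouched while changing the item-item cosines.

```python
import numpy as np

rng = np.random.default_rng(0)
U = rng.normal(size=(4, 3))   # user factors
V = rng.normal(size=(5, 3))   # item factors

# An arbitrary per-dimension rescaling, like the one regularization induces.
D = np.diag([1.0, 10.0, 0.1])
U2, V2 = U @ D, V @ np.linalg.inv(D)

# Predictions are identical: U V^T == (U D)(V D^-1)^T.
assert np.allclose(U @ V.T, U2 @ V2.T)

def cos(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# But item-item cosine similarity is not invariant under that rescaling:
print(cos(V[0], V[1]), cos(V2[0], V2[1]))  # same model, two different "similarities"
```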

Practitioner numbers track the theory. A DigitalOcean RAG-architecture survey cites a working engineer reporting that retrieval accuracy for the correct chunk is “usually below 60%” in production. In one of the same survey’s benchmarks, BM25 — the keyword-search algorithm formalized in the 1990s — needed eight results to a vector store’s seven to hit 85% recall. After the embedding-model fine-tuning, the chunking strategy, the reranker, the hybrid search, the query rewriting, the evaluation harness — your modern vector pipeline lands roughly a rounding error away from being beaten by 1990s-era keyword search.

But the deeper failure is structural, and it is the Warburg one. A vector store collapses gesture-level keys into content-level keys. It eliminates spatial arrangement, replacing the Zwischenräume with smooth, continuous distance functions where every point has a populated neighborhood. It retrieves what is already similar, which is the opposite of what Nachleben tracking requires: the same gesture appearing where you would not expect it.

The same pose, opposite emotions

Leonardo Impett and Sabine Süsstrunk at EPFL ran the most important computational test of Warburg’s intuition. Their 2016 ECCV Workshops paper “Pose and Pathosformel in Aby Warburg’s Bilderatlas” used crowdsourced pose annotation (on the CrowdFlower platform) to encode 2D human posture in roughly a third of the Atlas’s panels. They then clustered the poses by relative limb angles using a hierarchical model.

They found the Pathosformeln. Cluster structure mapped onto the recurring gestural families that Warburg had pinned together by eye.

They also found something more unsettling. Morphologically identical poses — the same arrangement of limbs, head, and torso — could carry opposite emotional valences depending on panel context. The arms-raised, head-back gesture might be ecstasy in one frame, grief in another, triumph in a third. The pose was a stable signature; the meaning was supplied by the juxtaposition.

This is the experimental confirmation of the embedding-collapse problem. A vision model that embeds images by visual content will map ecstasy and grief to the same vector neighborhood, because they look the same. Cosine similarity will retrieve them as a cluster. The polarity that Warburg's spatial arrangement preserved is gone. The signal lived in the panel, not in the image.
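
A toy illustration with made-up numbers, not the EPFL model: two memories that share a pose vector but carry opposite emotions are inseparable to a cosine query.

```python
import numpy as np

# Toy pose embeddings (made-up numbers): the arms-raised, head-back gesture
# lands on nearly the same vector whether the context is ecstasy or grief.
memories = {
    "maenad_ecstasy": np.array([0.82, 0.31, 0.47]),
    "pieta_grief":    np.array([0.81, 0.32, 0.46]),  # same pose, opposite emotion
    "portrait_calm":  np.array([0.10, 0.90, 0.20]),  # different pose entirely
}

def cos(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

query = memories["maenad_ecstasy"]
for name in sorted(memories, key=lambda k: -cos(query, memories[k])):
    print(name, round(cos(query, memories[name]), 4))
# grief is the top match after the query itself; the polarity is invisible
```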

The taxonomy problem repeats every generation

| Era | Technology | Classification | What it misses |
|---|---|---|---|
| 1593 | Print | Cesare Ripa's Iconologia: ~700 alphabetically ordered personifications | Context, irony, subversion |
| 1970s | Library science | Henri van de Waal's Iconclass: 28,000 hierarchical definitions, 14,000-keyword index | Cross-category recurrence (Nachleben) |
| 2020s | Transformer embeddings | CLIP-style vision-language vectors; cosine-similarity neighborhoods | Spatial arrangement; juxtaposition; gaps |

Ripa’s 1593 Iconologia is the canonical Renaissance lookup table: each entry specifies the attributes an artist should give to an abstract concept (Justice with scales; Prudence with mirror and serpent), and the same book gives an art historian the means to identify those concepts three centuries later. It is alphabetical, finite, and unable to express why a concept appears here, now, this way. Iconclass, the 28,000-notation hierarchical system the Dutch art historian Henri van de Waal developed in the 1970s and that collaborators completed after his death, is the same lookup table at scale, with the same blind spot: a 2nd-century gesture and its 15th-century recurrence sit on different branches of the hierarchy. Wikipedia preserves the well-known practitioner complaint that Iconclass codes are “a bit too precise” — paintings get excluded from searches because the assigned code is slightly more or less specific than the search code.

Transformer embeddings are the newest version. The TIB Hannover team’s Iconclass classification pipeline runs YOLOv8 object detection, algorithmic mapping to Iconclass codes, rule-based inference for abstract meanings, and three recommenders. Multimodal vision-language pre-training has shown that CLIP-style embeddings can match Iconclass codes well. But the architecture is identical in principle to Ripa’s: a finite scaffolding, applied to images, with the same structural inability to surface the afterlife connection that wasn’t trained into the embedding space.

Each generation rebuilds the taxonomy in the technology available to it. Each generation finds the same blind spot. Vector embeddings are not an escape from Iconclass; they are Iconclass implemented in floating-point.

Pinterest is closer to Warburg than your RAG stack

The cleanest counter-example to the vector-store paradigm is not a knowledge management product. It is Pinterest. The 2018 WWW paper introducing the Pixie recommendation system, co-authored by Stanford and Pinterest engineers, describes the platform as a "giant human-curated bipartite graph of 7 billion pins and boards with over 100 billion edges." Recommendations come from a random walk over the graph, not from cosine distance in an embedding space. The graph preserves board structure, and traversal respects which pins humans grouped together.

This is structurally a Warburg system at scale. The boards are the panels. The pins are the images. The walk is the user moving from cluster to cluster via the connections that human curators wired in. Pinterest proves that the spatial-curation model scales to consumer-product magnitudes — billions of items, hundreds of millions of users — without flattening into a continuous embedding space.
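
A toy sketch of the mechanism, not Pixie itself (which adds biased walks with restarts, early stopping, and graph pruning); the boards and pins here are made up:

```python
import random
from collections import Counter

# Hypothetical bipartite graph: boards curate pins, pins belong to boards.
board_pins = {
    "b_desserts": ["p_cake", "p_tart", "p_mousse"],
    "b_baking":   ["p_cake", "p_bread"],
    "b_brunch":   ["p_tart", "p_bread", "p_eggs"],
}
pin_boards = {}
for board, pins in board_pins.items():
    for pin in pins:
        pin_boards.setdefault(pin, []).append(board)

def pixie_style_walk(start_pin, steps=10_000, seed=0):
    """Random walk: pin -> a random board it was pinned to -> a random pin
    on that board. Visit counts rank recommendations by curated adjacency,
    not by embedding distance."""
    rng = random.Random(seed)
    visits, pin = Counter(), start_pin
    for _ in range(steps):
        board = rng.choice(pin_boards[pin])
        pin = rng.choice(board_pins[board])
        visits[pin] += 1
    visits.pop(start_pin, None)
    return visits.most_common()

print(pixie_style_walk("p_cake"))
```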

But Pinterest is not Warburg. Pinterest boards cluster by topical affinity: chocolate desserts, mid-century furniture, bridal hairstyles. Warburg’s panels clustered by gestural recurrence across topics: a dying gladiator, a Renaissance saint in martyrdom, and a Weimar boxing photograph might share a panel because they share a Pathosformel, even though no Pinterest user would ever pin them to the same board. Pinterest preserves spatial arrangement; it does not preserve cross-category Nachleben. The afterlife of a gesture across centuries is not a board you can build in a UI optimized for “more things that look like this thing.”

The gap between Pinterest and Warburg is the gap between a topical-curation product and a discovery engine.

What a Mnemosyne memory architecture would look like

If you wanted to build agent memory the way Warburg built the Atlas, you would need four primitives that current architectures do not give you cleanly.

Explicit panel metadata. Memories cluster into discrete groups whose membership is a first-class property, not an emergent neighborhood in continuous embedding space. Each panel has a spatial logic — a curatorial reason — distinct from the content of its members. “Failures in the rate-limit subsystem during high-traffic events” is a panel; the failures inside it may be embedded far apart by content similarity, but they share a panel because the curator (the agent or the human supervising it) said they do. Implementation: a panel_id on each memory record, plus a panel-level note that documents why these things are adjacent.

Gesture-level keys distinct from content keys. A memory’s pattern signature — “resource-exhaustion failure mode,” “premature optimization that obscured a deeper bug,” “trust transferred without provenance” — is indexed separately from its content. Two memories with completely different surface content but the same pattern signature should be retrievable together. This is what Pathosformel indexing means in practice. The gesture is a kinematic key; the content is a textual key; they live in different indices and the agent queries them differently.

Recurrence tracking. When a pattern signature reappears across temporally distant entries, that recurrence is itself a queryable event. “This failure mode last appeared 47 days ago in an entirely different subsystem” is exactly the Warburg move: the afterlife of a motif across contexts the taxonomy did not anticipate. A Nachleben tracker is structurally a recurrence detector running over the pattern-signature index, surfacing signals where a vector-store cosine query would have returned nothing.

Preserved gaps. Not every memory connects to every other memory. The absence of a connection is information. Vector stores eliminate this by construction: every embedding sits at some computable distance from every other, so "unrelated" is indistinguishable from "weakly similar." A Mnemosyne memory makes "no edge" a representable state and treats high-density adjacency and unbridged gaps as different signals to the agent's reasoning loop.

You can implement all four on top of a plain file system with a few hundred lines of glue code. The DEV Community piece “Why Your Agent’s Memory Architecture Is Probably Wrong” puts the practical version of this argument bluntly: for bounded projects, plain files outperform vectors in predictability, debuggability, and agent autonomy. Vector retrieval becomes worth its complexity only at thousands-of-documents scale, and even there, mostly for high-recall topical retrieval where “semantically similar enough” is the actual goal.
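
A minimal file-backed sketch of all four primitives, with hypothetical names (panel_id, the memory/ directory, the gesture strings), to make the shape concrete:

```python
import json
import time
from pathlib import Path

MEMORY_DIR = Path("memory")  # hypothetical layout: one JSON record per memory
MEMORY_DIR.mkdir(exist_ok=True)

def write_memory(mem_id, content, panel_id, gestures, links=()):
    """Primitives 1, 2, and 4: explicit panel membership, gesture-level
    keys stored beside the content key, and explicit links. An absent
    link is a preserved gap, not merely a large distance."""
    record = {
        "id": mem_id,
        "content": content,            # content key: what happened
        "panel_id": panel_id,          # primitive 1: first-class panel membership
        "gestures": list(gestures),    # primitive 2: pattern signatures
        "links": list(links),          # primitive 4: no edge = no relation
        "ts": time.time(),
    }
    (MEMORY_DIR / f"{mem_id}.json").write_text(json.dumps(record))

def recurrences(gesture, min_gap_days=30):
    """Primitive 3: recurrence as a queryable event. Surface the same
    pattern signature reappearing across a large temporal gap."""
    records = [json.loads(p.read_text()) for p in MEMORY_DIR.glob("*.json")]
    hits = sorted((m for m in records if gesture in m["gestures"]),
                  key=lambda m: m["ts"])
    gap = min_gap_days * 86400
    return [(a["id"], b["id"]) for a, b in zip(hits, hits[1:])
            if b["ts"] - a["ts"] >= gap]
```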

Where the analogy breaks

Warburg was a single curator working from forty years of immersion in classical and Renaissance imagery. His Pathosformeln were intuited, not formalized; his Gute Nachbarschaft judgments were the product of one particular scholar’s eye, formed in the Hamburg banking-and-art milieu of the late 19th century. An agent system does not have that single eye. Trying to encode “good neighborliness” by hand across millions of memories is not a research program — it is a refusal of the problem of scale.

So the design move is not to imitate Warburg literally. It is to recognize that the structural primitives he was using — explicit panels, gesture keys, recurrence trackers, preserved gaps — are missing from most current agent-memory stacks, and that adding them is a different kind of problem than tuning a reranker or chunking strategy. Vector stores still earn their place when the task is high-recall topical retrieval (“find me the docs about authentication”) and when “similar to what I’m reading” is the right question. They fail when the question is “where has this pattern shown up before, in places I would not have looked.”

The gaps where thought happens

The Atlas was unfinished. Warburg planned up to 200 panels and built 63. The original wooden panels are gone; we have only photographs of an arrangement made by a man who knew his time was running out and did not finish.

But the unfinished form is itself a kind of argument. The panels were designed to be infinitely extensible by rearrangement — by re-pinning, re-juxtaposing, opening new gaps for new thought. The interface was the intelligence. Vector stores scale by adding more vectors. Warburg scaled by rearranging the ones he already had.

If you remember nothing else from this essay, remember the Zwischenräume — the gaps between the images, in which thought happens. Build the gap into your memory architecture. Index gestures separately from content. Track recurrence as a first-class event. Let your panels be panels, with curators behind their groupings, and resist the temptation to flatten everything into a continuous neighborhood where every thought is close to every other thought and nothing is surprising.

A retrieval system optimized for similarity will never surface what you most need to know — by definition, it returns what the model already considers close. Black hessian, wooden panels, and pins were designed, in the last five years of one man’s life, to do the opposite. That is why the Bilderatlas Mnemosyne still beats your vector store.

The Pattern Signature Your Memory Stack Doesn’t Index

If your agent stores every action, retrieval, and decision in a vector store, you have content keys and nothing else. The same four properties Warburg built — explicit panels, gesture-level keys distinct from content, recurrence as a queryable event, and preserved gaps — need a substrate that records what happened, in what context, and linked to what prior entry, not just “what is this similar to.” Chain of Consciousness is a hash-linked, append-only log per agent action: every entry carries a panel reference and a back-link to whatever prior entry it elaborates, so recurrence detection becomes a query rather than a re-embedding job. The gesture key is structural; the content key sits beside it.
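
A minimal sketch of that structure (illustrative only, not the actual chain-of-consciousness API):

```python
import hashlib
import json
import time

def append_entry(log, action, panel, parent_index=None):
    """Append one hash-linked entry: the hash commits to the action, the
    panel reference, and the previous entry's hash, so the chain is
    tamper-evident and back-links are queryable without re-embedding."""
    body = {
        "action": action,
        "panel": panel,              # panel reference: the curatorial grouping
        "parent": parent_index,      # back-link to the entry this elaborates
        "prev": log[-1]["hash"] if log else "genesis",
        "ts": time.time(),
    }
    body["hash"] = hashlib.sha256(
        json.dumps(body, sort_keys=True).encode()
    ).hexdigest()
    log.append(body)
    return body

log = []
append_entry(log, "retried rate-limited call", panel="rate-limit-failures")
append_entry(log, "same failure, new subsystem", panel="rate-limit-failures",
             parent_index=0)
```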

```
pip install chain-of-consciousness
npm install chain-of-consciousness
```

Try Hosted CoC — a panel-aware, recurrence-queryable record of agent actions.