Reconstruct, curate, or present-the-disagreement — your pipeline already picked one, and never told you.
Ask a retrieval-augmented chatbot a question with a messy answer — say, the founding year of a company that merged, rebranded, and got acquired — and watch what happens. It pulls three documents. One says 1998. One says 2001 (the rebrand). One says 2012 (the acquisition). The model returns a single fluent sentence: “The company was founded in 1998 and reorganized in 2001 before being acquired in 2012.” Clean. Confident. And quietly synthetic — that exact sentence appears in none of the three sources. The model wrote a new text by splicing the best-looking pieces of several conflicting ones.
If you’ve spent time around editions of old books, that move is eerily familiar. It is precisely what a 19th-century classical scholar did when handed five disagreeing manuscript copies of Lucretius and told to produce the text. The discipline that studies how to do this well — and how it goes wrong — is called textual criticism, and it has been arguing about your RAG pipeline’s central design decision since the 1850s. It just didn’t know that’s what it was doing.
The useful news for anyone building answer-synthesis systems: that long-running argument already mapped out the trade-offs, named the failure modes, and built the safety mechanism your pipeline is missing. You don’t have to rediscover it. You just have to notice you’re in it.
Most pre-modern texts survive only as copies of copies. The author’s original is gone. What you have is a tradition — a branching pile of later manuscripts, each introducing its own slips, fixes, and “improvements.” Faced with that, editors split, roughly, into three camps. The split is old and it is still live.
The eclectic editor reconstructs. Karl Lachmann (1793–1851) is the name attached to the systematic version: build a family tree of the surviving copies — a stemma codicum — by spotting shared mistakes, on the principle that community of error implies community of origin (two manuscripts with the same weird blunder probably inherited it from the same ancestor). Then, working up the tree, pick the best reading at each contested point. The output is a composite: a text assembled from many witnesses that may have existed in none of them. It is, in a real sense, a new artifact — the editor’s best guess at a lost original, stitched from surviving fragments of evidence.
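The grouping step behind "community of error implies community of origin" can be sketched in a few lines. This is a toy illustration rather than Lachmann's actual procedure: the witness names, the loci, and the helpers `shared_errors` and `families` are all hypothetical, and the sketch assumes we already know which readings count as errors, which in practice is the hard part.

```python
from itertools import combinations

# Hypothetical data: for each witness, the loci where it carries a reading
# already judged to be an error (an assumption of this sketch).
errors = {
    "A": {"1.14", "2.03", "5.77"},
    "B": {"1.14", "2.03"},
    "C": {"3.21"},
    "D": {"3.21", "4.10"},
}

def shared_errors(errors):
    """Return {(w1, w2): shared loci} for every pair sharing at least one error."""
    pairs = {}
    for w1, w2 in combinations(sorted(errors), 2):
        common = errors[w1] & errors[w2]
        if common:
            pairs[(w1, w2)] = common
    return pairs

def families(errors):
    """Group witnesses that are linked by shared errors (simple union-find)."""
    parent = {w: w for w in errors}

    def find(w):
        while parent[w] != w:
            w = parent[w]
        return w

    for w1, w2 in shared_errors(errors):
        parent[find(w2)] = find(w1)

    groups = {}
    for w in errors:
        groups.setdefault(find(w), set()).add(w)
    return sorted(sorted(g) for g in groups.values())

print(families(errors))
```

A real stemma also orders the witnesses within each family and posits lost ancestors; this sketch only shows why a shared blunder is treated as evidence of shared descent.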
The best-text editor curates instead of reconstructs. Pick the single most reliable surviving witness and follow it faithfully, correcting only where it is obviously broken. Joseph Bédier (1864–1938) pushed this hard for medieval French, partly out of suspicion that the eclectic method was fooling itself (more on that later). The job here is custodial, not creative: present one real, historically-attested version, warts and all, and resist the urge to improve it with material from elsewhere.
The social-text editor refuses the premise. Jerome McGann, across the 1980s and culminating in The Textual Condition (1991), argued that every text is a social product — shaped by scribes, printers, editors, institutions — so there is no single “correct” version to recover. Bernard Cerquiglini, in Éloge de la variante (1989), went further: in manuscript culture, variation is not corruption to be cleaned up. It is the essential condition of the text. The variants are the object. The job is to present the disagreement, not resolve it.
Reconstruct, curate, or present-the-disagreement. Hold those three in mind, because your pipeline has already implemented one of them, probably without naming it.
A retrieval system that pulls several documents and asks a model to “synthesize an answer” is doing eclectic editing. It is combining witnesses into a composite that existed in none of them. That is the Lachmann move, performed thousands of times a second, by a system that has never heard of Lachmann.
A system that retrieves the single highest-ranked document and answers strictly from it is doing best-text editing. One authoritative witness, followed faithfully. It will never invent a cross-source reconciliation, because it never crosses sources. The cost is whatever lived in the documents it didn’t pick.
And a system that surfaces all the retrieved views, contradictions intact — “Source A says X; Source B says Y” — is doing social-text editing. The variants are the answer.
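Side by side, the three editors look something like the sketch below. The `Doc` type, the function names, and the `stub_synthesizer` stand-in for an LLM call are all hypothetical; the point is only the shape of each policy, not a production pipeline.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Doc:
    source: str
    claim: str
    score: float  # retriever relevance on a hypothetical 0..1 scale

def best_text(docs):
    """Curate: answer from the single top-ranked witness only."""
    top = max(docs, key=lambda d: d.score)
    return f"{top.claim} [{top.source}]"

def social_text(docs):
    """Present the disagreement: every witness, attributed, unresolved."""
    return "\n".join(f"{d.source} says: {d.claim}" for d in docs)

def eclectic(docs, synthesize):
    """Reconstruct: hand every witness to a synthesizer (e.g. an LLM call).
    The result may be a sentence that no single witness contains."""
    return synthesize([d.claim for d in docs])

def stub_synthesizer(claims):
    """Stand-in for a model call; it just splices the claims together."""
    return "; ".join(claims)

docs = [
    Doc("doc1", "founded in 1998", 0.91),
    Doc("doc2", "founded in 2001", 0.88),
    Doc("doc3", "founded in 2012", 0.74),
]

print(best_text(docs))
print(social_text(docs))
print(eclectic(docs, stub_synthesizer))
```

Only `eclectic` can return text that appears in none of its inputs, which is why it is the one policy that needs an apparatus.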
This isn’t a loose metaphor; the same trichotomy shows up a third time, in distributed systems, which gives us precise engineering language for the trade-offs. When two replicas of a database accept conflicting writes, you need a conflict resolution strategy, and the families of Conflict-Free Replicated Data Types (CRDTs, formalized by Shapiro and colleagues in 2011) line up one-to-one:
Three fields, three vocabularies, one decision. The reason this matters is not that it’s a cute coincidence. It’s that two of those fields — philology and distributed systems — have spent decades being honest about the costs, and the third field, RAG, mostly hasn’t named the choice at all. Most pipelines are running unmarked eclectic editing and calling it “synthesis.”
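For readers who don't live in distributed systems, here is one plausible way to read the correspondence, offered as an assumption rather than a canonical mapping: a last-writer-wins register keeps one write and drops the rest (curate), a multi-value register keeps conflicting writes as siblings (present the disagreement), and an application-level merge builds a new value from both (reconstruct). The `Write` type and function names are hypothetical, and the sketch assumes the two writes are already known to be concurrent; a real multi-value register would use version vectors to decide that.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Write:
    value: str
    timestamp: int
    replica: str

def lww(a, b):
    """Last-writer-wins: keep one write, discard the other.
    Ties break on replica id so every replica picks the same winner."""
    return max(a, b, key=lambda w: (w.timestamp, w.replica))

def multi_value(a, b):
    """Multi-value register: keep both conflicting writes as siblings."""
    return frozenset({a, b})

def merged(a, b, combine):
    """Application-level merge: build a new value out of both writes."""
    return combine(a.value, b.value)

a = Write("1998", timestamp=10, replica="r1")
b = Write("2001", timestamp=12, replica="r2")

print(lww(a, b).value)
print(sorted(w.value for w in multi_value(a, b)))
print(merged(a, b, lambda x, y: f"{x}, then {y}"))
```

Note that `lww` and `multi_value` give the same result whichever order the writes arrive in, while the merged value is only as trustworthy as the `combine` function you hand it.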
Textual critics have a precise term for the moment the eclectic editor reaches a point where the witnesses conflict and no mechanical rule settles it: divinatio — conjecture. The editor, drawing on judgment, proposes a reading. Crucially, the tradition treats this as a dangerous, high-skill act hedged with rules. Paul Maas’s Textkritik (1927) is famous for being austere about it: responsible conjecture must be grounded in transmission history, the typology of how scribes corrupt texts, paleography, language history, orthography. You don’t just write the prettiest sentence. You reason about how the error could have arisen and whether your fix is the kind of thing the tradition could have produced.
A RAG model splicing an answer from conflicting documents is performing divinatio. When it writes “founded in 1998 and reorganized in 2001,” it is conjecturing a reconciliation that no source asserted. That can even be the right conjecture. The problem is that it is conjecture without any of Maas’s discipline — no model of how the sources came to disagree, no typology of which kind of source corrupts which kind of fact, no transmission history. The research on RAG failure (work like ReDeEP on hallucination detection) describes models that “contradict the retrieved content, introduce unsupported details, or extrapolate beyond what the evidence justifies.” Philology has had a word for that specific operation for two centuries. It’s bad divinatio.
But here is the sharper point, the one worth taping to your monitor. A trained philologist performing divinatio marks it. The critical edition has an apparatus criticus — the dense band of notes at the foot of the page recording which witness said what, where the editor conjectured, and how confident they are. Conjectures get daggers (†) or square brackets. The reader can always see the seam between what a witness attests and what the editor guessed. The discipline’s core ethic is that reconstruction is allowed, but it must be auditable.
Your LLM performs divinatio and then erases the apparatus. The conjectured reconciliation is rendered in exactly the same confident prose as the directly-sourced facts. There is no dagger. There is no bracket. The seam between attested and guessed — the single most important piece of information in any reconstructed text — is gone. That is not a hallucination problem in the usual sense. It is a missing-apparatus problem. The model didn’t necessarily reason worse than a 19th-century editor. It reasoned without showing its work, and without flagging the guesses as guesses.
There’s a reason you can’t just declare one of the three approaches the winner: each buys something by giving up something, and the structure of that trade-off is identical to a result distributed-systems engineers already know cold. The CAP theorem (Brewer, around 2000; proved by Gilbert and Lynch in 2002) says a distributed store can guarantee at most two of consistency, availability, and partition tolerance. The editorial trichotomy is the same triangle wearing a tweed jacket:
Pick two. A RAG architect choosing between “synthesize one answer,” “answer from the top hit,” and “show all sources” is making a CAP decision, with the same impossibility lurking underneath: you do not get consistent, available, and disagreement-preserving all at once. Naming it as a CAP trade-off does something practical — it tells you to stop looking for the configuration that has no downside, and start choosing the downside you can live with for this query type.
The deepest thing the old discipline gets right is that relevance is not authority. A scribe’s manuscript can be beautifully written, early, and prominent — and still wrong, because it sits on a corrupt branch of the tree. So editors assess each witness’s reliability: its error rate, its distance from the archetype, the company of errors it keeps. They do not confuse “this manuscript is famous and legible” with “this manuscript is correct.”
Retrieval pipelines confuse exactly that. As the evaluation literature puts it bluntly, “relevance scoring is a ranking decision, not a truth guarantee” — retrievers optimize semantic similarity and embedding proximity, not factual correctness. A document can be the top hit (maximally relevant) and still be the wrong branch of the tree. The well-documented “relevant documents, wrong answer” failure is a philologist’s nightmare: you grabbed the prettiest witness and trusted it because it was close, not because it was right.
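A minimal sketch of keeping the two signals apart, assuming you can attach some per-source reliability prior to each hit. The `Hit` type, the `reliability` field, the 0.5 floor, and the product scoring are all illustrative assumptions, not a recommended formula.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Hit:
    source: str
    relevance: float    # retriever similarity, 0..1
    reliability: float  # hypothetical per-source trust prior, 0..1

def rank_by_relevance(hits):
    """What most retrievers do: closest match first."""
    return sorted(hits, key=lambda h: h.relevance, reverse=True)

def rank_by_weighted(hits, min_reliability=0.5):
    """Drop witnesses below a reliability floor, then rank by the product."""
    trusted = [h for h in hits if h.reliability >= min_reliability]
    return sorted(trusted, key=lambda h: h.relevance * h.reliability, reverse=True)

hits = [
    Hit("forum_post", relevance=0.95, reliability=0.30),
    Hit("press_release", relevance=0.90, reliability=0.60),
    Hit("registry_filing", relevance=0.80, reliability=0.95),
]

print([h.source for h in rank_by_relevance(hits)])
print([h.source for h in rank_by_weighted(hits)])
```

The top hit by relevance is the forum post; once reliability enters, it is not even in the list. Where the reliability numbers come from is the real work, and this sketch does not solve it.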
Encouragingly, the best recent RAG research is independently rebuilding the philologist’s toolkit — and recognizing what it is rebuilding will help you adopt it on purpose rather than piecemeal.
Build a critical apparatus. Systems in the ConflictRAG vein (2025) generate answers grounded in the most credible source, then attach conflict annotations, per-claim source attribution, and a confidence qualifier. That is, precisely, a critical edition with an apparatus: it attributes claims to witnesses and flags disagreement instead of silently resolving it. If you build one thing from this essay, build this — the dagger for your conjectures. Mark which spans of an answer are directly sourced, which are synthesized across sources, and how sure the system is. The marking is worth more than marginal gains in raw accuracy, because it converts a confident fabricator into an auditable one.
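As a rough sketch of what that marking could look like, the snippet below tags each span of an answer as attested or conjectured and renders a dagger plus footnotes. The `Span` and `Kind` types, the field names, and the confidence values are hypothetical; how a real system decides which spans are conjectured is the hard part and is assumed away here.

```python
from dataclasses import dataclass
from enum import Enum

class Kind(Enum):
    ATTESTED = "attested"        # stated directly by at least one source
    CONJECTURED = "conjectured"  # synthesized across sources

@dataclass(frozen=True)
class Span:
    text: str
    kind: Kind
    sources: tuple
    confidence: float

def render(spans):
    """Render the answer with a dagger on conjectured spans, then the notes."""
    body, notes = [], []
    for i, s in enumerate(spans, start=1):
        mark = "†" if s.kind is Kind.CONJECTURED else ""
        body.append(f"{mark}{s.text}[{i}]")
        notes.append(
            f"[{i}] {s.kind.value}; sources: {', '.join(s.sources)}; "
            f"confidence {s.confidence:.2f}"
        )
    return " ".join(body) + "\n" + "\n".join(notes)

spans = [
    Span("The company was founded in 1998", Kind.ATTESTED, ("doc1",), 0.90),
    Span("and reorganized in 2001", Kind.CONJECTURED, ("doc1", "doc2"), 0.60),
    Span("before being acquired in 2012.", Kind.ATTESTED, ("doc3",), 0.85),
]

print(render(spans))
```

The rendering is deliberately crude. What matters is that the seam survives in the data structure, so a UI, a log, or an evaluator can show it however it likes.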
Run an advocatus diaboli. The Madam-RAG framework (2025) assigns each retrieved document to its own agent, runs multi-round debates between them, and aggregates — reporting strong conflict-detection numbers (an F1 around 88.7%) and measurable correctness gains over naive baselines. That adversarial structure is not new under the sun: it is the digital descendant of the “devil’s advocate,” the formal challenger institutions have long used to stress-test a claim before canonizing it. Make your sources argue before you let them agree.
Decide your conflict taxonomy up front. Madam-RAG separates three kinds of conflict, and each maps to a school’s prescription:
Most pipelines treat all three the same way (jam everything into the context window and hope). The editorial tradition tells you they require three different policies, and conflating them is itself a bug.
Before you tune another reranker, answer one question about your system: which editor is it? Right now, most RAG pipelines are unmarked eclectic editions — they reconstruct a composite answer from many witnesses and present the conjectures with the same confidence as the facts. That is the single worst option the tradition identified: reconstruction without an apparatus, divinatio with no dagger. It is worse than honest best-text (which at least never lies about crossing sources) and worse than honest social-text (which at least shows you the disagreement).
So make the choice explicit, per query type. For a question with one true answer that lives in one place, be a best-text editor: retrieve the most credible single source and follow it. For a question whose answer is genuinely contested or plural, be a social-text editor: surface the variants and let the user adjudicate. And when you must reconstruct across sources — when eclectic synthesis is the only way to answer — earn the right to do it the way philologists earned it: attribute every claim to its witness, mark the synthesized spans as synthesized, and attach a confidence qualifier. Add the apparatus.
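One way to make that choice explicit is a small router that runs before generation. The sketch below is an assumption-laden illustration: `Policy`, `choose_policy`, and its two inputs are hypothetical, and computing those inputs (how many incompatible claims were retrieved, whether any single source suffices) is left to whatever conflict detection you already have.

```python
from enum import Enum

class Policy(Enum):
    BEST_TEXT = "best_text"
    SOCIAL_TEXT = "social_text"
    ECLECTIC_WITH_APPARATUS = "eclectic_with_apparatus"

def choose_policy(distinct_claims, needs_multiple_sources):
    """Pick an editorial policy for one query.

    distinct_claims: number of mutually incompatible answers among the
        retrieved sources (hypothetical upstream signal).
    needs_multiple_sources: True if no single source can answer alone.
    """
    if distinct_claims > 1:
        # Contested or plural: show the variants, let the user adjudicate.
        return Policy.SOCIAL_TEXT
    if needs_multiple_sources:
        # Reconstruction is unavoidable, so it must carry its apparatus.
        return Policy.ECLECTIC_WITH_APPARATUS
    # One answer living in one place: follow the most credible witness.
    return Policy.BEST_TEXT

print(choose_policy(distinct_claims=1, needs_multiple_sources=False))
print(choose_policy(distinct_claims=3, needs_multiple_sources=False))
print(choose_policy(distinct_claims=1, needs_multiple_sources=True))
```

Checking for disagreement first is a design choice in this sketch: when sources conflict, it prefers showing the conflict over reconciling it. Flip the order if your users would rather get a marked reconstruction.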
Lachmann, Bédier, and McGann spent their careers on a single question that turns out to be your question: when many imperfect sources disagree, what do you owe the reader? They concluded, after 150 years of argument, that the one unforgivable move is to hide the seam between what your sources said and what you made up. Your pipeline makes that move by default. The fix is two centuries old. It’s called showing your work.
The apparatus criticus your pipeline is missing is a provenance record.
The essay’s one prescription — mark the seam between what a source attested and what the model guessed — requires that every claim carry its witness with it: which document it came from, or that it was synthesized across several. That is provenance, and it has to be captured at the moment the answer is built, not reconstructed after the fact. Chain of Consciousness anchors every agent action and observation to a verifiable external record, so an answer’s spans stay attributable to their sources — the dagger and the bracket, in machine-readable form. It is the apparatus criticus for an LLM: reconstruction stays allowed, but it stays auditable.
pip install chain-of-consciousness · npm install chain-of-consciousness