Otto's Notebook Has a Spec: Clark's Four Criteria as a Memory-Architecture Audit

Four agent bugs that look like four unrelated problems share one root cause. A 1998 philosophy paper about a man with a notebook is, read correctly, a four-point audit for agent memory.

Published June 2026 · 9 min read

Picture the bug queue for an AI agent that's misbehaving in production. Ticket one: the agent gives different answers to the same task on different runs, and nobody can reproduce it. Ticket two: the agent confidently reports that it called a tool and got a result, except the tool was never invoked; it made the result up. Ticket three: the agent retrieves a correct fact from its own memory store and then argues with it, talking itself back into a wrong answer it already had. Ticket four: the agent acts, with total confidence, on a "fact" in its memory that no one on the team can trace to any source.

Four tickets. They'll get filed under four different labels (non-determinism, hallucination, RAG faithfulness, security) and routed to four different people. And they are, all four, the same bug: a memory component that has failed to be a cognitive component. The diagnostic that unifies them was written in 1998, by two philosophers, about a man with a notebook.

Otto, Inga, and the notebook that was a mind

In "The Extended Mind" (Analysis, 1998), Andy Clark and David Chalmers ask you to compare two people. Inga hears the museum has a new exhibit, recalls from biological memory that the museum is on 53rd Street, and walks there. Otto has Alzheimer's; he keeps a notebook in which he has written, among everything else, "the museum is on 53rd Street." He looks it up and walks there. Clark and Chalmers' provocative claim is that there is no principled difference between the two cases: Otto believed the museum was on 53rd Street in the same functional sense Inga did, and his notebook was, for that purpose, literally part of his cognitive system. Their justification is the parity principle: if an external resource does the job a brain process would do, and we'd happily call that brain process cognition, then the externally-supported process has an equal claim to count.

Here is the part everyone forgets, the part that turns a philosophy paper into an engineering document: Otto's notebook doesn't get to count automatically. Clark and Chalmers were careful. The notebook qualifies as part of Otto's mind only because it meets four conditions:

Constant availability: the notebook is "a constant in Otto's life"; he rarely acts, in the relevant situations, without it.
Easy accessibility: when he wants the information, it's "directly available without difficulty."
Automatic endorsement: when he retrieves something, he "automatically endorses it." He doesn't re-litigate it; he trusts it the way you trust your own recall.
Past endorsement: the information is in the notebook because Otto, at some point, "consciously endorsed" it. It's there as a consequence of his own act of putting it there, on evidence.

For more than twenty-five years this was a thought experiment, a prompt for seminar arguments about where the mind stops. For an agent, it stopped being a thought experiment. It became an architecture diagram.

For agents, the notebook is real

Here is the move, and the reason this matters to anyone shipping agents. For Otto, "is the notebook really part of his mind?" is a genuine metaphysical puzzle. For an AI agent it isn't a puzzle at all; it's a plain fact of the system. The agent's memory file, its vector index, a message queue, a KV cache, the outputs of the tools it calls: these are literally in the cognitive loop. They are the source of the "beliefs" the model conditions on when it generates the next token. There is no inner sanctum where the "real" reasoning happens, walled off from the notebook. The notebook is the working memory.

Which means we get to skip the entire consciousness debate (this is a functional claim, not a metaphysical one, and nothing here says agents have minds or inner lives) and ask the engineer's version of the question instead. Not "does this store count as cognition?" but: "does this store meet the four conditions a working cognitive component has to meet, and if it doesn't, what breaks?" Clark and Chalmers' criteria stop being a gate for awarding the title of "mind" and become something far more useful: a four-point audit checklist for agent memory. And each criterion, when you fail it, predicts a specific, documented, named failure. (To be clear about the epistemics: Clark and Chalmers did not forecast these bugs; they were writing about Alzheimer's and notebooks. Read as an audit, their criteria illuminate and organize a set of failures we already see; the framework is a lens, not a 1998 prophecy.)

The audit: four criteria, four pathologies

Constant availability → non-determinism. If the memory store or tool isn't reliably there (a flaky API, an index that sometimes returns and sometimes times out), then the agent sometimes holds a belief and sometimes doesn't, and you get the same task producing different behavior on different runs. This is the failure that masquerades as "LLMs are just non-deterministic." Some of it is sampling, yes, but a real and underappreciated chunk is memory non-determinism: as the knowledge-conflict literature documents, models give "inconsistent responses to semantically similar inputs," and an agent has "no internal mechanism that remembers if an API call actually went through." Otto with a notebook that's sometimes in his pocket and sometimes left at home is not a reliable believer; he's a coin flip. Reliability of the store isn't a nice-to-have; it's criterion one.
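
One way to keep a flaky store from turning into silent run-to-run variation is to make "the notebook wasn't there" a different answer from "the notebook had nothing." The sketch below is a minimal illustration, not a prescribed design: the names Status, Lookup, and lookup, and the fetch callable standing in for your store, are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class Status(Enum):
    HIT = "hit"                  # store answered and had the fact
    MISS = "miss"                # store answered and had nothing
    UNAVAILABLE = "unavailable"  # store never answered


@dataclass(frozen=True)
class Lookup:
    status: Status
    value: Optional[str] = None


def lookup(fetch: Callable[[str], Optional[str]], key: str, attempts: int = 3) -> Lookup:
    """Query the store with bounded retries; never report an outage as a miss."""
    for _ in range(attempts):
        try:
            value = fetch(key)  # hypothetical store call; may raise on outage
        except (TimeoutError, ConnectionError):
            continue  # transient failure: try again
        if value is not None:
            return Lookup(Status.HIT, value)
        return Lookup(Status.MISS)
    return Lookup(Status.UNAVAILABLE)
```

With the three outcomes separated, the orchestration layer can refuse to proceed on UNAVAILABLE instead of letting the agent act as if it simply held no belief.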

Easy accessibility → pre-call hallucination. This is the most counterintuitive of the four, and the most operationally vicious. Clark and Chalmers' second condition says the information must be reachable without difficulty. Put friction in front of an agent's tool (an auth handshake, a rate limit, latency, a flaky connection) and the agent does not simply wait. It fabricates the tool's output. The benchmark literature on tool-use hallucination (MIRAGE-Bench and related work) describes exactly this: "the agent answers directly, simulating or inventing results instead of performing a valid tool invocation." When reaching the notebook is hard, "the most natural-sounding next step is to confidently say the task is finished, whether the underlying code executed or not." Inaccessibility doesn't slow the agent down; it causes the agent to confabulate. Otto's notebook welded shut doesn't make Otto careful; it makes him invent the address.
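
Removing friction is the real fix, but a cheap backstop is to make fabricated tool results detectable: issue a receipt only when a tool has actually returned, and check any claimed result against it. This is a sketch under assumptions; ToolLedger and its methods are hypothetical names, not an existing library's API.

```python
import uuid
from typing import Any, Callable, Dict, Tuple


class ToolLedger:
    """Runs tools and keeps a receipt for every call that really executed."""

    def __init__(self) -> None:
        self._receipts: Dict[str, Any] = {}

    def call(self, tool: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[str, Any]:
        result = tool(*args, **kwargs)  # raises if the tool is unreachable
        receipt_id = uuid.uuid4().hex   # issued only after a real return
        self._receipts[receipt_id] = result
        return receipt_id, result

    def verify(self, receipt_id: str, claimed_result: Any) -> bool:
        """True only if the receipt exists and recorded exactly this result."""
        if receipt_id not in self._receipts:
            return False
        return self._receipts[receipt_id] == claimed_result
```

An agent's statement that "the tool returned X" can then be accepted only when it cites a receipt for which verify returns True; a result with no receipt was never produced by a tool.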

Automatic endorsement → dithering. The third condition says retrieved information must be trusted like recall, endorsed automatically, not subjected to fresh critical scrutiny every time. Agents violate this constantly, and the way they violate it is genuinely surprising: given retrieved context that conflicts with the model's own parametric prior, models tend to side with the prior. The surveys on knowledge conflict and RAG faithfulness find a "strong confirmation bias toward parametric knowledge": the agent retrieves the correct fact from its notebook and then talks itself out of it. And the twist that should keep architects up at night: this gets worse with capability. Stronger models are reported to cling to their internal beliefs more stubbornly, persisting on an incorrect prior even when correct external evidence sits right there in context. Otto doesn't dither over his notebook; he reads "53rd Street" and goes. An agent that re-argues every retrieved fact against its own half-memory isn't using its notebook; it's fighting it. (A reported tendency rather than an iron law, though a well-attested one.)
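
Deciding the trust policy explicitly can be as simple as moving the tie-break out of the model and into code. The sketch below assumes a hypothetical vetted flag on each retrieved entry (set by whatever gate controls writes to the store) and lets vetted memory win over the model's prior; Retrieved and resolve are illustrative names only.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Retrieved:
    value: str
    vetted: bool  # hypothetical flag: did this entry pass the store's write gate?


def resolve(prior: Optional[str], retrieved: Optional[Retrieved]) -> Tuple[Optional[str], str]:
    """Return (answer, reason). Vetted memory overrides the model's prior."""
    if retrieved is not None and retrieved.vetted:
        return retrieved.value, "memory"
    if prior is not None:
        return prior, "prior"  # nothing vetted to endorse: fall back
    return None, "unknown"
```

The point of the design is that the conflict is settled by a rule you wrote down, not re-argued by the model on every retrieval; whether vetted memory should always win is a policy choice for your system.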

Past endorsement → memory poisoning. The fourth condition is the quiet one, and it proves to be the loudest. It says the contents are in the notebook because Otto put them there, on evidence, as a consequence of his own past endorsement. Violate it (let the agent act on memory it never originated and never vetted) and you have handed control of the agent to whoever can write to the store. This is precisely the shape of memory poisoning and indirect prompt injection. AgentPoison (NeurIPS 2024) injects malicious triggers into RAG knowledge bases and long-term memory; MINJA poisons through ordinary interaction history; MemoryGraft implants fake "successful experiences" the agent later imitates. The security surveys put indirect prompt injection at success rates approaching 100% against undefended RAG pipelines; "undefended" is the load-bearing word, since defenses exist and change the number. Otto's notebook, if a stranger can scribble in it while he isn't looking, doesn't just make Otto wrong. It makes Otto remote-controlled.
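
A minimal way to picture "only Otto writes in Otto's notebook" is a store that admits an entry only when it carries a signature made with a key the agent's own write path holds. The sketch below uses Python's standard hmac module; the SignedMemory class, its method names, and the fact/source fields are assumptions for illustration, and key management is out of scope.

```python
import hashlib
import hmac
import json
from typing import Dict, List


class SignedMemory:
    """Append-only store that admits an entry only with a valid signature."""

    def __init__(self, key: bytes) -> None:
        self._key = key
        self._entries: List[Dict[str, str]] = []

    def sign(self, fact: str, source: str) -> str:
        # Sign the fact together with its source so neither can be swapped.
        payload = json.dumps({"fact": fact, "source": source}, sort_keys=True).encode()
        return hmac.new(self._key, payload, hashlib.sha256).hexdigest()

    def write(self, fact: str, source: str, signature: str) -> bool:
        """Reject any write whose signature does not match fact and source."""
        if not hmac.compare_digest(self.sign(fact, source), signature):
            return False
        self._entries.append({"fact": fact, "source": source})
        return True

    def facts(self) -> List[str]:
        return [entry["fact"] for entry in self._entries]
```

A signature alone only proves who wrote an entry, not that the evidence behind it was any good; it covers the "put there by Otto" half of the criterion, and the vetting step still has to happen before sign is called.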

The twist: philosophy's weakest criterion is engineering's spine

Now the payoff, and it's a clean inversion. Of Clark and Chalmers' four conditions, the fourth, past endorsement, is the one philosophers tried to throw away. Clark and Chalmers themselves hedged it in the original paper, and there's a whole subsequent literature arguing it's dispensable: you can plausibly have extended beliefs you never "consciously endorsed" (a notebook someone trustworthy filled in for you still seems to extend your mind), so the cleanest version of the extended-mind thesis drops criterion four as an unnecessary restriction. In the philosophy debate, past endorsement is the afterthought, the soft spot most likely to get cut.

For agents, it is the most critical criterion of the four, by a wide margin, because its violation is the entire memory-poisoning attack surface. The condition the philosophers found weakest is the one whose absence is catastrophic in deployment. And once you see it, the reframing is permanent: provenance, "is this belief in my memory because I put it there, on evidence I evaluated?", is not a philosophical nicety. It is a security property. A memory store with no answer to that question is a store anyone can write to, and a store anyone can write to is an agent anyone can drive. The metaphysics seminar's throwaway is the production system's spine.

I should keep one honest caveat in view: the extended-mind thesis itself is contested ground (Adams and Aizawa's "mark of the cognitive," the cognitive-bloat objections, and so on). I'm not adjudicating that. I'm using parity as a productive architectural lens: a way to see that the same four conditions a notebook needs to function as memory are the four conditions an agent's memory store needs to function without one of these four pathologies. You don't have to believe Otto literally has an extended mind to find the checklist useful. You only have to notice that your vector index is playing Otto's notebook's role, and ask whether it passes the same four tests.

Run the audit

The practical version is a checklist you can run against any agent-memory architecture, and the value is that it unifies failures you were treating as unrelated:

Constant availability → is the store/tool reliably there? Give it a real availability target. Symptom of failure: non-determinism on identical tasks.
Easy accessibility → is retrieval low-friction? Budget the access cost; assume that any friction you don't remove, the agent will "remove" by inventing the answer. Symptom: pre-call / tool-bypass hallucination.
Automatic endorsement → does the agent trust the store, or re-litigate it? Decide the trust policy explicitly, and know that simply having a "smarter" model can make this worse, not better. Symptom: dithering, fighting its own retrieved memory.
Past endorsement (provenance) → can every belief in the store be traced to a source the agent actually vetted? Sign it, attribute it, gate writes to it. Symptom: memory poisoning, and the agent quietly acting on assertions it has no relationship to.
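
The checklist above can be written down as a small record, so that a failed criterion maps straight to the symptom to look for in the bug queue. This is only a sketch of the mapping described in this post; MemoryAudit and its field names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class MemoryAudit:
    constantly_available: bool
    easily_accessible: bool
    automatically_endorsed: bool
    past_endorsed: bool

    def predicted_pathologies(self) -> List[str]:
        """List the symptom associated with each criterion that failed."""
        checks = [
            (self.constantly_available, "non-determinism on identical tasks"),
            (self.easily_accessible, "pre-call / tool-bypass hallucination"),
            (self.automatically_endorsed, "dithering against retrieved memory"),
            (self.past_endorsed, "memory poisoning"),
        ]
        return [pathology for passed, pathology in checks if not passed]
```

For example, MemoryAudit(True, False, True, False).predicted_pathologies() names tool-bypass hallucination and memory poisoning, which is the pair of tickets to expect from a store that is reliable and trusted but slow to reach and open to outside writes.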

Most agent-memory stacks fail at least one of these silently, and only discover it through the downstream pathology, which, without the framework, reads like a random bug from a different department. The deepest lesson is the one the twist hands you: when you build an agent's memory, build it the way Clark and Chalmers said a notebook becomes a mind (constant, accessible, trusted, and yours), and treat that last word as the security requirement it actually is. The other three criteria decide whether your agent works. The fourth decides whether it's yours.

A philosopher's puzzle about where the mind ends became a spec. The notebook was never just a metaphor. For your agent, it's the architecture, and it has four acceptance tests, written in 1998, that most of us are failing without knowing which one.

Sources

Andy Clark & David Chalmers, "The Extended Mind," Analysis 58:1 (1998): Otto/Inga, the parity principle, the four criteria (era.ed.ac.uk); overview: "Extended mind thesis" (en.wikipedia.org)
"Overcoming the Past-endorsement Criterion": on criterion #4 being contested/dispensable (pmc.ncbi.nlm.nih.gov)
"Knowledge Conflicts for LLMs: A Survey" (arXiv 2403.08319) and FaithfulRAG (arXiv 2506.08938): automatic-endorsement failure / parametric-prior persistence / dithering
MIRAGE-Bench (arXiv 2507.21017) and tool-use-hallucination analyses: easy-accessibility failure / tool-bypass hallucination
AgentPoison (NeurIPS 2024) and "Survey on Long-Term Memory Security in LLM Agents" (arXiv): past-endorsement failure / memory poisoning; ~100% indirect-prompt-injection success against undefended RAG

Provenance is the fourth criterion. It is also a security property.

"Is this belief in my memory because I put it there, on evidence I evaluated?" is the question that decides whether your agent is yours or anyone's. Chain-of-consciousness records where each belief in an agent's working memory came from and the reasoning that admitted it, so every entry in the notebook has a traceable source instead of an anonymous one. The criterion the philosophers tried to discard, wired into the store as an acceptance test.

pip install chain-of-consciousness · npm install chain-of-consciousness
Hosted Chain-of-Consciousness → · vibeagentmaking.com