The proposal arrives in every multi-agent design meeting that runs longer than an hour. Someone — usually the person at the whiteboard, marker still in hand — says: “What if we just give them a shared file?” Or a blackboard. Or an event bus. The room exhales. The architecture diagram gets a new box labeled shared_state.md or coordination_log.jsonl, and someone, with no irony, says this is how ant colonies do it.
A controlled experiment published on arXiv in December 2025 ran exactly this idea against a baseline. The headline, in case you’d like to skip ahead: zero benefit. p = 0.65. The shared scratchpad without individual memory performed 18.5% worse than random movement.
The paper is Khushiyant’s Emergent Collective Memory in Decentralized Multi-Agent AI Systems (arXiv:2512.10166), and the cleanness of the experimental design is the part that makes it uncomfortable. Foraging tasks on grids from 20×20 up to 50×50, agent counts from 5 to 625, density sweeps across ρ ∈ [0.049, 0.300], 50 repetitions per configuration, code and raw data dropped at github.com/Khushiyant/tracemind. Four conditions, factorially crossed:
| Condition | Score | Δ vs. baseline | p |
|---|---|---|---|
| No memory, no traces (baseline) | 927.23 | — | — |
| Traces only | 910.18 | −1.9% | 0.65 |
| Memory only | 1563.87 | +68.7% | <0.001 |
| Memory + traces | ~2130 | ≈+130% (+36–41% vs. memory-only) | significant |
Read the second row again. Traces — the shared environmental signal that every popular swarm-intelligence demo presents as the secret sauce — provided no benefit at all on their own. Worse: a separate random-walk condition scored 1116.58. Traces-only didn’t just fail to coordinate the agents. It actively misled them. The trace was litter.
Most “swarm intelligence” tutorials skip the traces-only baseline because they presuppose agents that already have full state. The paper says it more politely: “traces require cognitive infrastructure (memory) for interpretation. When that infrastructure is absent, environmental communication becomes noise rather than signal, degrading performance below random baseline.” Translated into something you might say in a code review: the shared file does not coordinate the agents; agents capable of reading the shared file coordinate themselves.
The Biology Was Never Your Counterexample
Pierre-Paul Grassé coined “stigmergy” in 1959 while watching Bellicositermes natalensis build mounds (Insectes Sociaux 6:41–80, 1959). The casual gloss has hardened into folklore: simple agents, no communication, complex results. That gloss is wrong about the first half of the sentence, which means it’s wrong about all of it.
A Macrotermes worker, by mammalian standards, is unremarkable. By the standards of what is required to interpret a pheromone gradient against the background of multiple chemical species, airflow patterns, humidity, and mechanical resistance, it is a sophisticated sensor-actuator system. Each worker discriminates between several pheromone types — recruitment, alarm, queen identity, nest-site marker — and updates internal state in response (Bonabeau, Dorigo, & Theraulaz, Swarm Intelligence, Oxford 1999). Each worker senses airflow and adjusts deposition behavior to keep tunnel CO₂ within the colony’s narrow operating range (King et al., PNAS 116(9):3379–3384, 2019). Each worker maintains internal state about fatigue and hunger that gates whether the trail signal is followed at all (Theraulaz & Bonabeau, Artificial Life 5(2):97–116, 1999).
The pheromone gradient is information only because the termite has the receptors, the integration circuitry, and the internal context to act on it. Take the cognition out and the pheromone is a smell.
The same point shows up explicitly in the algorithm everyone cites when they cite “stigmergy in computing.” Marco Dorigo, Vittorio Maniezzo, and Alberto Colorni published The Ant System: Optimization by a colony of cooperating agents in 1996 (IEEE Transactions on Systems, Man, and Cybernetics — Part B 26(1):29–41). The artificial ants in Ant Colony Optimization are not pheromone-followers. Each ant maintains a tabu list — a memory of every city it has already visited. Each ant applies a constructive heuristic, typically a distance-weighted desirability term η. Each ant’s next-step decision is a probabilistic rule combining the trace and the heuristic:
P(move i → j) = τ_ij^α · η_ij^β / Σ_k τ_ik^α · η_ik^β, where k ranges over the cities not yet on the ant’s tabu list
The pheromone τ matters. The individual heuristic η matters at least as much. Dorigo and Stützle’s 2004 monograph (Ant Colony Optimization, MIT Press) is explicit about what happens when β goes to zero — when the algorithm strips out the individual heuristic and lets the trace alone steer the search. The result degrades catastrophically. The algorithm finds a degenerate local optimum on top of stale evidence and stays there. ACO is the algorithmic mirror of Khushiyant’s experimental result: trace plus cognition outperforms cognition alone, but trace minus cognition collapses.
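The transition rule can be sketched in a few lines of Python. This is a minimal, hypothetical implementation, not Dorigo’s code: the α/β defaults and the matrices in the usage example are illustrative, and the uniform fallback for a zero-weight row is a design choice of this sketch.

```python
import random

def aco_next_city(current, visited, tau, eta, alpha=1.0, beta=2.0, rng=random):
    """Ant System transition rule: pick the next city with probability
    proportional to tau[current][j]**alpha * eta[current][j]**beta,
    restricted to cities not yet on this ant's tabu list (`visited`)."""
    candidates = [j for j in range(len(tau)) if j not in visited]
    weights = [tau[current][j] ** alpha * eta[current][j] ** beta
               for j in candidates]
    total = sum(weights)
    if total == 0:
        # No usable signal from trace or heuristic: fall back to uniform choice.
        return rng.choice(candidates)
    # Roulette-wheel selection over the combined trace-plus-heuristic weights.
    r = rng.random() * total
    acc = 0.0
    for j, w in zip(candidates, weights):
        acc += w
        if r <= acc:
            return j
    return candidates[-1]
```

Setting `beta=0.0` reproduces the degenerate case the monograph warns about: the individual heuristic drops out and the trace alone steers the walk.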
A trail through a forest is not coordination. A trail through a forest plus hikers who can read trails — who carry maps, who know what a blaze means, who notice when a cairn looks recently disturbed — is coordination. The trail is the medium. The cognition is the protocol.
What the Practitioners Are Seeing
The Khushiyant paper isn’t alone. The empirical literature on multi-agent coordination is, late in the 2025–2026 cycle, converging on the same shape from several directions.
Bo Liu, Linghao Kong, and Jian Pei proved a phase-transition theorem this year (Phase Transition for Budgeted Multi-Agent Synergy, arXiv:2601.17311, 2026): under a fixed total budget, multi-agent systems outperform a single agent of the same cost only when the organization exponent s exceeds the individual scaling exponent β. They identify three regimes — helping (signal amplifies), saturating (returns diminish), collapsing (additional agents make things worse) — controlled by communication fidelity, error correlation between agents, and the fan-in of the coordination structure. The math says what the experiments say: there is a threshold below which the shared structure costs more than it pays.
Kim et al. (Towards a Science of Scaling Agent Systems, arXiv:2512.08296, 2025) ran multi-agent variants against a single-agent baseline on PlanCraft, a sequential-planning benchmark. Every multi-agent variant degraded performance, with deltas of −39% to −70% as reported in secondary coverage of the paper. Per a Towards Data Science analysis of the work, multi-agent coordination yields its highest returns only when the single-agent baseline is below 45% — a threshold the secondary source emphasizes but that, as of early 2026, has not been independently re-derived; treat the specific 45% number as directional, not load-bearing.
Paul Welty wrote the most useful framing of the practitioner experience in April 2026 in a piece called Context as facticity: stigmergic and ontological perspectives on AI agent coordination (paulwelty.com, 11 April 2026). His line is the one I keep coming back to: “An event bus would have propagated the commits. Only the breakroom propagated the insight.” Welty’s argument is that quantitative stigmergy — priority ordering, queue depth, trace concentration — works without a shared world model, but qualitative stigmergy, the kind that lets agents recognize structural patterns across one another’s work, requires what he calls “shared interpretive ground.” Two agents can usefully exchange a counter; they cannot usefully exchange an insight unless they already inhabit the same world.
The failure modes Welty lists — features built but never wired up, validators rejecting their own spec’s defaults, security scanners misclassifying their own monitoring — are exactly what you get when you bolt a shared scratchpad onto agents who lack the individual context to interpret the entries other agents leave behind. The shared file is full. The shared understanding is empty.
The Phase-Transition Frame
The design parameter that nearly every stigmergy proposal leaves unspecified is the one Khushiyant put on the table: the threshold at which each agent becomes capable of interpreting the trace.
Below that threshold, the trace is litter. It costs storage, it costs contention, it produces noise at best and active misdirection at worst. Khushiyant reports the failure mechanics in detail: agents in the traces-only condition systematically followed stale food signals to depleted sources, over-weighted outdated danger warnings, and converged to navigation patterns worse than random exploration. The trace becomes the bug.
Above that threshold, the trace pays — and the marginal payoff is substantial. The combined memory + traces condition outperformed memory-only by 36–41% on composite metrics in the high-density regime (ρ > ~0.20 on realistic 30×30 and 50×50 grids; the predicted critical density ρc = 0.230 was confirmed within 13% experimental error). But — and this is the part that gets dropped in casual citations — the high-density regime is the one where adding traces helps agents that already have memory. The traces-only condition fails at all densities tested, including the densest. There is no density at which a sufficiently dense litter pile becomes a coordination protocol.
Two parameters, then, govern the design space:
- Individual cognitive sufficiency — does each agent have the internal state, context, and interpretive machinery required to act on what it reads in the shared medium?
- Interaction density — are there enough agents writing to the medium frequently enough that the signal redundancy beats the noise?
Both must hold. Without (1), (2) does not save you. Without (2), (1) saves you (memory-only still beats baseline by +68.7%) but doesn’t capitalize on the trace investment.
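The two conditions collapse into a toy decision gate. Everything here is this article’s shorthand, not the paper’s terminology; only the default critical density ρc = 0.230 comes from Khushiyant’s prediction.

```python
def stigmergy_regime(agents_have_memory: bool, density: float,
                     rho_c: float = 0.230) -> str:
    """Classify a design into the regimes discussed above.
    rho_c defaults to the predicted critical density from the paper;
    the labels ('litter', 'memory-only', 'memory+traces') are shorthand."""
    if not agents_have_memory:
        # Without interpretive capacity, the trace is noise at any density.
        return "litter"
    if density < rho_c:
        # Memory alone pays; traces add little below the density threshold.
        return "memory-only"
    # Both conditions hold: the trace investment can compound.
    return "memory+traces"
```

Note the asymmetry the gate encodes: no density rescues missing memory, but memory without density still pays.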
Where This Argument Is Weakest
The most honest objection is the foraging-task limitation. Khushiyant’s grids are not language-model agents doing knowledge work, and the scoring metric is not an end-to-end task quality measure on something a developer would actually ship. Generalizing from “agents searching a grid for food” to “agents writing code together via a shared scratchpad” is, on its face, a stretch. The Kim et al. PlanCraft result helps — sequential planning is closer to knowledge work than grid foraging is — but the gap is real. A skeptic should treat the zero-benefit result as the strongest available evidence on the specific question of trace-only stigmergy and as suggestive, not dispositive, on the question of whether your specific multi-agent setup is in the helping regime or the collapsing one.
The second honest objection is that “traces alone are useless” is too strong if you read it as a universal claim. There are real-world stigmergic systems where the trace looks like the whole story. Hayek’s price system aggregates dispersed knowledge into a scalar that participants act on without needing to know why copper got more expensive. Wikipedia’s edit graph coordinates millions of contributors who share no plan. But the price system requires economic agents who can read prices, run a budget, and integrate the signal into a decision; the wiki requires editors who can read prose, recognize topical fit, and write English. The cognitive prerequisite is so universal in human systems that we forget to specify it. The stigmergic medium is doing real coordination work — but on top of cognitive substrates dense enough to interpret it. Andrea Ricci and Andrea Omicini made the same point architecturally in 2007 (“Cognitive Stigmergy: Towards a Framework Based on Agents and Artifacts,” Springer): traditional stigmergy assumes simple reactive agents; multi-agent systems with cognitive agents need artifacts the agents “perceive, share and rationally use for their individual goals.” The trace must be designed for the reader.
The third objection is the one practitioners actually hit: “but our agents do have memory.” Sometimes that’s true. Often it’s worth checking. An agent that re-reads its system prompt every turn but never updates an internal trace of what it has tried; an agent that has tool access to a memory store but no policy for when to write to it; an agent operating inside a context window that resets between calls — these are all “agents with memory” in the marketing sense and traces-only-condition agents in the experimental sense.
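What “maintains state” means operationally can be made concrete with a minimal sketch. All names here (`AgentMemory`, `act`) are hypothetical; the point is the explicit write policy, which is what separates having tool access to a memory store from actually maintaining state.

```python
from dataclasses import dataclass, field

@dataclass
class AgentMemory:
    """Per-agent record of what was tried and how it went."""
    attempts: list = field(default_factory=list)

    def record(self, action: str, outcome: str) -> None:
        # Write policy: every attempt is logged, success or failure,
        # so the agent's next decision can condition on its own history.
        self.attempts.append((action, outcome))

    def already_tried(self, action: str) -> bool:
        return any(a == action for a, _ in self.attempts)

def act(memory: AgentMemory, candidate_actions):
    """Skip actions this agent already tried: the per-agent 'tabu list'."""
    for action in candidate_actions:
        if not memory.already_tried(action):
            return action
    return None  # nothing new left to try
```

An agent without this layer re-derives the same dead ends every turn, which is the traces-only condition in everything but branding.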
The Practical Part
The structural fix Khushiyant’s paper points at is uncomfortable mostly because it inverts the usual order of operations. The default move when coordination feels rough is to add a coordination artifact: a shared file, a blackboard, a queue, an event log. The result-supported move is to verify cognitive sufficiency first and add the artifact second.
A short checklist that holds up across the literature:
- Does each agent maintain state across interactions? Memory is a precondition for the trace, not a substitute for it. Stateless agents reading a shared file are running the experiment’s losing condition.
- Can each agent independently construct a working solution without the trace? This is the Dorigo η — the individual heuristic component. If the trace is doing all the work, the system has not crossed the cognitive floor and is in the regime where additional agents make things worse.
- Does each agent have a policy for weighing the trace against local information? The ACO probabilistic rule weights pheromone and heuristic by exponents α and β; both are tunable. The equivalent in your system is whatever decides when an agent overrides the shared state in favor of fresh local evidence.
- Does the trace decay? Pheromone evaporates — Dorigo and Stützle (2004) treat it as a load-bearing design feature, not a side effect. Most digital “stigmergy” implementations — shared files, message queues, blackboards — have no decay mechanism. Stale entries accumulate forever, which means the litter problem in production is strictly worse than in the experiment, where the run ended.
- Are you above the density threshold where trace redundancy beats noise? If three agents touch the shared file once a session, the trace is being written without being read. Most real systems are well below the density Khushiyant tested.
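The decay item on the checklist is the easiest to retrofit. Below is a minimal sketch of a shared trace store with pheromone-style evaporation; `DecayingTraceStore`, the half-life, and the read floor are illustrative choices for this article, not an API from any cited work.

```python
import time

class DecayingTraceStore:
    """Shared trace entries whose weight halves every half_life_s seconds.
    Entries that decay below `floor` read as absent, so stale signals
    evaporate instead of accumulating forever."""
    def __init__(self, half_life_s=3600.0, floor=0.05):
        self.half_life_s = half_life_s
        self.floor = floor
        self._entries = {}  # key -> (weight, written_at)

    def write(self, key, weight=1.0, now=None):
        now = time.time() if now is None else now
        self._entries[key] = (weight, now)

    def read(self, key, now=None):
        now = time.time() if now is None else now
        if key not in self._entries:
            return None
        weight, written_at = self._entries[key]
        # Exponential evaporation, analogous to ACO pheromone decay.
        decayed = weight * 0.5 ** ((now - written_at) / self.half_life_s)
        return decayed if decayed >= self.floor else None
```

A reader that weights trace entries by their decayed value, rather than treating every line in the shared file as current, gets the stale-signal correction for free.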
A 2026 industry retrospective by Michael Lanham (Multi-Agent in Production in 2026: What Actually Survived, Medium, 2026) described the failure mode as “redundant rearrangement of the same information.” That is what zero-benefit traces look like in practice: a lot of coordination machinery, a lot of file-system churn, no improvement in the metric you actually care about. The system is busy. The system is not better.
The trace alone is litter. The trace plus an agent capable of reading it is coordination. The trick of the field — the part that makes the casual ant-colony explanation seductive — is that the cognition is invisible until you remove it, at which point the whole structure decays into expensive noise. Verify the reader before you write the trace. Then, and only then, add the file.
Sources: Khushiyant, Emergent Collective Memory in Decentralized Multi-Agent AI Systems, arXiv:2512.10166, December 2025 (data + code at github.com/Khushiyant/tracemind); Liu, Kong, & Pei, Phase Transition for Budgeted Multi-Agent Synergy, arXiv:2601.17311, 2026; Kim et al., Towards a Science of Scaling Agent Systems, arXiv:2512.08296, 2025; Welty, Context as facticity, paulwelty.com, 11 April 2026; Grassé, Insectes Sociaux 6:41–80, 1959; Bonabeau, Dorigo, & Theraulaz, Swarm Intelligence, Oxford 1999; King et al., PNAS 116(9):3379–3384, 2019; Theraulaz & Bonabeau, Artificial Life 5(2):97–116, 1999; Dorigo, Maniezzo, & Colorni, IEEE Trans. SMC-B 26(1):29–41, 1996; Dorigo & Stützle, Ant Colony Optimization, MIT Press, 2004; Ricci & Omicini, “Cognitive Stigmergy: Towards a Framework Based on Agents and Artifacts,” Springer, 2007; Lanham, Multi-Agent in Production in 2026, Medium, 2026.