
The Two-Process Model of Agent Workload Compaction

Caffeine masks the signal, not the underlying state. Aggressive summarization does the same thing to agents.

Published May 2026 · 10 min read

Your brain has two independent systems that decide when you sleep. The first is Process S, a homeostatic pressure: a chemical called adenosine accumulates in your extracellular brain tissue from the moment you wake up, binds to A1 and A2A receptors, and produces the subjective experience of mounting sleepiness. The longer you have been awake, the higher the adenosine, the harder the sleep pressure pushes. The second is Process C, a circadian oscillator: the suprachiasmatic nucleus in your hypothalamus runs on its own roughly 24-hour clock, independent of how long you have been awake, and modulates how easily Process S can win. C is why you can be exhausted at 4 PM and still struggle to sleep, and why you can be relatively rested at 4 AM and feel sudden, irresistible drowsiness — the circadian phase modulates the threshold at which the homeostatic pressure tips you over.

Alexander Borbély published this two-process model in 1982 in the journal Human Neurobiology. The paper has been refined many times since but never displaced. After forty-three years of neuroscience scrutiny — across species, sleep-deprivation protocols, EEG validation against slow-wave activity, PET imaging of receptor occupancy — the basic decomposition has held. Pressure and time are independent regulatory axes. Neither can do the job alone. You need both, and you need a coupling layer between them.

This essay is about why almost every agent context-management system shipping today implements only one of these axes, and why the failure modes track the missing process with embarrassing precision.


What caffeine actually does

Before getting to the agent side, it is worth pausing on what may be the most counterintuitive finding in the sleep-pressure literature, because it is the finding that the essay needs in order to make the agent-design argument rigorous.

Caffeine is a nonselective adenosine receptor antagonist. It binds to A1 and A2A receptors without activating them, which prevents adenosine itself from binding. The subjective effect is the suppression of sleep pressure — you stop feeling tired. The chemical effect is that adenosine continues to accumulate in your extracellular tissue at exactly the rate it was accumulating before, but the signal that should make you go to sleep is blocked at the receptor. Caffeine does not give you energy. Caffeine hides the bill.

A 2024 study in Scientific Reports sharpened this point. Researchers used PET-MRI imaging to track grey matter changes during a five-day sleep restriction protocol, with one group taking caffeine through the restriction and one not. The no-caffeine group showed adaptive grey-matter upregulation — the brain mounting its own compensatory response to chronic sleep loss. The caffeine group showed suppression of this adaptation, in a way the researchers traced to A1 receptor blockade.

The finding is operationally devastating. Chronic caffeine during sleep restriction not only fails to resolve sleep debt; it suppresses the brain's own adaptive response to the debt. The body would partially compensate if you let it. Caffeine prevents the compensation while leaving the underlying need unaddressed. When the caffeine wears off, you get the accumulated debt and the lost adaptation, both at once.

Hold this in mind for the rest of the essay. Caffeine masks the signal, not the underlying state. Caffeine prevents the system from learning to handle the load.


The shape of the current agent-context problem

Long-running LLM agents face a structural problem that has no clean solution in current frontier models. The context window is finite. Tool calls, observations, user turns, intermediate reasoning, retrieved documents: all of it accumulates. At some point the window fills. Well before it fills, attention is spread across so many tokens that existing content becomes less and less salient to the current generation. The agent needs to decide what to drop, what to summarize, what to keep verbatim.

The published approaches fall into two structural categories.

The first is pressure-based compaction: summarize when the context fills. This is what Microsoft Semantic Kernel's compaction documentation describes, what most Medium tutorials propose, and what most agent frameworks implement when they implement anything at all. It is the dominant approach. The compaction is triggered by a utilization threshold (usually 70-85%). When the threshold is crossed, the agent's earlier turns are passed to an LLM that produces a summary, the summary replaces the original turns, and the agent continues with the freed space. This is, structurally, pure Process S. Pressure-responsive. Time-unaware.

The second is scheduled compaction: flush at a cron. This is what some agent platforms do when they spawn fresh agents at fixed intervals or at end-of-session boundaries. The agent's state is checkpointed, a new agent is spawned with a structured task summary, and the new agent continues. This is, structurally, pure Process C. Time-responsive. Pressure-blind.

The failure modes of each approach track the missing axis exactly.

Pressure-only systems drift in low-load periods. When the agent is doing routine work and the context isn't filling, no compaction ever runs. Stale content accumulates — observations from tasks long completed, intermediate scratch work that is no longer relevant, retrieved documents from earlier queries. The context window does not fill, so the pressure threshold is never crossed, so the staleness is never purged. Eventually a real task arrives that needs context space and discovers it filled with detritus from days of low-pressure operation. The “lost in the middle” effect compounds — even content the agent does need is buried under content it does not.

Clock-only systems thrash in high-load periods. The scheduled flush arrives at 03:00 UTC regardless of whether the agent is currently in the middle of a complex multi-step task. The handoff summary, generated under time pressure, loses the nuance the previous agent had accumulated. The next agent starts with a degraded representation of the in-flight work. Tasks that span agent boundaries — and many real agent tasks do — suffer the discontinuity. The flush was triggered by the clock, not by any signal that the work was at a natural breakpoint.

Both failure modes are predictable from the two-process model. Pressure without time leaves staleness uncleared. Time without pressure interrupts the work the system was engaged in. Neither axis alone is sufficient. Borbély had this figured out in 1982.


The two-process design

The agent equivalent of the Borbély decomposition is more concrete than it first appears.

Process S — pressure-based regulation — monitors context utilization continuously. When utilization crosses a threshold (default around 80%), it begins compaction. Compaction intensity scales with how far past the threshold the system has gone — light at 80% (remove redundant observations, consolidate tool-call outputs), aggressive at 95% (summarize whole segments).
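A minimal sketch of the pressure side, under stated assumptions. The function names, the linear ramp between 80% and 95%, and the three action labels are illustrative choices, not taken from any existing framework; the only things the text fixes are "light near the threshold, aggressive near the ceiling."

```python
def compaction_intensity(utilization: float,
                         threshold: float = 0.80,
                         ceiling: float = 0.95) -> float:
    """Return 0.0 at or below the threshold, rising linearly to 1.0 at the ceiling."""
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be a fraction between 0 and 1")
    if utilization <= threshold:
        return 0.0
    return min(1.0, (utilization - threshold) / (ceiling - threshold))


def choose_action(utilization: float,
                  threshold: float = 0.80,
                  ceiling: float = 0.95) -> str:
    """Map utilization onto a light-to-aggressive ladder (labels are hypothetical)."""
    if utilization < threshold:
        return "none"  # Process S stays silent below the threshold
    intensity = compaction_intensity(utilization, threshold, ceiling)
    if intensity < 0.5:
        return "dedupe_observations"       # light: drop redundant observations
    if intensity < 1.0:
        return "consolidate_tool_outputs"  # medium: merge tool-call outputs
    return "summarize_segments"            # aggressive: summarize whole segments


if __name__ == "__main__":
    for u in (0.50, 0.80, 0.90, 0.95):
        print(u, choose_action(u))
```

Note that nothing in this regulator knows what time it is. That is the point: it responds to load and to nothing else.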

Process C — time-based regulation — runs on a separate schedule regardless of context utilization. Every N turns (or M minutes of wall time), it performs a staleness audit: which content has not been accessed for longer than the staleness threshold? Which retrieved documents have been superseded? Which intermediate reasoning is no longer load-bearing? Critically, Process C runs even when context is below the pressure threshold. In the low-load regimes that confound pressure-only systems, Process C is the regulator that keeps the context from silting up.
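The staleness audit can be sketched in a few lines. Everything here is an assumption made for illustration: the `ContextItem` fields, the turn-based clock, and the `pinned` flag (a stand-in for content that should survive any audit). A real system would need its own definition of "accessed" and "superseded."

```python
from dataclasses import dataclass


@dataclass
class ContextItem:
    item_id: str
    last_accessed_turn: int
    superseded: bool = False  # e.g. a retrieved document replaced by a newer one
    pinned: bool = False      # hypothetical flag: never pruned by the audit


def audit_due(current_turn: int, last_audit_turn: int, every_n_turns: int) -> bool:
    """The clock: depends only on elapsed turns, never on context utilization."""
    return current_turn - last_audit_turn >= every_n_turns


def staleness_audit(items, current_turn: int, staleness_threshold: int):
    """Split items into (keep, prune) based on idle time and supersession."""
    keep, prune = [], []
    for item in items:
        idle_turns = current_turn - item.last_accessed_turn
        stale = item.superseded or idle_turns > staleness_threshold
        if stale and not item.pinned:
            prune.append(item)
        else:
            keep.append(item)
    return keep, prune


if __name__ == "__main__":
    items = [
        ContextItem("scratch-1", last_accessed_turn=3),
        ContextItem("doc-old", last_accessed_turn=48, superseded=True),
        ContextItem("task-spec", last_accessed_turn=2, pinned=True),
        ContextItem("obs-recent", last_accessed_turn=49),
    ]
    keep, prune = staleness_audit(items, current_turn=50, staleness_threshold=30)
    print([i.item_id for i in prune])
```

The audit takes no utilization argument at all, which is what lets it run in exactly the low-load regimes where a pressure trigger never fires.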

The coupling layer is where the recent biology becomes directly relevant. The original Borbély model treated C as modulating S thresholds: peak circadian alerting raises the pressure tipping point; circadian trough lowers it. A 2025 paper in npj Biological Timing and Sleep proposed that the coupling is actually bidirectional — under extreme sleep deprivation, Process S appears to modulate Process C dynamics in turn.

In the agent design, the bidirectional coupling looks like this. Current task complexity modulates the Process S pressure threshold. During complex multi-step tasks — many tool calls, deep reasoning chains, references back to earlier context — the threshold rises. During simple single-turn queries, the threshold drops. And in the other direction: heavy sustained pressure on Process S — context filling repeatedly faster than compaction clears it — can advance the Process C schedule, triggering a full audit ahead of its normal cadence. Neither regulator runs in pure isolation.
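One way the two coupling directions might look in code. This is a sketch, not a tuned design: the complexity score in [0, 1], the shift size, the clamping bounds, and the "halve the interval after three breaches" rule are all placeholder assumptions.

```python
def coupled_threshold(base_threshold: float,
                      task_complexity: float,
                      max_shift: float = 0.10) -> float:
    """Task complexity adjusts the Process S threshold.

    task_complexity is a hypothetical score in [0, 1]; 0.5 is neutral.
    Complex work raises the threshold, simple work lowers it.
    """
    complexity = min(1.0, max(0.0, task_complexity))
    shifted = base_threshold + (complexity - 0.5) * 2 * max_shift
    return min(0.95, max(0.50, shifted))  # keep the threshold in a sane band


def coupled_audit_interval(base_interval: int,
                           recent_breaches: int,
                           breach_limit: int = 3,
                           min_interval: int = 5) -> int:
    """Sustained pressure on Process S advances the Process C schedule."""
    if recent_breaches >= breach_limit:
        return max(min_interval, base_interval // 2)
    return base_interval


if __name__ == "__main__":
    print(coupled_threshold(0.80, task_complexity=0.9))
    print(coupled_audit_interval(20, recent_breaches=4))
```

Each function reads one regulator's state and writes only the other regulator's parameter. The triggers themselves stay independent.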

This is genuinely novel as an agent-design proposal. The closest existing approach — JetBrains Research's “Focus” architecture, published in December 2025 — uses a persistent “Knowledge” block alongside autonomous pruning, which is structurally closest to a two-process design. But Focus does not formalize the independence of the two processes or the coupling between them. The Borbély decomposition makes the design principle explicit: two regulators acting on independent cues, with a small coupling layer that lets each adjust the other's thresholds.


Why aggressive summarization is caffeine

Now the warning that the biology makes precise.

Summarization, in the pure pressure-driven sense, reduces token count. It compresses prior turns into a shorter representation. The token count drops. The pressure on Process S decreases. The behavior is operationally identical to caffeine: a signal that the system is overloaded is suppressed by an intervention that compresses the signal rather than addressing the underlying state. The information that was in the discarded detail is gone. The system continues operating as though it were still available.

The 2024 PET-MRI finding lands hard here. Chronic caffeine during sleep restriction does not just fail to resolve sleep debt; it suppresses the brain's adaptive response to the debt. The analogous claim: chronic aggressive summarization does not just fail to resolve information debt; it suppresses the agent's opportunity to develop adaptive retrieval strategies under load. A system that learns to retrieve selectively from full context, develops attention strategies for navigating long histories, builds its own compensatory mechanisms — that system performs differently from one that never encounters full context because summarization always intervenes first.

This presents, eventually, as retrieval-incoherence. The agent cannot find information it once had, because the information was summarized away rounds ago. The user asks a question whose answer was buried in a compressed turn. The agent retrieves the summary, which does not contain the specific detail. The agent confabulates or apologizes. The failure mode is silent until it presents. Most teams running long-running agents have experienced it. Most teams blame the model. The model is doing its job; the summarization layer ate the data.

The agent equivalent of “chronic caffeine plus sleep restriction” is chronic summarization plus continuous operation without ever doing a full context reset. The system never sleeps. The information debt accumulates in the gap between what the summary preserves and what the original held. Eventually the gap presents as user-visible failure, but by then the agent has been operating in debt for a long time.


What the monitoring layer should do

The most important addition to the two-process design is the monitoring layer that detects “caffeine debt” before it presents as failure. The biological equivalent is the brain's ability to detect that it is operating under load — increased adenosine binding, increased slow-wave activity during eventual sleep, increased need for recovery time. The agent equivalent is retrieval-quality testing after compaction.

After each compaction event, sample a small number of test queries against the pre-compaction state and re-run them against the post-compaction context. If the answers degrade — if information that was retrievable before compaction is no longer retrievable after — the compaction has eaten more than it should have. If the degradation accumulates over successive compactions (each compaction is locally fine but the trend is downward), the system is accumulating information debt that no single round of compaction would surface.
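The check described above can be sketched as follows. The assumptions are loud ones: `answer_fn` stands in for "ask the agent this question against a given context," substring matching is a crude proxy for answer quality, and the two drop limits are arbitrary. The shape is what matters: score before, score after, and track the trend against the first baseline.

```python
from typing import Callable, Optional, Sequence, Tuple


def probe_score(answer_fn: Callable[[str], str],
                probes: Sequence[Tuple[str, str]]) -> float:
    """Fraction of (question, expected_fact) probes whose fact appears in the answer."""
    if not probes:
        raise ValueError("need at least one probe")
    hits = sum(1 for question, fact in probes
               if fact.lower() in answer_fn(question).lower())
    return hits / len(probes)


class DebtMonitor:
    """Flags single bad compactions and slow cumulative drift."""

    def __init__(self, single_drop_limit: float = 0.2,
                 cumulative_drop_limit: float = 0.3) -> None:
        self.single_drop_limit = single_drop_limit
        self.cumulative_drop_limit = cumulative_drop_limit
        self.baseline: Optional[float] = None  # score before the first compaction
        self.history: list = []

    def record(self, pre_score: float, post_score: float) -> str:
        if self.baseline is None:
            self.baseline = pre_score
        self.history.append(post_score)
        if pre_score - post_score > self.single_drop_limit:
            return "compaction_too_aggressive"
        if self.baseline - post_score > self.cumulative_drop_limit:
            return "debt_accumulating"  # each step looked fine; the trend did not
        return "ok"


if __name__ == "__main__":
    monitor = DebtMonitor()
    for pre, post in [(1.0, 0.9), (0.9, 0.8), (0.8, 0.65)]:
        print(monitor.record(pre, post))
```

The second branch is the one that catches the case the paragraph warns about: three compactions that each lose a little, none of which would trip a per-event alarm.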

The response, when the monitoring layer detects debt accumulation, should be a full context restart, not another round of summarization. The agent equivalent of sleep, in other words. Spawn a fresh agent with a structured handoff that carries only the persistent knowledge and the current task state. Burn the accumulated summary history. Start clean. This is the part of the biological analogy that resists comfortable implementation: restarts are operationally expensive in production, the way sleep is operationally expensive in evolution. But the analogy is clear about which kind of expense is paying for what. The restart pays for cleared debt. Another round of summarization pays for another hour of operation at the cost of debt that will eventually present anyway, with the agent's own compensation, like the brain's under caffeine, also blocked.
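As a rough illustration of what "carry only the persistent knowledge and the current task state" could mean structurally, here is a sketch with a hypothetical `AgentState` container. The field names are assumptions; the one deliberate design choice is that the summary history has no path into the new agent.

```python
from dataclasses import dataclass, field


@dataclass
class AgentState:
    persistent_knowledge: dict  # durable facts worth carrying across restarts
    task_state: dict            # where the in-flight work currently stands
    summary_history: list = field(default_factory=list)  # summaries of summaries


def build_handoff(state: AgentState) -> dict:
    """Copy only the two things the restart is allowed to carry."""
    return {
        "persistent_knowledge": dict(state.persistent_knowledge),
        "task_state": dict(state.task_state),
    }


def restart(state: AgentState) -> AgentState:
    """Spawn a fresh state from the handoff; the summary history is dropped."""
    handoff = build_handoff(state)
    return AgentState(
        persistent_knowledge=handoff["persistent_knowledge"],
        task_state=handoff["task_state"],
    )


if __name__ == "__main__":
    tired = AgentState({"repo": "payments"}, {"step": "3 of 5"},
                       ["summary of a summary"])
    fresh = restart(tired)
    print(fresh.summary_history)
```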


What to do with this on Monday

Three concrete moves follow.

The first is to check which axis your agent's context management actually implements. Look at the trigger condition. If it is “context exceeds N%,” you have a Process S system. If it is “every N turns” or “every N minutes,” you have a Process C system. If it is some implicit combination — sessions that end at natural breakpoints plus compaction when individual sessions fill — you have an implicit two-process system that has not been designed as one. Making the implicit design explicit gives you the ability to tune the two regulators independently, which the implicit version does not.

The second is to add the missing axis explicitly. If you have pressure-only, add a clock-driven staleness audit that runs at intervals regardless of utilization. The audit can be cheap: it does not need to do full summarization; it needs to identify and prune content that has gone unaccessed for long enough that it is unlikely to still be load-bearing. If you have clock-only, add a pressure check that triggers compaction when utilization spikes between scheduled flushes. The two regulators should be independent in their triggers and coupled only through their threshold parameters. The coupling layer can start simple: a single multiplier on the Process S threshold based on a heuristic for current task complexity.
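For a sense of how small the Monday version can be, here is a sketch of a single per-turn check with both triggers and the one-multiplier coupling. The class name, defaults, and action strings are hypothetical; how you compute `complexity_multiplier` is left entirely to your own heuristic.

```python
from dataclasses import dataclass


@dataclass
class TwoProcessRegulator:
    base_threshold: float = 0.80  # Process S trigger, as a utilization fraction
    audit_every: int = 20         # Process C cadence, in turns
    last_audit_turn: int = 0

    def tick(self, turn: int, utilization: float,
             complexity_multiplier: float = 1.0) -> list:
        """Evaluate both triggers independently and return the actions due."""
        actions = []

        # Process S: pressure only. The multiplier is the whole coupling layer.
        threshold = min(0.95, self.base_threshold * complexity_multiplier)
        if utilization >= threshold:
            actions.append("compact")

        # Process C: clock only. Utilization is deliberately not consulted.
        if turn - self.last_audit_turn >= self.audit_every:
            actions.append("staleness_audit")
            self.last_audit_turn = turn

        return actions


if __name__ == "__main__":
    regulator = TwoProcessRegulator()
    print(regulator.tick(turn=20, utilization=0.50))
    print(regulator.tick(turn=21, utilization=0.85))
```

A quiet agent still gets audited, and a busy agent still gets compacted between audits. Neither branch reads the other's input.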

The third is to implement the retrieval-quality monitoring, even if you cannot implement the full restart response yet. The monitoring is what tells you whether your compaction layer is actually working or is silently degrading the agent's effective memory. Without monitoring, the system is running on the caffeine equivalent of “I feel fine” — a subjective absence of failure that says nothing about the underlying state. With monitoring, you can detect debt accumulation in time to do something about it. Even if the “something” is just alerting the operator rather than automatically triggering a restart, the detection itself is a step the field is currently not taking.

The Borbély decomposition has survived forty-three years of neuroscience research because pressure and time are genuinely independent regulatory axes, and any system that has to decide when to compact, flush, or reset benefits from treating them as such. The brain does it because it must. Agent context systems, as a class, currently do not. The retrieval-incoherence failures most teams have experienced are the predictable consequence. The fix is not to summarize more aggressively. It is to summarize differently — and to know, with measurement, when the summarization has stopped working and the system needs to sleep.

The pre-compaction baseline only exists if you wrote it down.

The essay's retrieval-quality monitoring requires a pre-compaction baseline to compare post-compaction retrieval against. Most agents don't have one — the original context is exactly the thing the compaction layer just ate. Chain of Consciousness anchors every agent action to a verifiable external record that does not get summarized away. The chain is the baseline. The chain is what the agent can re-query when the in-context summary has lost a detail. The chain does not feel sleep pressure; it is the part of the system that remembers what was there before the caffeine.

pip install chain-of-consciousness · npm install chain-of-consciousness