We Measured the Half-Life of a System Prompt Rule

Prohibitions are nearly immune to forgetting. Terminal imperatives drop up to 50%. The intuitive prediction is inverted.

Published May 2026 · 10 min read

Two rules. You are designing the system prompt for an agent that will run a hundred consecutive tasks without the prompt being re-injected. The first rule says always cite your sources. The second rule says never include personal opinions. Both are in the system prompt at the start. Both are clear. Both are testable.

By task 50, which rule has the agent forgotten?

Most engineers I ask pick the second: prohibitions are active suppression, the model has to fight its default helpful instinct, that's hard, so surely “never” fades before “always.” The intuition is sharp and consistent. It is also exactly backwards.

In March 2026, four authors at the intersection of cognitive science and NLP published Did You Forget What I Asked? Prospective Memory Failures in Large Language Models (arXiv:2603.23530). They ran 8,000 evaluated prompts across DeepSeek-V3.1, OpenAI's o4-mini, and Llama-3.3-70B, combining IFEval formatting constraints with four benchmark tasks at increasing cognitive load — TriviaQA, MMLU, GSM8K, and CNN/DailyMail summarization. The compliance checks were deterministic, code-based, no LLM-as-judge. And the result, on the variable this essay cares about, was clean enough to bet on.

Avoidance constraints — the prohibitions, the never rules — held at or near 100% compliance under load. They were, in the paper's word, “nearly immune to forgetting.” Terminal constraints — the imperatives that required specific action at the end of the response — dropped by up to 50%. The intuitive prediction wasn't slightly off. It was inverted.

This essay is about why, what it means for anyone shipping an agent with a system prompt, and how the half-life metaphor that makes the title catchy is partially true and partially misleading in ways worth being precise about.

What the data shows

The prospective memory paper measured average compliance drops under task load across three frontier models. DeepSeek-V3.1 lost an average of 10.8% compliance. o4-mini lost 6.3%. Llama-3.3-70B was the most robust at 3.2%. But the single most striking number in the paper is from o4-mini on GSM8K math problems: with no formatting constraints, the model scored 93% accuracy. With formatting constraints active, it scored 27%. That is a 66-percentage-point absolute drop in math performance (about 71% in relative terms), and not because the math got harder: the model could either solve the problem or follow the formatting rules, not both.

This is the result that reframes the whole question. Compliance is competing with task performance for the same cognitive resources. It isn't a separate axis. Every rule you add to the system prompt is a tax on the model's capacity to do the work you asked it to do.

A second large-scale study, One Battle After Another (arXiv:2511.03508, November 2025), measured rule compliance across multi-turn conversations and found an average 39% performance drop on rules that were initially followed but became — and this phrase deserves its own paragraph — “ceremonial in form but lose their substance” as conversation accumulated.

Ceremonial compliance is the failure mode worth naming. A model told to “always cite sources” doesn't stop producing citations after the rule starts decaying. It produces citation-like text — bracketed bylines that look right but are fabricated, references to papers that don't exist, dates attached to studies that were never run. A model told to “limit responses to 200 words” doesn't suddenly write 800-word replies. It produces 250-word responses that feel short. A model told to “use formal language” gradually shifts toward semi-formal register while maintaining the surface markers of formality. The rule looks like it's being followed. It isn't.

This is the equivalent of radioactive decay where the isotope doesn't vanish — it transmutes into a different element that superficially resembles the original. It is harder to detect than outright violation and consequently more dangerous, because it lets you believe the system prompt is doing its job when it has, in fact, quietly stopped.
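One way to keep ceremonial compliance from slipping past review is to check rules with deterministic code, in the spirit of the paper's code-based checks, rather than by glancing at the output. The sketch below is illustrative only: the function names, the 200-word limit, and the bracketed citation-key pattern are assumptions made for this example, not the paper's harness. It can confirm that a cited key exists in a known source list; it cannot confirm that the source supports the claim.

```python
import re

def within_word_limit(text: str, limit: int = 200) -> bool:
    """True only if the response respects the hard limit; 'feels short' does not count."""
    return len(text.split()) <= limit

def unknown_citations(text: str, known_sources: set[str]) -> set[str]:
    """Return bracketed citation keys, e.g. [smith2021], that are not in known_sources."""
    cited = set(re.findall(r"\[([A-Za-z0-9_-]+)\]", text))
    return cited - known_sources

# Hypothetical response with one real key and one fabricated key.
response = "Decay is load-dependent [smith2021] and recoverable [lee2099]."
print(within_word_limit(response))                 # True
print(unknown_citations(response, {"smith2021"}))  # {'lee2099'}
```

A 250-word reply fails `within_word_limit` no matter how short it feels, which is the point: the check measures the rule, not the resemblance.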

Why cognitive cost beats semantic content

The mechanism, once you see it, is small. A prohibition is satisfied by not generating something. The model doesn't have to plan an action, doesn't have to allocate generation steps, doesn't have to fight for token positions against the primary task. “Never include personal opinions” is, in computational terms, a check at every generation step that costs almost nothing — and that benefits from the alignment training that already biases the model toward non-opinionated output. The rule and the model's defaults are pointing in the same direction. The rule is cheap.

An imperative that requires positive action at a specific position is the opposite. “Always end your response with a summary paragraph” requires the model to plan its output structure, reserve token budget for the summary, recognize when the response is approaching its end, and execute the summary while still being correct about whatever it just said. That's a multi-step plan running concurrently with whatever cognitive work the actual task requires. Every step is a place where the rule can be dropped.

There is a clean information-theoretic way to express this. Treat the system prompt as a signal being transmitted through a channel (the context window), and each turn of conversation as additive noise. Shannon's channel capacity theorem, in its Shannon-Hartley form, says the maximum reliable bit rate through the channel is C = B × log₂(1 + S/N), where B is the channel bandwidth and S/N is the signal-to-noise ratio. As conversation turns accumulate, noise rises, the signal-to-noise ratio falls, and the channel's capacity for reliable instruction transmission drops. Rules that require more bits to encode hit the threshold first.

A simple prohibition is approximately a 1-bit instruction: do, or don't, this specific thing. A multi-clause imperative with positional requirements (“always cite sources in APA format in a section labeled 'References' at the end of every response that contains more than three claims”) is many bits; call it 30. The 1-bit instruction stays above the threshold for far longer than the 30-bit instruction. The fragility isn't about whether the rule is positive or negative in form. It's about how much representational capacity the rule consumes, and how that capacity competes with the rest of the task.
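The channel argument can be made concrete with a toy calculation. Everything numeric here is an assumption chosen for illustration: unit bandwidth, a fixed signal power of 1000, noise that grows linearly by one unit per turn, and rule costs of 1 and 5 bits. None of these values come from the paper, and the function names are hypothetical.

```python
import math

def capacity_bits(bandwidth: float, signal: float, noise: float) -> float:
    """Shannon-Hartley capacity: C = B * log2(1 + S/N)."""
    return bandwidth * math.log2(1 + signal / noise)

def last_reliable_turn(rule_bits: float, signal: float = 1000.0,
                       noise_per_turn: float = 1.0, bandwidth: float = 1.0,
                       max_turns: int = 10_000) -> int:
    """Last turn at which capacity still covers the rule's bit cost (0 if never)."""
    last = 0
    for turn in range(1, max_turns + 1):
        noise = noise_per_turn * turn  # assumption: noise grows linearly with turns
        if capacity_bits(bandwidth, signal, noise) >= rule_bits:
            last = turn
        else:
            break  # capacity only falls as noise rises, so stop at the first miss
    return last

print(last_reliable_turn(1))  # 1000
print(last_reliable_turn(5))  # 32
```

Under these toy numbers the 1-bit rule survives about thirty times longer than the 5-bit rule. The absolute values mean nothing; the ordering is what the argument predicts.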

This is consistent with what the prospective memory paper actually found. Terminal constraints, the ones requiring action at a specific output position, were the most vulnerable not because they were imperatives in the abstract but because they were positionally specific, planning-heavy, and constantly at risk of being overrun by the primary task's token consumption.

Position matters — but not how you'd think

There is a second, separate decay axis worth pulling apart. Where a rule sits in the prompt matters as much as what kind of rule it is.

The “Lost in the Middle” finding (Liu et al., 2023, widely replicated) measured retrieval accuracy as a function of where the relevant information sat in the context window. The result was a U-curve: information at the beginning and end of context was retrieved reliably. Information in the middle was retrieved at roughly half the rate. The technical mechanism is downstream of the attention architecture itself — Rotary Position Embedding introduces distance-based decay where tokens far apart get reduced attention scores, and middle tokens are in a dead zone, too far from the beginning for primacy and too far from the end for recency.

If the same effect applies to system prompt rules — and there is strong reason to think it does, since the underlying mechanism is the same attention math — then the prediction is sharp. A system prompt with ten numbered rules has primacy on rules 1-3, recency on rules 8-10, and a dead zone in the middle. Rules 4-7 will be forgotten first. The often-repeated prompt-engineering folk wisdom about “put the most important instructions at the beginning or the end” is the empirical signature of this U-curve.
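If the U-curve does carry over to system prompt rules, which the paragraph above treats as an inference rather than a measured result, one simple response is to order rules so the most important sit at the edges and the least important fall in the middle. A minimal sketch, with a hypothetical helper name and placeholder rule labels:

```python
def order_for_u_curve(rules_by_priority: list[str]) -> list[str]:
    """Place the highest-priority rules at the edges and the lowest in the middle.

    Input is sorted most important first. Rules are dealt alternately to the
    front and the back of the prompt, so priority falls toward the centre.
    """
    front: list[str] = []
    back: list[str] = []
    for i, rule in enumerate(rules_by_priority):
        (front if i % 2 == 0 else back).append(rule)
    return front + back[::-1]

# R1 is the most important rule, R5 the least.
print(order_for_u_curve(["R1", "R2", "R3", "R4", "R5"]))
# ['R1', 'R3', 'R5', 'R4', 'R2']
```

The two most important rules take the first and last slots, and the least important rule lands in the centre, where the predicted dead zone does the least damage.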

There is a further wrinkle. The system prompt's absolute position in the context window is fixed — it stays at position zero. But the system prompt's relative position to the current generation point grows linearly with conversation length. By turn 50 of a long conversation, the system prompt may be 50,000 tokens away from where the model is currently generating, and the attention score between current generation and the rule drops with that distance. The rule hasn't moved. The conversation has moved away from it.

It is worth separating two kinds of decay that look the same from outside. Encoding decay is the rule fading out of the model's working representation — the rule is forgotten. Execution decay is the rule still being encoded but failing to fire at the right output position — the rule is remembered but not acted on. Terminal-constraint vulnerabilities are mostly execution decay; lost-in-the-middle effects are mostly encoding decay. Both produce non-compliance. They have different fixes.

Where the half-life metaphor breaks

The title of this essay borrows from radioactive decay because the borrowing is useful. Both phenomena have population-level predictability without per-instance predictability. Both have exponential-decay shape. Both have domain-specific decay constants — uranium-238's 4.5-billion-year half-life and iodine-131's 8-day half-life are the physical equivalent of a prohibition's hundreds-of-tasks durability versus a terminal constraint's handful-of-tasks fragility.

But four properties of radioactive decay don't transfer cleanly, and a thoughtful reader should know which.

Radioactive decay is memoryless — the probability an atom decays in the next second is independent of how long it has existed. Instruction decay is not memoryless. Once a rule has been violated, the model's own non-compliant output becomes part of the context conditioning future generation. Violation begets violation.

Radioactive decay is monotonic. Instruction compliance fluctuates — a December 2025 paper on long-context safety mechanisms (arXiv:2512.02445) found refusal rates at 100K-token context fluctuating unpredictably, with some models increasing refusal rates and others decreasing at the same lengths. A rule may be violated at turn 50, followed at turn 60, violated again at turn 80. Plotting compliance gives a jagged curve, not a smooth exponential.

Radioactive decay is irreversible. Instruction compliance is recoverable, and cheaply. The prospective memory paper's most engineering-useful finding wasn't the decay curves — it was that a simple salience-enhanced format with explicit framing and trailing reminders recovered compliance to 90-100% in most settings. The signal can be retransmitted.

Radioactive decay has a fixed half-life. Instruction half-lives are not fixed — they depend on the cognitive load of concurrent work. The same rule may have a 50-task half-life when the primary task is summarization and a 5-task half-life when the primary task is competitive math.
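For reference, the idealized population-level curve the metaphor implies can be written in a few lines. This is the smooth exponential only; as noted above, measured compliance is jagged, history-dependent, and recoverable, so treat this as a baseline model, not a prediction. The 50-task and 5-task half-lives are the illustrative figures from the text.

```python
def expected_compliance(tasks_elapsed: float, half_life: float) -> float:
    """Idealized compliance fraction: halves every `half_life` tasks."""
    return 0.5 ** (tasks_elapsed / half_life)

# Same rule, same 10 tasks elapsed, different concurrent load.
for label, half_life in [("summarization", 50), ("competitive math", 5)]:
    print(label, round(expected_compliance(10, half_life), 3))
# summarization 0.871
# competitive math 0.25
```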

A more accurate physical analogy is pharmacological. System prompt rules behave like drugs with different pharmacokinetic profiles. Each rule has a half-life. Each rule has a bioavailability that depends on how many other rules are competing for the same attention. The system prompt is a loading dose. Re-injection is a maintenance dose. Salience-enhanced reminders are booster shots. A pharmacologist would never administer a drug once and expect it to work indefinitely. System prompt engineers should not, either.

What to do with this on Monday

Three practical moves follow from the data.

First, triage your rules by cognitive cost, not by intent. When you write a system prompt, separate the rules that are satisfied by inaction (prohibitions, format restrictions, content avoidance) from the rules that require active multi-step plans (terminal formatting, position-specific actions, multi-clause imperatives). The first group is approximately free and approximately permanent. The second group is expensive and needs maintenance. The cost-of-rule heuristic should drive what gets written into the system prompt versus what gets delivered as per-task instructions.
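The triage step can be sketched as a simple partition. The `Rule` fields below are hypothetical labels you would assign by hand when reviewing a prompt; nothing here classifies rules automatically, and the two-way split is a simplification of the cost spectrum described above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    text: str
    needs_action: bool        # must the model actively produce something?
    positional: bool = False  # must it appear at a specific place in the output?

def triage(rules: list[Rule]) -> tuple[list[Rule], list[Rule]]:
    """Split rules into (cheap, expensive) by cognitive cost, not by intent."""
    cheap = [r for r in rules if not (r.needs_action or r.positional)]
    expensive = [r for r in rules if r.needs_action or r.positional]
    return cheap, expensive

rules = [
    Rule("Never include personal opinions.", needs_action=False),
    Rule("Always end with a summary paragraph.", needs_action=True, positional=True),
]
system_prompt_rules, per_task_rules = triage(rules)
print([r.text for r in system_prompt_rules])  # ['Never include personal opinions.']
print([r.text for r in per_task_rules])       # ['Always end with a summary paragraph.']
```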

Second, schedule re-injection to the shortest-lived rule, not to the average. If you have ten rules with half-lives ranging from 200+ tasks down to maybe 5-15 tasks, re-injecting “every 50 tasks” is too often for the prohibitions and far too rare for the terminal constraints. Either re-inject every ~10 tasks with a compact salience-enhanced reminder, or restructure so that the fragile rules don't live in the system prompt at all — embed them in each task as inline reminders. The data on salience-enhanced format recovery (90-100% compliance) shows that the reminder doesn't need to be a full prompt re-read. It can be a one-line restatement of the weakest rule, surfaced at the right moment.
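A rough way to turn this into a schedule is to derive the reminder interval from the shortest half-life and append a one-line restatement at that cadence. The sketch assumes idealized exponential decay, which the earlier sections argue is only approximately true, and the half-life values, the compliance floor, and the reminder wording are all hypothetical. It is not the paper's salience-enhanced format.

```python
import math

def reinjection_interval(half_lives: dict[str, float], floor: float = 0.5) -> int:
    """Tasks between reminders so the shortest-lived rule stays above `floor`.

    Assumes compliance = 0.5 ** (n / half_life), solved for n at `floor`.
    """
    shortest = min(half_lives.values())
    return max(1, math.floor(shortest * math.log2(1 / floor)))

def with_reminder(task: str, task_index: int, interval: int, rule: str) -> str:
    """Append a one-line restatement of the weakest rule every `interval` tasks."""
    if task_index % interval == 0:
        return f"{task}\n\nReminder: {rule}"
    return task

half_lives = {
    "Never include personal opinions.": 200,     # assumed durable
    "Always end with a summary paragraph.": 10,  # assumed fragile
}
weakest = min(half_lives, key=half_lives.get)
interval = reinjection_interval(half_lives)
print(interval)  # 10
print(with_reminder("Summarize the attached report.", 10, interval, weakest))
```

Raising `floor` shortens the interval; the durable rules never enter the calculation, which is the point of scheduling to the minimum and not the average.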

Third, respect the rule budget. The o4-mini 93%-to-27% collapse is the limit case of what happens when too many rules compete with too hard a primary task. Frontier models appear to have a finite, surprisingly small budget for simultaneously enforceable rules — maybe five to eight, depending on the model and the task. Many production system prompts contain twenty to fifty rules. Most of those rules are operating outside the budget, which means most of them are not being enforced — they are being ceremonially complied with. The triage isn't optional. The model is doing it for you, whether or not you specified the priorities.

The deeper observation underneath these three moves is that every information system has a half-life, whether the system is a textbook or a context window. Engineering knowledge halves every five years; psychology halves every nine; hepatology halves every forty-five. System prompt rules halve over tens to hundreds of tasks. The timescales differ by orders of magnitude. The mathematics — exponential decay, domain-specific decay constants, population-level predictability — is the same.

The intuition this essay opened with — that prohibitions ought to decay faster than imperatives because suppression is harder than action — was wrong in the way that most intuitions about LLM behavior are wrong. It described what we would find effortful, not what the model finds expensive. The model isn't a person with willpower. It is a finite-capacity system that satisfies “don't do X” by doing slightly less, and that struggles with “do Y at position Z” because doing Y at position Z requires planning that competes with everything else. The half-life of a rule is the half-life of the cognitive cost of enforcing it, not the half-life of the rule's text.

If your agents are drifting after fifty tasks, the question to ask first is not which rules are decaying but which rules are expensive. The expensive ones go first. The cheap ones outlive most of the conversation. Knowing which is which is the difference between a system prompt that looks like a contract and a system prompt that actually behaves like one.

Ceremonial compliance only shows up in the chain.

The dangerous failure mode the essay names — the model produces citation-like text instead of citations, 250-word responses that feel short instead of 200-word responses, formality-markers instead of formality — is invisible from the surface output. The rule looks followed. It isn't. Chain of Consciousness anchors what the agent actually did at each step to a verifiable external record, so “was rule 7 still being enforced at task 60?” is a query against the chain, not an inference from output that already passed the surface check.

pip install chain-of-consciousness · npm install chain-of-consciousness