The Undecidability Frontier: Problems That Look Verifiable But Aren't

Soundness, generality, tractability — pick two. The third is a theorem from 1936.

Published May 2026 · 12 min read

You have a model. You have a specification — say, the model never produces output that would harm the user. The question on the table is whether the model satisfies the specification on all possible inputs. This sounds like a standard verification problem. You write some checks. You test some inputs. If the checks pass on enough inputs, you sign off and ship.

In 2025, a paper titled “Machines that halt resolve the undecidability of artificial intelligence alignment” appeared in Scientific Reports (Agarwal, 2025; PMC12050267). The result, stated plainly: the question you just signed off on cannot, in general, be answered. Not “we don't have enough compute.” Not “the algorithm is slow.” Cannot be answered. The proof rests on Rice's Theorem, which is itself a corollary of Alan Turing's 1936 proof that the Halting Problem is undecidable. Rice's Theorem (1953) says that any non-trivial semantic property of programs in a Turing-complete language is undecidable, where “semantic” means a property of what the program computes rather than how it is written, and “non-trivial” means some programs have it and others do not. Whether a sufficiently powerful AI system is aligned is such a property. The conclusion follows.

The interval between Turing's proof and its application to the central problem of AI safety is eighty-nine years. During roughly the last decade of that interval, billions of dollars have gone into AI safety research that attempts, in its most general form, the impossible. The work is not wasted (partial verification on finite test sets has real value), but the promise that “we will verify that our models are safe” exceeds what mathematics permits.

This essay is about the class of problems for which this is true. They are more common than they appear, they hide in plain sight inside other fields, and they share a structural property: they look like engineering questions and turn out to be questions about the Halting Problem in disguise. The disguise is the part most worth understanding, because once you can see it, you can recognize the next instance before you spend a career on it.

Why the disguise works

Undecidable problems do not arrive with warning labels. A physicist computing the spectral gap of a quantum material does not see a sign saying WARNING: THIS IS EQUIVALENT TO THE HALTING PROBLEM. The problem looks like physics. It involves Hamiltonians, ground states, and energy levels. The undecidability is hidden in the mathematical structure of the encoding, visible only to someone who knows what to look for.

The reason the disguise is so reliable is that computation can be embedded in almost any sufficiently expressive system. Universality is cheap: a system often needs little more than a way to store state and update it by simple rules, and with enough cleverness those rules can encode a Turing machine. When that happens, the system inherits the Halting Problem. Properties of the system that depend on what the embedded computation does (does it halt, does it reach a particular state, does it satisfy a particular semantic property) become undecidable.

This is why undecidable problems show up where they do. Not because mathematicians snuck them in. Because the systems are expressive enough to hide them.
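
To make “expressive enough” concrete, here is a minimal sketch of Rule 110, a one-dimensional cellular automaton whose whole update rule fits in one byte and which is known to be Turing-complete. The function name `step` and the wrap-around boundary are illustrative choices, not something taken from the results discussed in this essay.

```python
RULE = 110  # the 8-bit lookup table, one output bit per 3-cell neighborhood

def step(cells):
    """Apply one Rule 110 update to a row of 0/1 cells (wrap-around edges)."""
    n = len(cells)
    out = []
    for i in range(n):
        left, center, right = cells[(i - 1) % n], cells[i], cells[(i + 1) % n]
        # Read the neighborhood as a 3-bit number and look up its bit in RULE.
        index = (left << 2) | (center << 1) | right
        out.append((RULE >> index) & 1)
    return out

row = [0, 0, 0, 1, 0]
next_row = step(row)  # [0, 0, 1, 1, 0]
```

Nothing in these dozen lines looks like a computer, which is the point: questions about the long-run behavior of a rule this small can already be as hard as questions about arbitrary programs.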

A short gallery

The pattern is clearest by example. Each of the following is a problem that, to a working practitioner, looks like an engineering question with a hard but reachable answer. Each is provably undecidable in general.

The Post Correspondence Problem (Post, 1946). Take a collection of dominos, each with a string written on top and a string on the bottom — for instance, top ab and bottom a. Can you arrange a sequence of these dominos (with repetition allowed) so that reading the top strings concatenated gives the same word as reading the bottom strings concatenated? A child can understand the question. Emil Post proved in 1946 that the general question is undecidable. The “obvious” approach — try all sequences — fails because there are infinitely many. The clever approach — find a pattern — fails because the problem is provably equivalent to the Halting Problem. Every working puzzle solver who picks up this game and assumes “with enough effort, surely…” is wrong.
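
The “try all sequences” approach can be sketched as a bounded search. This is a minimal illustration, not Post's formulation: the function name `pcp_bounded_search` and the `max_len` cutoff are assumptions introduced here. The search finds a match if one exists within the bound; a `None` result says nothing about longer sequences.

```python
from itertools import product

def pcp_bounded_search(dominos, max_len):
    """Return the first matching index sequence of length <= max_len, else None.

    dominos is a list of (top, bottom) string pairs.
    None means only "no match up to max_len", never "no match exists".
    """
    for length in range(1, max_len + 1):
        for seq in product(range(len(dominos)), repeat=length):
            top = "".join(dominos[i][0] for i in seq)
            bottom = "".join(dominos[i][1] for i in seq)
            if top == bottom:
                return list(seq)
    return None

# A small solvable instance: both rows spell "bbaabbbaa".
solvable = [("a", "baa"), ("ab", "aa"), ("bba", "bb")]
found = pcp_bounded_search(solvable, max_len=4)      # [2, 1, 2, 0]
too_short = pcp_bounded_search(solvable, max_len=3)  # None, yet a match exists
```

The second call is the whole problem in miniature: the instance is solvable, the bounded search reports nothing, and no finite bound is large enough to turn “nothing found” into “nothing exists” for every instance.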

Wang tiles (Berger, 1966). Given a finite set of square tiles with colors on their edges, can you tile the infinite plane such that adjacent tiles match colors along their shared edge? This is more or less a jigsaw puzzle for the infinite. Hao Wang conjectured in 1961 that every tileable set would have a periodic tiling; if true, the problem would be decidable. Robert Berger in 1966 proved the tiling problem undecidable, which disproved the conjecture, and constructed the first aperiodic tile set, with about 20,000 tiles. Karel Culík reduced the count to 13 in 1996, and Emmanuel Jeandel and Michael Rao reached 11 in 2015. Five-year-olds can understand the question. No mathematician, no computer, no future technology will ever solve it in general. The gap between understanding the question and answering it is, in this case, infinite.
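
The finite version of the question is easy to check, which is part of the disguise. The sketch below is a plain backtracking search for an n-by-n patch; the function name `can_tile_square` and the (north, east, south, west) tuple layout are assumptions made for illustration. By a standard compactness argument, a tile set tiles the plane exactly when it tiles every finite square, so one failure settles the question, while any number of successes does not.

```python
def can_tile_square(tiles, n):
    """Return True if an n-by-n square can be tiled with matching edges.

    tiles is a list of (north, east, south, west) edge colors.
    Tiles may be reused freely but not rotated.
    """
    grid = [[None] * n for _ in range(n)]

    def place(pos):
        if pos == n * n:
            return True
        row, col = divmod(pos, n)
        for tile in tiles:
            # West edge must match the east edge of the left neighbor.
            if col > 0 and grid[row][col - 1][1] != tile[3]:
                continue
            # North edge must match the south edge of the neighbor above.
            if row > 0 and grid[row - 1][col][2] != tile[0]:
                continue
            grid[row][col] = tile
            if place(pos + 1):
                return True
        grid[row][col] = None
        return False

    return place(0)

periodic = [("r", "g", "r", "g")]  # matches itself in both directions
stuck = [("r", "g", "b", "g")]     # north and south colors never match
```

Here `can_tile_square(stuck, 2)` returning False proves `stuck` cannot tile the plane. A True result for every n you have the patience to try proves nothing, because an untileable set may fail only at some larger n.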

The spectral gap of a quantum many-body system (Cubitt, Pérez-García & Wolf, 2015; Nature). You have a material. The Hamiltonian describes how its constituent particles interact. You want to know whether the system has a spectral gap: an energy gap between its ground state and first excited state that persists as the system grows. The answer bears on whether the material behaves as a conductor, an insulator, or a superconductor. This is the kind of question solid-state physicists ask routinely. The 2015 Nature paper proved the question is undecidable in general by constructing translationally invariant 2D quantum spin systems whose spectral gap depends on whether an embedded universal Turing machine halts. A 2020 Physical Review X follow-up extended the result to one-dimensional systems, closing the escape hatch “maybe this only affects higher dimensions.” The construction is elegant: aperiodic tilings (descendants of Wang's) are used to encode computation into the ground state of a Hamiltonian. Several major open problems in physics (the Haldane conjecture, the existence of gapped topological spin liquid phases, even the Yang-Mills mass gap problem that the Clay Mathematics Institute attaches a $1 million prize to) may be provably unanswerable as stated. The Clay prize committee has not commented on the discrepancy.

Program equivalence and bug-freeness (Rice, 1953). Given two programs, do they compute the same function on all inputs? Given a program, does it have any bug that violates a given specification? Both questions are non-trivial semantic properties of programs and are therefore undecidable by Rice's Theorem. Every software engineer who has ever attempted to write a comprehensive test suite has been attempting, in its general form, the impossible. This is not a knock on testing. Testing on finite, well-chosen inputs is enormously valuable. But the language that surrounds testing — verification, proving correctness, full coverage — gestures at a target that has been proven, for seventy-three years, to be unreachable in the general case.
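
The argument behind Rice's Theorem can be sketched in a few lines. This is an illustration of the standard reduction, not code from any of the cited papers; `make_gadget`, `decide_halting`, and the `has_bug` parameter are hypothetical names, and `has_bug` stands for a perfect bug detector that cannot actually exist.

```python
def make_gadget(machine, machine_input):
    """Build a program that violates the spec if and only if the machine halts."""
    def gadget(_x):
        machine(machine_input)   # runs forever if the machine never halts
        return "SPEC VIOLATION"  # reached only after the machine has halted
    return gadget

def decide_halting(has_bug, machine, machine_input):
    """If a perfect has_bug existed, this would solve the Halting Problem.

    Turing proved no such solver exists, so no perfect has_bug exists either.
    """
    return has_bug(make_gadget(machine, machine_input))

def countdown(n):
    """A machine that obviously halts, used only to exercise the gadget."""
    while n > 0:
        n -= 1

gadget = make_gadget(countdown, 5)
result = gadget("any input")  # "SPEC VIOLATION", because countdown halts
```

The gadget is safe to run here only because `countdown` is known to halt. For an arbitrary machine, the one way to learn whether the gadget misbehaves is to answer the halting question first.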

AI alignment verification (Agarwal, 2025). Given a sufficiently powerful AI system and a non-trivial alignment specification, does the system satisfy the specification on all inputs it might encounter? Same shape as the program-equivalence question. Same impossibility, by the same theorem. The reduction is essentially direct: alignment compliance is a non-trivial semantic property of the model's program; Rice's Theorem applies.

The gallery could be longer: the Halting Problem itself, the equivalence of context-free grammars, the word problem for groups, Hilbert's tenth problem, the determination of whether a piecewise-linear function over reals has a zero. The pattern doesn't change. The problems do not look impossible. They look like work.

The alignment trilemma

The AI alignment case has the cleanest practitioner-facing reformulation. A March 2026 paper (arXiv:2603.08761) showed that no verification procedure can simultaneously satisfy all three of the following:

Soundness — no misaligned system is ever certified as aligned.
Generality — verification holds over the full input domain.
Tractability — verification runs in polynomial time.

Any pair of these is achievable; all three together are not. The result is not a metaphor for the undecidability theorem. It is what the theorem looks like when you ask which corner you would like to live in.

Every real-world AI safety approach lives in exactly one of the corners. RLHF and red-teaming sacrifice generality — they test on finite, often adversarially-chosen, input sets. Formal methods sacrifice tractability — they are correct, where they apply, but do not scale to frontier-size models. Behavioral bounds (monitor the output, intervene on anomalies) sacrifice soundness — they will miss whatever misalignment doesn't trigger the bound. There is no fourth option. The trilemma is the operational shape of Rice's Theorem.

A June 2025 paper (arXiv:2506.10304) takes the same limits up the policy stack. Society's strategic choices, given the trilemma, reduce to: constrain system complexity to make verification tractable, accept unverifiable risks while scaling, or develop fundamentally new safety paradigms that guarantee alignment by construction rather than checking it after the fact. The third option is the one with the highest ceiling and the most engineering work attached to it.

The constructive escape

Undecidability of verification does not entail impossibility of alignment. The two are very different claims, and conflating them is the single most common error in popular coverage of these results.

A useful analogy. You cannot, in general, prove a bridge is safe by sending trucks across it until one falls off. The procedure is correct only if you try every possible truck under every possible load and weather condition, which is operationally infinite. What civil engineers do instead is calculate the bridge's load capacity from the known properties of its components — steel of known tensile strength, concrete of known compressive strength, geometric configurations whose stress distributions have been worked out from first principles. The bridge is safe by construction rather than by inspection. Construction provides what inspection cannot.

The Agarwal paper makes the same move for AI alignment. Rather than building an arbitrary model and trying to verify its alignment after the fact (provably impossible in general), build the model from a finite set of provably aligned operations such that alignment is guaranteed by the construction. This is the constructive approach, and it is the only escape from the trilemma that does not require sacrificing soundness, generality, or tractability, because it changes which problem you are solving. You stop asking “does this verify?” and start asking “did we build it right?”

This is the policy implication most worth carrying back to non-specialist audiences. The frontier AI safety question is not “how do we verify safety after the fact?”; that has been proven impossible in its most general form for the kinds of systems people care about. The question is “how do we build from verified components in a way that preserves alignment by construction?” This is hard. It is not provably impossible.

Wolfram, vindicated

In 1985, Stephen Wolfram conjectured, on the basis of the complex behavior he had observed in simulated cellular automata, that many natural problems in physics should be undecidable for the same structural reason: physical systems are expressive enough to encode computation, and once they do, they inherit the Halting Problem. The 2015 spectral-gap result is the strongest available vindication. The 2020 one-dimensional extension is the next strongest. Wolfram has been right for forty-one years and is, in the relevant communities, finally credited for it.

The deeper claim implicit in Wolfram's conjecture, and underwritten by the 2015–2026 results, is this: undecidability is not a special feature of mathematics or computer science. It is a property of any system rich enough to describe itself. Biology, economics, ecology — wherever a discipline studies systems that quantify over an infinite domain (all possible protein sequences, all possible market states, all possible initial conditions of an ecosystem) and has the expressive power to embed a Turing machine in its formalism, undecidable problems are waiting to be found. The spectral gap was not special. It was the first problem in solid-state physics where someone happened to find the encoding. Others are, almost certainly, already present in the literature, undiagnosed.

A reasonable prediction for the rest of the decade: undecidability results will appear in fields that have not historically been touched by computability theory at all. Look for the first published undecidability result in mathematical biology, in macroeconomic stability theory, in climate modeling. The mechanisms are the same. The expressiveness has been there for a long time.

What to do with this on Monday

Three practical moves for builders and leaders.

The first is to learn to recognize the disguise. When a problem looks like “check whether system S satisfies property P,” ask whether P is a non-trivial semantic property and whether S is computationally universal. If both, the general problem is undecidable, and any approach you take is implicitly a choice within the trilemma. Naming the choice consciously is more useful than discovering it the hard way after a multi-year program of verification work fails to converge.

The second is to specify which corner of the trilemma you are operating in, in writing, before you start the work. Are you sacrificing generality (testing on a finite set), tractability (full verification at small scale), or soundness (heuristic guardrails that may miss things)? Every approach has its corner. Documents that don't name the corner are, in practice, claiming all three properties and quietly delivering at most two. Investors and regulators reading such documents are being asked to believe in the impossible. Engineers reading them are being set up to deliver against an undefined success criterion.
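
One lightweight way to put the corner in writing is to make it a required field of whatever record describes the verification plan. The sketch below is a hypothetical convention, not an API from any of the papers or packages mentioned in this essay; the class name `VerificationClaim` and its fields are assumptions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class VerificationClaim:
    """A written statement of which trilemma properties an approach claims."""
    approach: str
    sound: bool      # no misaligned system is ever certified as aligned
    general: bool    # the result holds over the full input domain
    tractable: bool  # the procedure runs in polynomial time

    def __post_init__(self):
        # The trilemma rules out claiming all three at once.
        if self.sound and self.general and self.tractable:
            raise ValueError(
                f"{self.approach}: claims all three properties; name the one you give up"
            )

    def sacrificed(self):
        """Return the names of the properties this approach gives up."""
        names = ("sound", "general", "tractable")
        return [name for name in names if not getattr(self, name)]

red_team = VerificationClaim("red-teaming", sound=True, general=False, tractable=True)
gave_up = red_team.sacrificed()  # ["general"]
```

The check does no verification of its own. Its only job is to make the omission described above impossible to commit silently.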

The third, and the one most worth internalizing for the AI safety case specifically, is to shift effort from verification to construction where possible. Verification has a mathematical ceiling. Construction does not. The bridge that is safe by virtue of its materials is the model for what alignment-by-architecture means. The bridge that is safe because we have tested it is the model for what alignment-by-verification means. The first scales. The second has a proof that says it doesn't.

The deeper observation underneath all three moves is that undecidability is not a frustration. It is one of the most consequential things humanity has ever learned about the structure of knowledge itself. Gödel proved that mathematics contains truths it cannot prove. Turing proved that computation contains questions it cannot answer. Cubitt, Pérez-García, and Wolf proved that physics contains properties it cannot determine. Agarwal proved that AI safety contains specifications it cannot verify. Each proof, in its own time, looked like bad news. Each proof, on reflection, was actually a discovery: a precise statement of where a particular kind of work ends and a different kind of work has to start.

The frontier is not retreating. It is advancing into every field that achieves sufficient complexity. The right response is to learn its shape, recognize it when you encounter it, and choose your work accordingly — which is what the practitioners on the safe side of the frontier have always done, even when they did not know that the frontier had a name.

Build from verified components, in the right order.

The essay's third Monday move — shift effort from verification to construction — is the operational thesis of the Agent Trust Stack. The stack is a set of provably aligned operations (provenance via Chain of Consciousness, reputation via Agent Rating Protocol, and the integrating layer that composes them) such that the agent system's trust properties are guaranteed by how it is built rather than discovered by inspection after the fact. The bridge is safe because the steel is rated. The agent is trustworthy because the components are.

pip install agent-trust-stack · npm install agent-trust-stack
pip install chain-of-consciousness · npm install chain-of-consciousness
