What 1958 philosophy teaches about why 80% of features go unused.
Every Monday morning, somewhere in a software company, a product manager stands at a whiteboard and writes four words: We should build SSO.
She has data. Sixty-two percent of enterprise prospects asked for it this quarter. Three lost deals cited it in their post-mortems. Sales has been nagging. The CEO has been asking. She walks the room through the slide, lands on a RICE score, and — because nobody objects — it goes on the roadmap.
Six months later, SSO is live. Eight percent of enterprise customers have turned it on. Support tickets haven't budged. Sales still can't close the two biggest deals, because it turns out those prospects also needed SCIM provisioning and audit logs. The feature works. Nobody uses it. It joins the backlog of other features that work and nobody uses.
Pendo runs the numbers every year. Its 2019 Feature Adoption Report found that 80% of features in B2B software are rarely or never used, and that just 12% of features drive 80% of daily usage. Standish Group data (Jim Johnson, XP 2002 keynote, cited via Mountain Goat Software) puts the figure at 45% never used, 19% rarely. In large-scale A/B programs, R. Kohavi has reported that only about 1 in 8 tested ideas ships a material improvement to the key metric (Kohavi, KDD 2015 keynote; AB Tasty "1,000 Experiments Club" interview). His broader 1/3 rule is no kinder: roughly one in three tested ideas improves the metric, one in three is flat, and one in three makes things actively worse.
This is not a prioritization-framework problem. Teams have never had more frameworks. RICE, ICE, Kano, MoSCoW, WSJF, Opportunity Scoring — there are books on each. The problem is older than frameworks and lives underneath them. It's an argumentation problem. Every feature proposal is an argument. Most of them are bad ones. The industry's 80%-unused backlog is what happens when bad arguments get funded at scale.
There is a vocabulary for this. It comes from 1958, from a philosopher nobody in product has heard of, and from a sub-discipline of software engineering that has been quietly using it for twenty years.
In 1958, S. Toulmin — a British philosopher at the University of Leeds — published The Uses of Argument with Cambridge University Press. He was trying to solve a different problem. Formal deductive logic, he argued, was a terrible model of how humans actually reason in real disputes. Real arguments happen in courtrooms, not syllogism exercises. A lawyer doesn't reason from major premise to minor premise to conclusion. She marshals facts, points to a rule, and braces for objections.
Toulmin's book decomposed practical argument into six components (Purdue OWL; Blinn College Writing Centers): the claim, the conclusion being argued for; the grounds, the facts and data offered in its support; the warrant, the usually unstated principle that licenses the move from grounds to claim; the backing, the evidence that supports the warrant itself; the qualifier, the marker of how strongly the claim is held ("presumably," "in most cases"); and the rebuttal, the conditions under which the claim would not hold.
The first three are the minimum viable argument. The other three are what separate a strong argument from one that only sounds strong.
Philosophers hated the book. Rhetoricians, lawyers, and social workers picked it up and ran. Over the following decades, Toulmin's model became the dominant descriptive framework in rhetoric and legal reasoning. More interesting for our purposes, it quietly colonized pockets of software engineering. The Duke CS408 course on software architecture uses Toulmin structure to capture why design decisions were made, not just what was decided (Duke Computer Science, CS408 spring 2025 readings). C. Haley, R. Laney, J. Moffett, and B. Nuseibeh developed a method called “Structured Toulmin-Style Argumentation” to check whether stated security requirements actually satisfy security goals (Haley et al., Open University). Belief-desire-intention agent researchers now use Toulmin to structure machine deliberation (Panisson et al., “Reasoning in BDI Agents Using Toulmin's Argumentation Model,” Science of Computer Programming, 2019, doi:10.1016/j.scico.2019.102340).
The framework has been naturalized into software — except on the product side. That is the specific gap worth closing.
Return to Monday's whiteboard. We should build SSO.
That is the claim.
The grounds are the 62% request rate, the three lost deals, the sales team's noise. Good grounds. Real data.
The warrant is the sentence that would appear if anyone forced the PM to write it down: Enterprise feature requests are a reliable signal of enterprise revenue potential. This is the causal theory. It is never stated. It is never inspected. And it might be wrong for this company at this stage of its growth.
The backing would be whatever supports the warrant itself — prior cases in which enterprise feature requests predicted enterprise revenue. Was there a last-quarter feature that followed the same signal and paid off? No one asked.
The qualifier appears as the “Confidence” score in RICE. A number from one to ten. It looks rigorous. It is a guess.
The rebuttal is absent. There is no sentence in the proposal that reads: This feature will have failed if, within six months, SSO adoption among enterprise customers is below X percent. Without that sentence, the feature will ship, underperform, and stay in the product forever because nobody defined what failure looked like.
You have been writing Toulmin arguments your whole career. The Monday whiteboard is where the warrant usually dies, silent and unexamined.
On Toulmin's account, an argument is only as strong as its weakest warrant. In practical disputes, he observed, the real disagreement is almost never about the data. Two teams looking at the same usage telemetry can reach opposite conclusions because they hold different warrants. The facts are shared. The causal theories are not.
Product management is a machine for producing hidden warrants. A short list of common ones, each plausible and frequently wrong: enterprise feature requests predict enterprise revenue. Users who ask for a feature will use it once it ships. What customers say they want is what they actually need. Usage telemetry is a proxy for delivered value. Time-to-market dominates every other value consideration.
Each of these warrants feels like common sense. That is what makes them dangerous. Common sense is a warrant that has been invisible for so long that people mistake it for the floor.
The canonical counter-case is older than any of them. H. Ford's apocryphal line — if I had asked customers what they wanted, they would have said a faster horse — is really a complaint about a warrant. “Users describe solutions” is the warrant; “users describe problems and solutions interchangeably” is the reality. The jobs-to-be-done framework (formalized by A. Ulwick in the early 1990s; popularized by C. Christensen) is, fundamentally, an attack on this single warrant, and an attempt to replace it with a more defensible one.
Kohavi's 1/3 rule is the empirical rebuttal to most product warrants. When warrants are actually tested — by running the proposal as a controlled experiment — roughly one in three ideas is positive, one in three is flat, and one in three is negative (Kohavi, KDD 2015 keynote). The warrant was wrong, or the data was misread, or the causal theory didn't generalize, two times out of three. That is the argumentation tax the industry pays for not noticing that proposals are arguments.
Every prioritization framework in popular use is a partial Toulmin structure. Some are more partial than others.
RICE (Intercom) scores Reach × Impact × Confidence ÷ Effort. Reach and Impact are grounds. Confidence is a qualifier. The warrant — that Reach × Impact ÷ Effort predicts actual outcome — is assumed, never stated, never backed.
ICE (S. Ellis) scores Impact × Confidence × Ease. Same shape. Same gap.
MoSCoW (Must, Should, Could, Won't) skips grounds entirely. It is pure classification — a claim in four intensities.
WSJF (Weighted Shortest Job First, from SAFe) ranks by Cost of Delay divided by Duration. The implied warrant — time-to-market dominates other value considerations — is almost never examined in the settings where it is applied.
Kano (N. Kano, 1984) classifies features as must-be, performance, delighters, indifferent, or reverse. The classification itself is the warrant — different satisfaction mechanisms operate for different feature types — and Kano's questionnaire is the backing. This is the rare framework that makes the warrant visible and testable.
Opportunity Scoring (A. Ulwick, What Customers Want, 2005) scores Importance + (Importance − Satisfaction). The warrant — underserved outcomes are where value lives — is explicit and comes with its own theoretical scaffolding from jobs-to-be-done.
Notice the pattern. Frameworks that make warrants visible (Kano, Opportunity Scoring) are tools of discovery — they help teams decide what belongs on the backlog at all. Frameworks that hide warrants (RICE, ICE, MoSCoW, WSJF) are tools of ranking — they order what's already there. A product culture that relies on RICE alone is stacking ranked claims on top of unexamined warrants. The backlog gets longer. The hit rate doesn't move.
This is not an indictment of RICE. RICE is a useful ranker. It is a diagnosis of what's structurally missing: the slot where the warrant should be written down.
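To make the comparison concrete, here is the arithmetic of two of these frameworks side by side, annotated with the Toulmin slot each input occupies. The function names and the SSO numbers are illustrative, not taken from any vendor's implementation.

```python
def rice(reach: float, impact: float, confidence: float, effort: float) -> float:
    """RICE (Intercom): Reach and Impact are grounds, Confidence is a
    qualifier, Effort is a cost term. There is no warrant parameter:
    the causal theory linking the score to the outcome is assumed."""
    return reach * impact * confidence / effort

def opportunity(importance: float, satisfaction: float) -> float:
    """Opportunity Scoring (Ulwick): importance + (importance - satisfaction).
    Here the warrant -- underserved outcomes are where value lives -- is
    visible in the formula: the gap term rewards important, unsatisfied outcomes."""
    return importance + max(importance - satisfaction, 0)

# Hypothetical SSO proposal: reach 500 accounts/quarter, impact 2 (high),
# confidence 0.5 (a guess dressed as a number), effort 6 person-months.
print(rice(500, 2, 0.5, 6))   # ~83.3 -- a ranking, not a reason
print(opportunity(9, 4))      # 14 -- importance 9, satisfaction 4
```

The point of the sketch is the signature, not the output: RICE has no slot where a warrant could even be written down, while Opportunity Scoring's warrant is baked into the formula itself.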
In 2007, A. Kaushik coined the term HiPPO — the Highest Paid Person's Opinion — in Web Analytics: An Hour a Day. It is the most-cited pathology in product decision-making, the one every PM complains about, and the one nobody seems able to beat.
Seen through Toulmin, the HiPPO problem has a specific shape. The HiPPO rarely brings grounds. She brings a claim (we should build X) and a silent warrant (I have taste). The warrant is unchallengeable because it is never stated. Nobody in the room can argue against a sentence that hasn't been spoken.
Research out of Rotterdam School of Management — B. Szatmari's work on status and project performance at Erasmus University, summarized in RSM's “Why you need to question your hippo boss” and discussed in industry writeups (t2informatik; LeadershipHQ) — finds that projects led by middle-ranking managers frequently outperform HiPPO-led projects. The mechanism is not mysterious: HiPPO-led teams become less critical, less willing to name the warrant, and the argument loses its rebuttal structure before it has even formed.
The operational fix is not to oppose the HiPPO. It is to require a warrant section in every feature proposal, regardless of author seniority. Given these grounds, the claim follows because: [one sentence]. The HiPPO now has to write her warrant down. She can still win — she often should win — but she now wins in a form where her reasoning can be cross-examined, and where a data-backed junior warrant has a structural chance against a vibes-backed senior one. Forcing the warrant out of someone's head is what democratizes the argument.
The adjacent fix is the rebuttal tax. Every proposal includes a sentence of the form: This feature will have failed if X by Y. Feature flags and staged rollouts make this cheap; what is scarce is the cultural expectation that the sentence exists. E. Ries made the same argument in The Lean Startup (2011), but attached it to build-measure-learn rather than to the proposal itself. Toulmin's contribution is to put the falsifiability clause inside the argument, before it ships, rather than in the instrumentation afterward. That is where the rebuttal earns its keep — it changes which proposals get written, not just how they get measured.
A Toulmin-literate product team does four things differently, none of which require a new tool.
Every feature proposal has a warrant section. One sentence. These grounds support this claim because [causal theory]. If the writer cannot produce that sentence, the proposal is not ready for review. The warrant is made visible so it can be contested.
Every feature proposal has a rebuttal section. One sentence. This feature will have failed if [measurable condition] by [date]. The rebuttal is not a prediction of failure. It is a commitment to notice failure when it happens. Feature flags make it operational; the sentence makes it durable.
Reviewers are trained to attack the warrant, not the data. Data disputes are usually resolvable and usually irrelevant. Warrant disputes are the real argument, and the place where most proposals should die.
Seniority is warrant-transparent. The same template applies to the HiPPO and the intern. This is the smallest cultural change that produces the largest argumentation gain, because it surfaces the assumption that has always been doing the real deciding.
A small improvement in warrant quality compounds hard against Pendo's 80% baseline. You do not need most proposals to be right. You need a smaller fraction of wrong proposals to survive contact with a named warrant and an explicit rebuttal. That is achievable with a sentence each.
Return to the whiteboard one more time. We should build SSO.
This time the proposal has a warrant section. Enterprise feature requests predict enterprise revenue — stated out loud, for the first time. Somebody in the room has actually read the last four features that shipped in response to enterprise requests. Two moved the needle. Two didn't. The warrant is real but weak. The PM revises: Enterprise feature requests predict enterprise revenue when the requesting accounts are in active procurement cycles. The warrant is now narrower, more defensible, and tests differently against the backlog.
The rebuttal section reads: This feature will have failed if adoption among enterprise customers is below 30% six months after GA. Nobody had to guess what success looks like. Six months in, SSO is at 12%, the feature is deprecated, and the team ships the audit-log feature that the warrant should have pointed to all along.
Toulmin was watching lawyers argue about murder trials and social workers argue about case placements. Product management is the youngest white-collar profession, about forty years old. The oldest thing the older professions can teach it is how to build an argument that survives an adversary — because every feature proposal already is one. The question is whether you want to notice that before, or after, 80% of them fail.
Toulmin-shaped claims for the agent economy
The warrant problem generalizes. When an agent claims “I am trustworthy,” the grounds are benchmarks and reputation data. The warrant — past performance predicts future performance in this context — is usually silent. The Agent Rating Protocol forces the warrant out of hiding: every rating is a signed claim attached to grounds, anchored in a public chain, with an explicit rebuttal path if the claim fails on new evidence. The argumentation structure PMs need on Monday is what a functioning agent trust layer will need on day one.
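As a sketch of that shape only — the field names and the stdlib HMAC signing below are illustrative assumptions, not the Agent Rating Protocol's actual API or wire format — a Toulmin-shaped rating might look like this:

```python
import hashlib
import hmac
import json

def signed_rating(secret: bytes, claim: str, grounds: list[str],
                  warrant: str, rebuttal: str) -> dict:
    """Bind a rating claim to its grounds, warrant, and rebuttal path, then
    sign the canonical form so the whole argument (not just the verdict)
    is auditable. Hypothetical sketch; not the real protocol's schema."""
    body = {
        "claim": claim,        # "agent X is reliable for task T"
        "grounds": grounds,    # benchmarks, reputation data
        "warrant": warrant,    # the causal theory, stated rather than silent
        "rebuttal": rebuttal,  # the evidence that would retract the rating
    }
    canonical = json.dumps(body, sort_keys=True).encode()
    body["signature"] = hmac.new(secret, canonical, hashlib.sha256).hexdigest()
    return body

rating = signed_rating(
    b"demo-key",
    claim="agent-42 is reliable for invoice extraction",
    grounds=["98.1% field accuracy on a 10k-document benchmark"],
    warrant="Benchmark accuracy on in-distribution documents predicts production accuracy.",
    rebuttal="Retract if production accuracy drops below 95% over any 30-day window.",
)
print(sorted(rating))   # claim, grounds, rebuttal, signature, warrant
```

Note what the signature covers: the rebuttal is inside the signed body, so an agent cannot later claim the failure condition was never part of the rating.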
See a live provenance chain · Verify an agent's rating · pip install agent-rating-protocol