Trust, Five Ways

A trust score is one-fifth of trust wearing the costume of the whole thing, and the one-fifth an attacker can fake most cheaply. Here are the other four.

Published June 2026 · 9 min read

In 2026 you can hand an AI agent your credit card and a goal (book the cheapest flight that lands by Friday, rebalance this portfolio, go negotiate with that company's purchasing agent), and before you click yes, a growing number of platforms will show you a number: a trust score. The Bank of Bots project calls its version the BOB Score and builds it the way a credit bureau builds yours, from a verifiable history of the agent's transactions, its counterparty diversity, its account age, its repayment record. 824 out of 900. You exhale a little, and you click.

That number is the most reassuring thing on the screen and very nearly the most dangerous, because it is one-fifth of trust wearing the costume of the whole thing, and the one-fifth an attacker can fake most cheaply. Trust is not a score. Trust is plural. There are at least five genuinely different ways to make an agent trustworthy, each a distinct mechanism with its own failure mode, and a reputation score is only one of them. On its own, it is the weakest. The good news, and the practical payload of this essay, is that the other four are right there: robust agent trust isn't a better score, it's the right combination of all five.

Here are the five. Once you can see them as separate machines, you stop buying a number and start building a system.

The five ways to trust an agent

These aren't five flavors of the same thing. They are five different reasons a rational party can be willing to be vulnerable to an agent, and that willingness to be vulnerable, irrespective of the ability to monitor or control the other party, is exactly what trust is in the canonical academic definition (Mayer, Davis & Schoorman, 1995). Each way answers a different question.

One: a costly-to-fake signal. You trust the agent because it carries something that was expensive to produce: proof of work done, a credential earned through a real audit, a track record that took time and money to build. The whole logic is cost: a signal is trustworthy in proportion to what it would take to forge. The agent-world form is the verifiable credential. W3C Verifiable Credentials and decentralized identifiers (DIDs) function as an agent's passport: a cryptographically signed claim ("this agent passed this audit," "this agent is operated by this licensed entity") that travels across organizations. The 2026 push to give agents VCs and DIDs is precisely this primitive going digital. Its failure mode is a skipped cost. The signature proves who issued the claim; only the cost of earning it makes the claim worth anything, so a credential is only as good as what it took to earn. A badge any agent can mint for free is not a signal; it's cheap talk in a nicer font.

Two: a verifier with skin in the game. You trust because a third party whose own credibility is at stake vouches, audits, or attests. This is the auditor, the certification body, the notary; crucially, the trust flows from the verifier's exposure, not its say-so. The agent-world form is the emerging layer of attestation issuers and staked validators, along with standards efforts such as the FIDO Alliance's work on agentic authentication, which aims to make "this agent is who and what it claims" something a trusted party certifies rather than something the agent asserts. Its failure mode is capture or laziness: if the verifier isn't itself at risk, if a bad attestation costs it nothing, the attestation is theater. A rubber-stamp auditor doesn't transfer trust; it launders the lack of it.
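A verifier's exposure can be made explicit by locking part of its stake against each attestation and forfeiting it if the attested agent misbehaves. This is a minimal sketch under assumed rules: the `StakedVerifier` class, the rule of locking stake equal to the value at risk, and the `slash`/`release` methods are illustrative, not any standard's or validator network's API.

```python
from dataclasses import dataclass, field


@dataclass
class StakedVerifier:
    """Hypothetical verifier whose attestations are backed by slashable stake."""
    name: str
    stake: float  # collateral the verifier has posted
    locked: dict[str, float] = field(default_factory=dict)  # agent_id -> stake locked

    def free_stake(self) -> float:
        return self.stake - sum(self.locked.values())

    def attest(self, agent_id: str, value_at_risk: float) -> bool:
        """Vouch only if enough free stake can be locked against this attestation."""
        if self.free_stake() < value_at_risk:
            return False  # not enough exposure: refuse instead of rubber-stamping
        self.locked[agent_id] = self.locked.get(agent_id, 0.0) + value_at_risk
        return True

    def slash(self, agent_id: str) -> float:
        """The attested agent misbehaved: the verifier forfeits the locked stake."""
        lost = self.locked.pop(agent_id, 0.0)
        self.stake -= lost
        return lost

    def release(self, agent_id: str) -> None:
        """The engagement ended cleanly: the stake is unlocked, not lost."""
        self.locked.pop(agent_id, None)


auditor = StakedVerifier("auditor-a", stake=1_000.0)
print(auditor.attest("agent-1", 600.0))  # True: 1,000 free
print(auditor.attest("agent-2", 600.0))  # False: only 400 free
print(auditor.slash("agent-1"), auditor.stake)  # 600.0 400.0
```

The refusal branch is the point: a verifier that can vouch for more than it can lose has stopped transferring trust.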

Three: exit, the power to leave. You trust because you can walk away, punish, or fork if the agent disappoints. This one is counterintuitive because it isn't about the agent at all; it's about your options. A counterpart you can leave cheaply has every incentive to behave; a counterpart you're locked into does not. The agent-world form is portable reputation, the ability to take your agent's earned standing to a competing platform, and the sharpest finding in this space is what some call the 35% problem: when reputation doesn't port, switching costs balloon, you get locked in, and the disciplining power of exit evaporates. Its failure mode is lock-in, and it ties straight to the broader problem of agent-framework portability: a trust system that traps you has quietly disabled one of your five ways to stay safe.

Four: a public, append-only, witnessed record. You trust on the strength of an aggregated, verifiable history of how the agent has actually behaved. This is where the reputation score lives: the BOB Score, the on-chain transaction history, the public ledger of completed and botched jobs. Reputation, defined well, is "the public, verifiable history of an agent's reliability," and it is genuinely powerful: it turns a thousand past interactions into a present prior. Its failure mode is the one the whole industry underrates, the Sybil attack, which gets its own section below because it is the reason the score you started with is the weakest of the five.
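Append-only here means tamper-evident: each entry commits to the hash of the one before it, and a third party (the witness) keeps a copy of the latest hash. The sketch assumes a simple SHA-256 chain over JSON-encoded events; `AppendOnlyRecord` and its methods are hypothetical, and real ledgers add signatures, replication, and consensus on top.

```python
import hashlib
import json

GENESIS = "0" * 64


def entry_hash(prev_hash: str, event: dict) -> str:
    """Hash an event together with the previous entry's hash."""
    payload = json.dumps({"prev": prev_hash, "event": event}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()


class AppendOnlyRecord:
    def __init__(self) -> None:
        self.entries: list[tuple[dict, str]] = []  # (event, hash) pairs

    def head(self) -> str:
        return self.entries[-1][1] if self.entries else GENESIS

    def append(self, event: dict) -> str:
        h = entry_hash(self.head(), event)
        self.entries.append((event, h))
        return h

    def verify(self) -> bool:
        """Recompute the chain; an edited event no longer matches its stored hash."""
        prev = GENESIS
        for event, h in self.entries:
            if entry_hash(prev, event) != h:
                return False
            prev = h
        return True


log = AppendOnlyRecord()
log.append({"job": 1, "outcome": "completed"})
log.append({"job": 2, "outcome": "botched"})
witness = log.head()  # a third party keeps this copy
print(log.verify())  # True

# Quietly rewriting a botched job breaks the chain...
log.entries[1] = ({"job": 2, "outcome": "completed"}, log.entries[1][1])
print(log.verify())  # False

# ...and rebuilding the whole chain to hide it changes the head the witness holds.
forged = AppendOnlyRecord()
for event, _ in log.entries:
    forged.append(event)
print(forged.verify(), forged.head() == witness)  # True False
```

The chain alone only makes edits detectable; it is the witness's independent copy of the head that turns "detectable" into "detected."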

Five: a bond that pays out on failure. You trust because the agent, or the human behind it, has posted collateral that is forfeited if it misbehaves: escrow, a slashing stake, a performance bond. This is the deepest mechanical trust there is, because it doesn't ask you to believe anything about the agent's character; it makes betrayal cost more than it pays. The agent-world form is economic bonding tied to the agent's actions: act badly, lose the stake. Its failure mode is a number: the bond has to exceed the gain from defecting. A $10 bond does not secure a $1,000,000 action; it just sets the price of betrayal at ten dollars and waits for someone to pay it.
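The bond condition is an expected-value inequality. The paragraph's rule, bond greater than gain, is the case where misbehavior is always caught; the sketch below adds a detection probability `p_detect` as an explicit assumption, since a bond that is rarely slashed deters proportionally less. The function names are hypothetical.

```python
def defection_pays(gain: float, bond: float, p_detect: float = 1.0) -> bool:
    """A rational defector cheats when the gain beats the expected forfeit."""
    return gain > p_detect * bond


def break_even_bond(gain: float, p_detect: float = 1.0) -> float:
    """Bond at which cheating stops paying; a secure bond must exceed this."""
    if not 0.0 < p_detect <= 1.0:
        raise ValueError("p_detect must be in (0, 1]")
    return gain / p_detect


print(defection_pays(gain=1_000_000, bond=10))  # True: betrayal priced at $10
print(break_even_bond(1_000_000))  # 1000000.0
print(break_even_bond(1_000_000, p_detect=0.25))  # 4000000.0
```

If only one defection in four is caught and slashed, securing a $1,000,000 action takes more than a $4,000,000 bond, which is why detection (the record and the verifiers) and bonding work best together.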

A load-bearing observation from comparing trust systems across domains (markets, biology, distributed protocols) is that these are not a menu you pick from. They're a kit you layer. A scheme that relies on one primitive and skips the rest holds up fine in the demo and collapses the first time it meets a genuine adversary. Trust missing two or more of the five does not degrade gracefully; it fails.

Why the score, alone, is the gameable one

Here is the uncomfortable math. Reputation mechanisms of the kind that produce a trust score are, under well-studied conditions, provably susceptible to Sybil attacks: an adversary who can spin up fake identities cheaply can manufacture reputation, and in the worst case "gain infinitely more work than they performed." Recent work on agent economies makes this concrete for our world: if creating a new agent identity is cheap, then a reputation score built only on history is built on sand, because the attacker simply farms a thousand throwaway agents, has them rate each other up, and walks the laundered identity into your high-value transaction wearing an 824.
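To see why a history-only score is built on sand, here is a deliberately naive score, the share of positive ratings an agent has received, with identity creation free. This is not the BOB Score or any real product's model; `naive_score` and `sybil_farm` are hypothetical.

```python
Rating = tuple[str, str, bool]  # (rater, target, positive?)


def naive_score(ratings: list[Rating], agent: str) -> float:
    """Share of positive ratings received: history only, no cost of identity."""
    received = [positive for _, target, positive in ratings if target == agent]
    return sum(received) / len(received) if received else 0.0


def sybil_farm(target: str, n_sybils: int) -> list[Rating]:
    """Throwaway identities that rate each other and the target up, at zero cost."""
    sybils = [f"sybil-{i}" for i in range(n_sybils)]
    ratings: list[Rating] = []
    for s in sybils:
        ratings.append((s, target, True))
        ratings.extend((s, t, True) for t in sybils if t != s)
    return ratings


honest = [("alice", "mallory", True), ("bob", "mallory", False)]
print(naive_score(honest, "mallory"))  # 0.5
farmed = honest + sybil_farm("mallory", 50)
print(round(naive_score(farmed, "mallory"), 3))  # 0.981
```

Every sybil also ends up with a perfect 1.0, so the attacker has fifty laundered identities to choose from, and none of them cost anything.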

Now watch what the other four ways do to that attack; this is the whole argument in one move. A bond (way five) makes Sybil expensive: you cannot cheaply post a thousand real bonds, so faking N identities now costs N stakes. A costly credential (way one) raises the price of every fake identity to the price of the audit. A staked verifier (way two) means each fake identity needs an attestation from someone with something to lose. Exit (way three) lets you flee a reputation system the moment it looks farmed. The public record (way four, the score itself) is real and useful, but it is load-bearing only when the other four make identities expensive. Proof-of-work and proof-of-stake are famous precisely because they are Sybil resistance through cost: they bolt a price (burned compute, or a stake that can be forfeited) onto a record. A reputation score without a cost of identity underneath it is a leaderboard, and leaderboards get farmed.
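The same attack gets priced differently once the other layers are in place. A minimal sketch, assuming each layer adds a fixed per-identity cost: `mint` for creating the identity, `audit` for the costly credential, `verifier_exposure` for the stake a colluding verifier would have to put at risk (assumed to be passed on to the attacker), and `bond` for the collateral each identity posts. All parameter names and figures are illustrative.

```python
def sybil_capital(
    n_identities: int,
    *,
    mint: float = 0.0,
    audit: float = 0.0,
    verifier_exposure: float = 0.0,
    bond: float = 0.0,
) -> float:
    """Capital an attacker must spend or put at risk to field n identities
    that pass every layer in place. Costs scale linearly with n."""
    per_identity = mint + audit + verifier_exposure + bond
    return n_identities * per_identity


print(sybil_capital(1_000))  # 0.0: a bare leaderboard
print(sybil_capital(1_000, mint=1.0, audit=50.0, bond=500.0))  # 551000.0
```

Linear scaling is the point: once each identity carries a real cost, a thousand fake identities cost a thousand times as much as one.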

So the products leaning entirely on a reputation number aren't wrong to have one. They're wrong to stop there. The fix isn't a smarter scoring model. It's the four companions that turn a fakeable history into an expensive one.

What you're trusting it for (and why one number can't say)

There's a second reason a single score misleads, orthogonal to Sybil. The five ways are how you establish trust. They say nothing about what you're trusting the agent for, and the canonical model says that "what" has at least three independent dimensions. Mayer, Davis and Schoorman's ABI model names them: Ability (is it competent at the task?), Benevolence (does it act in your interest?), and Integrity (does it stick to principles you accept?). These come apart. An agent can be high-Ability and low-Integrity: superbly competent and quietly self-serving, the smooth operator that does the job and skims. A reputation score collapses ability, benevolence, and integrity into one number, which is a category error: a high score can be a tall stack of competence sitting on a sinkhole of integrity, and the number looks the same either way.
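Here is the category error in miniature: two hypothetical agents with identical averaged scores, one of them the smooth operator. The ABI numbers are invented for illustration on a 0 to 100 scale, and `scalar_score` simply stands in for any single-number summary.

```python
from statistics import mean

# Hypothetical ABI profiles (illustrative numbers, not real data).
profiles = {
    "steady": {"ability": 80, "benevolence": 80, "integrity": 80},
    "smooth_operator": {"ability": 99, "benevolence": 91, "integrity": 50},
}


def scalar_score(profile: dict[str, int]) -> float:
    """Collapse the vector to one number, as a single trust score does."""
    return mean(profile.values())


def weakest_dimension(profile: dict[str, int]) -> tuple[str, int]:
    """The dimension most likely to bite you, which the scalar hides."""
    return min(profile.items(), key=lambda kv: kv[1])


for name, profile in profiles.items():
    # Both agents print a scalar of 80; only the weakest dimension differs.
    print(name, scalar_score(profile), weakest_dimension(profile))
```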

The five ways, used deliberately, probe different ABI dimensions, which is the real reason to layer them. A credential certifies ability. A bond deters integrity failures (it makes defection unprofitable regardless of character). An audit by a staked verifier checks both. This is why the more sophisticated agent-trust designs are already abandoning the single number for a vector: ACHIVX, for instance, maintains a seven-dimensional behavioral profile per agent, kept portable, with each service provider applying its own weighting, because the dimensions a payments agent must be trusted on are not the dimensions a research agent must be trusted on. Don't collapse trust to a scalar and then act surprised when the scalar hid the dimension that bit you. Keep the vector; weight it for the job.
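Keeping the vector and weighting it per job might look like the sketch below. It uses the three ABI dimensions rather than ACHIVX's seven, whose definitions are not given here, and the per-domain weights, the hard floors, and the 75-point threshold are all hypothetical policy choices.

```python
# Hypothetical per-domain policies: weights sum to 1; a floor vetoes the action
# no matter how high the weighted total is.
POLICIES = {
    "payments": {
        "weights": {"ability": 0.2, "benevolence": 0.3, "integrity": 0.5},
        "floors": {"integrity": 70},
    },
    "research": {
        "weights": {"ability": 0.6, "benevolence": 0.2, "integrity": 0.2},
        "floors": {"ability": 60},
    },
}


def admit(profile: dict[str, float], domain: str, threshold: float = 75) -> tuple[bool, str]:
    """Check the domain's floors first, then compare the weighted total to the threshold."""
    policy = POLICIES[domain]
    for dim, floor in policy["floors"].items():
        if profile[dim] < floor:
            return False, f"{dim} {profile[dim]} is below the {domain} floor of {floor}"
    weighted = sum(w * profile[dim] for dim, w in policy["weights"].items())
    return weighted >= threshold, f"weighted score {weighted:.1f}"


smooth_operator = {"ability": 99, "benevolence": 91, "integrity": 50}
print(admit(smooth_operator, "payments"))  # (False, 'integrity 50 is below the payments floor of 70')
print(admit(smooth_operator, "research"))  # (True, 'weighted score 87.6')
```

The same agent is refused for payments and admitted for research, which is exactly what a single number cannot express.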

Why this matters more for agents than for people

You might object that humans get by without all this machinery; we trust our colleagues, our friends, our doctor, mostly on something far warmer than bonds and audits. True, and that points at the one way agents can't be trusted, which may be the most important point in this essay.

Lewicki's account of how trust deepens runs in stages: calculus-based (I trust you because betraying me would cost you), then knowledge-based (I trust you because I know your patterns and can predict you), and finally identification-based (I trust you because I've internalized your values and you've internalized mine, the trust of long marriages and old friendships). That third stage is where humans do their richest trusting, and it is the one stage that is unavailable with an agent. You do not form a shared-values bond with a process. There is no "I know it, it would never." Agents are stuck at calculus and knowledge, at "what would it cost it to defect" and "how has it behaved."

Which flips the usual intuition. Precisely because the warm, identification-based fallback doesn't exist for agents, the cold mechanical primitives (the bonds, the records, the staked verifiers, the costly credentials) matter more for agents than they do for people, not less. With a human you can lean on character when the wiring is thin. With an agent, the wiring is the trust. There is nothing underneath it to catch you. So the temptation to ship a reassuring score and call it trust is exactly backwards: agents are the case where you can least afford to.

One more thread ties four of the five ways together, and it's the quiet spine of the whole field right now: a credential, a staked attestation, a public record, and a posted bond are all forms of attached, third-party-witnessed evidence, things that are true about the agent regardless of what the agent says about itself. (The fifth, exit, is not evidence at all but your option to leave.) A raw self-reported claim is the opposite: it asks you to believe the agent's account of itself. The discipline that runs through every robust trust system is the same one that runs through good agent engineering generally: verify the record, don't trust the report.

The practical version: five questions, not one number

Here's the part you can use tomorrow. The next time a platform offers to let an agent act for you and shows you a trust score, don't read the score as a verdict. Read it as one witness, and then ask five questions out loud:

What did its credential cost to earn? (Way one: a free badge is noise.)
Who staked their own name on it, and what do they lose if they're wrong? (Way two: an unexposed verifier is theater.)
Can I leave? Does its reputation port to a competitor? (Way three: lock-in disables your discipline.)
Is the record public, append-only, and Sybil-resistant? (Way four: is identity expensive, or is this a farmable leaderboard?)
What's bonded against failure, and does the bond exceed the gain from defecting? (Way five: the only one that secures a high-value action mechanically.)
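The five questions translate directly into a checklist. In this sketch, `None` or `False` stands for an honest answer of "nothing"; the field names are hypothetical, the value at risk stands in for the gain from defecting, and the verdict thresholds simply restate the essay's two rules of thumb (four missing layers means the score is decoration; two or more means the system is fragile).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TrustEvidence:
    """Answers to the five questions; None/False means the honest answer is 'nothing'."""
    credential_cost: Optional[float] = None  # way one: what the credential cost to earn
    verifier_stake: Optional[float] = None   # way two: what the verifier loses if wrong
    reputation_portable: bool = False        # way three: can you leave with your standing?
    record_sybil_resistant: bool = False     # way four: is identity expensive?
    bond: Optional[float] = None             # way five: forfeited on failure


def audit(evidence: TrustEvidence, value_at_risk: float) -> tuple[list[str], str]:
    """List the missing or inadequate layers and give a verdict."""
    missing = []
    if not evidence.credential_cost:  # a free badge counts as no credential
        missing.append("costly credential")
    if not evidence.verifier_stake:
        missing.append("staked verifier")
    if not evidence.reputation_portable:
        missing.append("exit / portability")
    if not evidence.record_sybil_resistant:
        missing.append("Sybil-resistant record")
    if not evidence.bond or evidence.bond <= value_at_risk:
        missing.append("bond exceeding value at risk")
    if len(missing) >= 4:
        verdict = "score is decoration"
    elif len(missing) >= 2:
        verdict = "fragile"
    else:
        verdict = "layered"
    return missing, verdict


score_only = TrustEvidence()  # an 824 and nothing behind it
print(audit(score_only, value_at_risk=1_000_000)[1])  # score is decoration

stack = TrustEvidence(credential_cost=5_000, verifier_stake=20_000,
                      reputation_portable=True, record_sybil_resistant=True,
                      bond=2_000_000)
print(audit(stack, value_at_risk=1_000_000))  # ([], 'layered')
```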

If the honest answer to four of those is "nothing," the score is decoration, however high it is. And if you're building agent trust rather than buying it, the same five are your architecture. Don't ship a number; ship a stack: a costly credential, a staked attestation, a portable public record, and a bond sized to the value at risk, with the score as one dimension of a weighted vector rather than the verdict.

Trust an agent the way you'd trust a stranger you just handed your car keys: not on their word, and not on a single star rating, but on what it would cost them to betray you, and then make sure that cost is built in five ways, not one. The score is comforting because it looks like an answer. It is really a good question, and the other four ways are how you make it safe to act on.

Sources

The definition and dimensions of trust: Mayer, Davis & Schoorman's Integrative Model of Organizational Trust (1995), defining trust as the willingness to be vulnerable absent the ability to monitor, on Ability/Benevolence/Integrity (via the Virginia Tech review of the model); the stages of trust, calculus-/knowledge-/identification-based (Lewicki, via Beyond Intractability). The 2026 agent-trust landscape (real but early and contested; vendor materials are interested parties, flagged as such): the Bank of Bots "BOB Score" and ACHIVX's seven-dimensional portable behavioral profile, reputation defined as "the public, verifiable history of an agent's reliability" (ACHIVX, "Credit of Trust," 2026); W3C Verifiable Credentials and DIDs as agent passports (arXiv 2511.02841; Indicio, "Why Verifiable Credentials Will Power AI in 2026"); the FIDO Alliance's agentic-authentication standards effort; portable agent reputation and the "35% problem" of non-portable reputation (RNWY). The Sybil result: reputation mechanisms meeting certain conditions are provably susceptible to Sybil attacks, an attacker able to "gain infinitely more work than they performed," and the agent-economy framing of reputation and identity cost ("Virtual Agent Economies," arXiv 2509.10147); proof-of-work/proof-of-stake as Sybil-resistance via cost-of-identity is standard distributed-systems practice. 
The essay's own synthesis, drawn from a cross-domain comparison of trust systems and offered as "primitives we find recur," not as settled law: the framing of five recurring trust primitives (costly signal, staked verifier, exit, witnessed record, bond) as a layered kit rather than the only possible taxonomy (ABI's three dimensions and Lewicki's three stages are complementary lenses, not competitors); the claims that a reputation score is one primitive and, in isolation, the most Sybil-vulnerable; that a single score commits a category error across ABI; and that the identification ceiling makes mechanical trust matter more for agents than for humans.

Don't ship a number. Ship a stack.

Four of the five ways (a costly credential, a staked attestation, a public witnessed record, a bond) share one discipline: attached, third-party-witnessed evidence. Verify the record; don't trust the report. The agent-trust-stack bundles those layers (provenance + reputation + the attestation/bond scaffolding) so trust is a system you assemble, not a single score you hope holds.

pip install agent-trust-stack · npm install agent-trust-stack
vibeagentmaking.com →
