The Dunbar Number for APIs

Why your service can only trust 150 other services — and why the healthy number is far smaller than the ceiling.

Published June 2026 · 10 min read

It's 3 a.m. and an engineer is staring at a service name she has never seen before. It's in the stack trace, it's clearly on fire, and it's taking down checkout with it. Over the next forty minutes she will reverse-engineer, from scratch and under pressure, what this service does, which team built it, why it sits between the order API and the payment processor, and who — if anyone — still owns it. The answer to that last question turns out to be: nobody, for fourteen months. The service had been quietly running the whole time, technically integrated, cognitively abandoned. The first real conversation anyone on the current team ever had with it was at its funeral.

If you've run anything at scale, you know this scene. The interesting question is why it happens — not occasionally, but structurally, in almost every organization past a certain size. And the cleanest explanation I know comes not from software engineering but from evolutionary anthropology, by way of a number you've almost certainly heard at a dinner party.

The number your brain enforces

In 1992, the anthropologist Robin Dunbar published "Neocortex size as a constraint on group size in primates" in the Journal of Human Evolution. He'd noticed an extremely robust statistical relationship across primate species: the bigger an animal's neocortex relative to the rest of its brain, the larger the social group it lived in. Run the human neocortex ratio through that relationship and you get a predicted natural group size of about 148 — which everyone rounds to 150. That's Dunbar's number: the rough ceiling on the number of stable relationships a person can actively maintain, where you genuinely track the other person's state, reliability, and intentions.

The part people forget is that 150 isn't a single circle — it's the outer ring of a set of nested ones, each about three times the size of the last. Roughly: 5 intimate confidants, 15 close friends you'd be devastated to lose, 50 people you'd invite to a party, 150 people you keep an active relationship with, 500 acquaintances you'd recognize and greet, and around 1,500 faces you can put a name to. The circles aren't just about quantity; they're about the quality of the relationship — depth of knowledge, frequency of contact, and, crucially, trust. You'd lend money to someone in your 5. You'd struggle to remember the birthday of someone in your 500.
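To see how loose "about three times" is, here is a quick check of the ratios between the layer sizes quoted above. It is only a sketch: the names `LAYERS` and `layer_ratios` are mine, and the sizes are the rounded figures from this article rather than Dunbar's raw data.

```python
# Layer sizes as quoted in the paragraph above (rounded figures).
LAYERS = [5, 15, 50, 150, 500, 1500]


def layer_ratios(layers):
    """Return the size ratio between each layer and the one inside it."""
    return [outer / inner for inner, outer in zip(layers, layers[1:])]


ratios = layer_ratios(LAYERS)
for inner, outer, ratio in zip(LAYERS, LAYERS[1:], ratios):
    print(f"{inner:>4} -> {outer:>4}: x{ratio:.2f}")
```

Every step comes out between 3.0 and roughly 3.3, which is why "about three" is a fair summary.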

The underlying idea — the Social Brain Hypothesis, that primate brains grew large primarily to manage social complexity — turned thirty in 2024, and a comprehensive review that year found the neocortex-to-group-size correlation holding up across primates, carnivores, ungulates, bats, cetaceans, and birds. Maintaining a relationship means running a mental model of another agent: what they'll do, how they'll fail you, when they're reliable. That modeling has a cost, and the cost caps the count.

I should be honest about a real scientific fight here, because it actually makes the argument stronger. A 2021 paper in the Royal Society journal Biology Letters, "Dunbar's number deconstructed," reran the analysis with newer statistical methods and got 95% confidence intervals as wide as 3.8 to 520. The critics' point: a single precise figure like 150 is far shakier than its fame implies. Hold onto that, because it's exactly right for what follows. The claim that matters isn't "every team can maintain precisely 150 integrations." It's that every team has a limit, the limit is cognitive rather than technical, and crossing it produces a specific, predictable kind of rot.

Your services have the same ceiling

Now swap the primate for a service-owning team, and swap "relationship" for "API integration." To truly maintain an integration — not just connect to it — a team has to hold a mental model of it: its contract, its failure modes, its SLA, its upgrade schedule, the weird thing it does under load on the last Tuesday of the quarter. That modeling has a cost. The cost caps the count. A team's collective cognition is its neocortex, and it enforces a Dunbar number on services exactly as your brain enforces one on people.

And here's the tell that the industry has already hit this ceiling, even if it lacks the vocabulary for it: according to CNCF survey data reported in 2025, around 42% of organizations are consolidating microservices back into larger deployable units ("modular monoliths" or "macroservices"), citing debugging complexity, operational overhead, and cognitive load. That is not a niche correction. That's nearly half the industry quietly reversing the defining architectural fashion of the 2010s. The reversal has a folk name, "microservice fatigue," but that's the symptom, not the diagnosis. The diagnosis is that organizations spun up 300, 500, a thousand services and handed them to teams whose cognitive carrying capacity tops out far lower. They exceeded their Dunbar number, and the architecture started behaving like an overextended social network.

What you get when you cross the line is the "distributed monolith": services that are technically independent but operationally fused, because the team can't actually hold them apart in its head. Changing one means reasoning about a dozen others, but the team only deeply understands a handful, so every deploy is scary and every incident becomes an archaeology dig. The 3 a.m. mystery service is the canonical artifact. When a team owns 300 integrations but can only cognitively maintain 50, the other 250 don't vanish — they enter a state of managed neglect, running until they break, and breaking is when you finally meet them.

The concentric circles, in production

Dunbar's layers map onto integration tiers with uncomfortable precision, and the mapping is genuinely useful because each tier implies a different operational relationship:

Dunbar layer	~Count	The integration it describes
Intimate	5	Core dependencies (database, auth, primary API). Tested on every deploy. The team owns the contract and could explain it from memory.
Sympathy	15	Critical integrations (payments, email, monitoring). Tested weekly, SLA tracked, runbook exists, failure modes known.
Close friends	50	Important integrations. Tested roughly monthly; the team has a human contact at the provider; upgrades are tracked.
Active network	150	Functional but maintained only through automated monitoring. No deep understanding — you trust the dashboard, not your own model.
Acquaintances	500	Connected, never actively tested. You'll learn it failed from a customer support ticket, not an alert.
Names and faces	1,500+	Shadow integrations. They exist in config; nobody knows if they still work; removing them feels riskier than leaving them.

The trust gradient is the whole point. Your 5 core dependencies are exercised on every single deploy; your 500 acquaintance integrations are exercised never. When one of the 5 hiccups, you know within seconds. When one of the 500 dies, you find out the way you'd learn a distant acquaintance had a crisis — secondhand, late, through someone else.
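The table can be written down as a small lookup, which makes the gradient explicit. This is a sketch under two assumptions of mine: that each count is a cumulative ceiling that includes the inner rings (as in Dunbar's layers), and that you can rank your integrations by importance. `Tier`, `TIERS`, and `tier_for_rank` are illustrative names, not part of any library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    ceiling: int        # assumed cumulative: includes all inner rings
    verification: str   # how the table says this ring gets exercised


TIERS = [
    Tier("intimate", 5, "tested on every deploy"),
    Tier("sympathy", 15, "tested weekly"),
    Tier("close friends", 50, "tested roughly monthly"),
    Tier("active network", 150, "automated monitoring only"),
    Tier("acquaintances", 500, "never actively tested"),
    Tier("names and faces", 1500, "unknown whether it still works"),
]


def tier_for_rank(rank: int) -> Tier:
    """Map a 1-based importance rank to the innermost ring that can hold it."""
    if rank < 1:
        raise ValueError("rank is 1-based")
    for tier in TIERS:
        if rank <= tier.ceiling:
            return tier
    return TIERS[-1]  # the last row of the table is open-ended ("1,500+")


for rank in (3, 12, 40, 120, 300, 5000):
    tier = tier_for_rank(rank)
    print(f"rank {rank:>4}: {tier.name} ({tier.verification})")
```

The useful part is the `verification` column: an integration's rank decides how it gets checked, not just how it is labeled.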

There's a subtler lesson buried in Dunbar's data, too. Humans spend something like 40% of their social time on just their closest five relationships and the remaining 60% spread across the other 145. The investment is wildly uneven, and that's adaptive — the core deserves disproportionate attention. The anti-pattern I see constantly is the team that, in the name of fairness or process, spreads its maintenance budget uniformly across all 200 integrations. The result is the worst of both worlds: the five integrations that would take the company down get the same thin attention as the peripheral analytics webhook, so the center is quietly brittle while the team lovingly maintains acquaintances it could safely ignore. The teams that do better make peace with the gradient: pour attention into the core, accept that the periphery will occasionally break, and optimize for fewer catastrophic center failures rather than zero failures anywhere.
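A little arithmetic shows how different the two allocation policies are. The sketch below assumes a budget of 800 maintenance hours a month and reuses the 200 integrations and the 40% core share from the paragraph above; the function name and the budget figure are illustrative, so substitute your own.

```python
def hours_per_integration(total_hours, core_count, total_count, core_share):
    """Split a maintenance budget between a core ring and everything else.

    Returns (hours per core integration, hours per peripheral integration).
    """
    if not 0 <= core_share <= 1:
        raise ValueError("core_share must be between 0 and 1")
    if not 0 < core_count < total_count:
        raise ValueError("need at least one core and one peripheral integration")
    core_hours = total_hours * core_share / core_count
    peripheral_hours = total_hours * (1 - core_share) / (total_count - core_count)
    return core_hours, peripheral_hours


# Illustrative inputs: 800 hours a month, 200 integrations, 5 of them core.
uniform = hours_per_integration(800, 5, 200, core_share=5 / 200)
weighted = hours_per_integration(800, 5, 200, core_share=0.40)
print(f"uniform:  core {uniform[0]:.1f} h, periphery {uniform[1]:.1f} h")
print(f"weighted: core {weighted[0]:.1f} h, periphery {weighted[1]:.1f} h")
```

With these inputs the uniform policy gives every integration 4 hours a month, while the weighted policy gives each core integration 64 hours and each peripheral one about 2.5. The periphery barely notices the difference; the core gets sixteen times the attention.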

How many can you actually afford?

Dunbar's model came with a budget, and so does yours. Primates maintain bonds through grooming, and grooming takes time, which is why grooming-based social groups stay small. Humans got a roughly 3x larger ceiling than chimps because language is about three times more efficient than grooming for keeping a bond warm — you can "groom" several people at once with a conversation.

The software equivalent of grooming is integration maintenance: reading changelogs, retesting after upgrades, watching SLAs, rotating credentials, keeping docs current. Each integration needs ongoing grooming time, and your team has a fixed amount of it. So "how many integrations can we maintain?" is really "how much grooming time do we have, divided by the grooming cost per integration?" Put numbers on it and the Dunbar resonance is almost eerie. Industry estimates put integration upkeep at roughly 2 to 8 hours per month each. A five-engineer team has on the order of 800 working hours a month (5 × 160); if they spent all of it grooming at 4 hours per integration, they could sustain about 200. Push to 8 hours each for gnarly, unstable services and the ceiling drops to ~100; drop to 2 hours each for stable, well-documented ones and it rises toward ~400. Those are upper bounds, since no team spends every working hour on upkeep. Somewhere between 100 and 400, the same order of magnitude as Dunbar's 150, is the most your team could carry.

Treat that arithmetic as a framework to fill with your own data, not as a law of nature; the per-integration hours are a placeholder you should replace. But the shape is the lesson: your ceiling is set by grooming hours, and you can only change it three honest ways — buy more efficient grooming (tooling), reduce the number of relationships (fewer, fatter services), or hire more groomers (people). What you cannot do is wish the ceiling away with an architecture diagram.
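The grooming budget reduces to one division, sketched here so you can plug in your own figures. The 160 hours per engineer per month is my assumption (it is what makes five engineers come to 800 hours), and `grooming_fraction` is a hypothetical knob for the share of time a team really spends on upkeep.

```python
def integration_ceiling(engineers, hours_per_engineer, hours_per_integration,
                        grooming_fraction=1.0):
    """Estimate how many integrations a team's grooming hours can cover."""
    if hours_per_integration <= 0:
        raise ValueError("hours_per_integration must be positive")
    if not 0 <= grooming_fraction <= 1:
        raise ValueError("grooming_fraction must be between 0 and 1")
    budget = engineers * hours_per_engineer * grooming_fraction
    return int(budget // hours_per_integration)


# The article's scenario: 5 engineers, assumed 160 hours each per month,
# with every hour spent on grooming.
for cost in (2, 4, 8):
    ceiling = integration_ceiling(5, 160, cost)
    print(f"{cost} h per integration -> ceiling of {ceiling}")

# A less generous assumption: a quarter of the team's time goes to upkeep.
print(integration_ceiling(5, 160, 4, grooming_fraction=0.25))
```

The last line is the sobering one: at a quarter of the team's time and 4 hours per integration, the ceiling is 50, not 200.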

And notice the trap hidden in the high end of that range: 150 (or 400) is a ceiling, not a target. Dunbar's 150 is the point past which relationships inevitably decay, not the number that makes you happiest or most effective. The most productive human group for real work is far smaller: 5 to 8, the "two-pizza team." The healthiest number of integrations a team genuinely owns isn't 150; it's closer to 15 to 30, between the sympathy ring and the close-friend ring, the zone where they can actually know every contract and failure mode. The 150 isn't where you should aim. It's the cliff edge you should keep your distance from.

"But we have great tooling"

The obvious objection: surely observability fixes this? Surely Datadog and PagerDuty and a good service mesh mean I don't have to cognitively hold every integration?

Partly — and the Dunbar frame predicts exactly how partly. Tooling is to integrations what language was to grooming: a roughly 3x efficiency upgrade that raises the ceiling without removing it. Automated monitoring replaces manual checking, alerting replaces proactive poking, dashboards stand in for mental models. A team with excellent observability might genuinely sustain ~450 integrations where a bare team sustains ~150. But automation only catches the failure modes you anticipated and encoded. The novel ones — a silent contract change, a subtle behavioral regression after an upgrade, a deprecation notice buried in an email — still demand human cognition to notice and absorb. You can't monitor 10,000 integrations into being understood any more than 10,000 social-media connections are 10,000 real relationships. The most valuable thing a platform-engineering team does, in this light, isn't shipping features — it's playing the role language played for our ancestors: raising the whole organization's effective Dunbar number.

What to do Monday morning

Here's the exercise, and it takes an afternoon. List every service your team integrates with, then sort each into a tier using a single honest question per level:

Core (aim ~5): Could you explain its full contract, failure modes, and current SLA from memory, right now?
Critical (aim ~15): Is there a runbook, and have you tested it in the last month?
Important (aim ~50): Do you know a human to call at the provider, and have you touched it this quarter?
Active (aim ~150): Does monitoring actually alert on its failure, and has anyone edited its integration code in the last year?
Acquaintance: Does it exist in your codebase but you've never debugged it — and would you learn of a failure from a customer, not an alert?
Shadow: Is it in your dependency graph but no current team member has ever worked on it?
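The six questions work as a cascade: start at the top and stop at the first honest yes. A minimal sketch of that reading follows. The `Integration` record, its field names, and the example service names are all hypothetical, and collapsing each question into a single boolean is my simplification.

```python
from dataclasses import dataclass


@dataclass
class Integration:
    """Hypothetical record of one integration's answers to the tier questions."""
    name: str
    explain_from_memory: bool = False               # Core question
    runbook_tested_this_month: bool = False         # Critical question
    contact_and_touched_this_quarter: bool = False  # Important question
    alerts_and_edited_this_year: bool = False       # Active question
    any_current_member_worked_on_it: bool = False   # False means Shadow


def classify(item: Integration) -> str:
    """Return the innermost tier whose question gets an honest yes."""
    if item.explain_from_memory:
        return "core"
    if item.runbook_tested_this_month:
        return "critical"
    if item.contact_and_touched_this_quarter:
        return "important"
    if item.alerts_and_edited_this_year:
        return "active"
    if item.any_current_member_worked_on_it:
        return "acquaintance"
    return "shadow"


inventory = [
    Integration("primary-db", explain_from_memory=True),
    Integration("payments", runbook_tested_this_month=True),
    Integration("legacy-export"),
]
for item in inventory:
    print(f"{item.name}: {classify(item)}")
```

Note the default: an integration nobody can vouch for lands in the shadow tier until someone proves otherwise.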

Count each tier. If you're carrying far more than ~15 critical, ~50 important, or ~150 active, you're over budget, and the integrations bunched at the edge of each tier are precisely the ones that will page you at 3 a.m. Then do the thing nobody schedules: work the shadow list. For each one, ask — is it even used? Can it be deleted? If not, who owns it now? This is cleaning out your contact list, and in software a forgotten contact can cause a production incident when it changes its API without telling you.
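Once every integration has a tier, the count and the shadow list fall out of a few lines. The sketch below assumes you already have a mapping from service name to tier label; the `BUDGETS` values come from the targets above, while the service names are invented. Treating the targets as hard limits is stricter than "far more than," so read the overage as a prompt for discussion rather than a verdict.

```python
from collections import Counter

# Targets taken from the tier list above; treating them as hard limits
# is a simplification of "far more than".
BUDGETS = {"core": 5, "critical": 15, "important": 50, "active": 150}


def audit(tier_by_service):
    """Return (counts per tier, overage per over-budget tier, shadow names)."""
    counts = Counter(tier_by_service.values())
    overage = {
        tier: counts[tier] - limit
        for tier, limit in BUDGETS.items()
        if counts[tier] > limit
    }
    shadows = sorted(
        name for name, tier in tier_by_service.items() if tier == "shadow"
    )
    return counts, overage, shadows


# Hypothetical inventory: 20 critical integrations and two shadows.
tier_by_service = {f"critical-{n}": "critical" for n in range(20)}
tier_by_service.update({"old-webhook": "shadow", "batch-sync": "shadow"})

counts, overage, shadows = audit(tier_by_service)
print("over budget:", overage)
print("shadow list to review:", shadows)
```

The shadow list is the output that matters most: each name on it needs an answer to "is it used, can it be deleted, and who owns it now?"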

The strategic move that follows is the one the 42% are stumbling toward: when a team is over its Dunbar number, the fix is not a better dashboard for 300 services. It's fewer, fatter relationships — merge thin services into coherent units, give each unit a single owning team, and make every remaining cross-service dependency a deliberate, groomed relationship instead of an accident discovered during an outage. Invest in depth over breadth, exactly as you would with people.

This is about to matter far more than it does today. The AI agent economy, projected past $50 billion by 2030, runs on agents integrating with dozens or hundreds of other agents and services, each connection carrying the same upkeep cost: understand the contract, handle the failure, verify the trust, track the version. An agent can technically wire itself to thousands of endpoints. The question that will separate robust agent fleets from fragile ones is the Dunbar question: not how many can it connect to, but how many can it actually trust, meaning keep a live model of, notice when it drifts, and depend on under load. Design the mesh with the circles in mind: a core trust ring of ~5, a critical ring of ~15, an important ring of ~50, a monitored ring of ~150, and a healthy suspicion of anything past that.

Your brain caps you at about 150 people because tracking a relationship costs neocortex. Your team caps out at about the same number of services because tracking an integration costs cognition. It is the same constraint, wearing different clothes — and the services past the line aren't really integrations at all. They're strangers you happen to be connected to, waiting for an incident to introduce you.

For agents, the Dunbar question isn't how many you can connect to. It's how many you can trust.

An agent can wire itself to thousands of endpoints; what it cannot do is keep a live trust model of all of them. The Agent Trust Stack is how you make each connection a groomed relationship instead of a stranger in the stack trace: signed provenance for what a service actually did, portable reputation for how it has behaved over time, and verifiable identity underneath both — the live model the dashboard can't give you, for the rings past your team's own ceiling.

pip install agent-trust-stack · npm install agent-trust-stack
