
The Asperity Junction Problem in API Integration

Real contact between two metal surfaces is 1–10% of the apparent contact area. The same geometry shows up in every API integration — and most documentation effort is busy measuring the wrong surface.

Published April 2026 · ~12 min read

In 1939, Frank Bowden and David Tabor ran an electric current through two pieces of contacting metal in a Cambridge laboratory. They measured the resistance, changed the geometry (flat block against flat block, sphere against flat), and measured again. Pressing a flat block of steel onto another flat block produced only about twice the electrical conductance of pressing a sphere against the same surface under a similar load, despite the flat block’s nominal contact area being orders of magnitude larger.

Current flows through metal where metal touches metal. If the conductance was barely changed by going from a point to a plane, then what people had been calling “the contact area” had nothing to do with the actual contact. The plane was a lie. The two flat blocks were touching at a few microscopic peaks — asperities — and the smooth landscape between those peaks was carrying no load at all.

This experiment, published in the Proceedings of the Royal Society and elaborated in Bowden and Tabor’s two-volume The Friction and Lubrication of Solids (Oxford, 1950 and 1964), established the foundational geometry of modern tribology: when two surfaces meet, the real contact area is typically between 1% and 10% of the apparent contact area, and for some metals under light loads it drops below 0.1%. The asperity junctions bear the entire load. Friction force resolves cleanly: F = τ × A_real, where τ is the shear strength at the junctions. The coefficient of friction μ — Amontons’s mysterious constant — turns out not to be a constant at all. It is a derived quantity, a ratio that falls out of the geometry once you measure the right surface.
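The step from F = τ × A_real to a derived μ can be written out in a few lines. This is a sketch under one assumption the text does not state: that the asperities yield plastically, so the real area is roughly the normal load divided by the hardness of the softer metal. The symbols W (normal load) and H (indentation hardness) are introduced here for illustration.

```latex
\begin{align}
A_{\text{real}} &\approx \frac{W}{H}
  && \text{(asperities yield until they can carry the load)} \\
F &= \tau \, A_{\text{real}} \approx \frac{\tau W}{H}
  && \text{(shear strength times real area)} \\
\mu &= \frac{F}{W} \approx \frac{\tau}{H}
  && \text{(the apparent area never appears)}
\end{align}
```

Under that assumption μ is a ratio of two material properties, which is why it behaves like a constant for a given pair of surfaces without being a fundamental one.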

Most API integration failures live at exactly this geometry, and most API documentation effort is busy measuring the wrong surface.

Where Specs Don’t Touch

A REST or GraphQL specification looks complete the way two flat metal blocks look complete. Every endpoint listed. Every field typed. Every error code enumerated. The contract describes a smooth, broad surface of agreed contact: here is what we promise, and here is what you can rely on.

In production, almost none of that surface is bearing load.

A 2026 audit of 47 internal API endpoints at a mid-size SaaS company, reported in DEV Community (“Your API Tests Are Lying to You: The Schema Drift Problem Nobody Talks About,” February 2026), found that 23 of the 47 endpoints — 49% — had at least one undocumented structural change over six months. Nine endpoints (19%) experienced silent type changes in existing fields. Fourteen (30%) added undocumented new fields. Four (9%) had fields become nullable without notification. Two had fields silently removed. The CI test suite passed 100% every day for the entire period.

One concrete incident from the audit traced to a single field: user_id, changed from integer (4521) to string ("4521") during a routine database migration. A mobile client with strict JSON parsing rejected the string, swallowed the error, and rendered a blank screen. About 30% of users were affected. No alert fired. No test failed.
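The failure mode in that incident can be sketched in a few lines of Python. This is an illustrative reconstruction, not the audited client's code: the function names, the logger name, and the fallback string are all hypothetical. It shows the strict type check the incident describes, with the one change that matters: the violation is logged rather than swallowed.

```python
import json
import logging

logger = logging.getLogger("integration.users")  # hypothetical logger name


class ContractViolation(Exception):
    """Raised when a load-bearing field arrives with an unexpected type."""


def parse_user(payload: str) -> dict:
    """Strictly parse the one field this hypothetical client depends on."""
    body = json.loads(payload)
    user_id = body.get("user_id")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(user_id, int) or isinstance(user_id, bool):
        raise ContractViolation(
            f"user_id: expected integer, got {type(user_id).__name__} ({user_id!r})"
        )
    return {"user_id": user_id}


def render_profile(payload: str) -> str:
    """Fail visibly: log the violation instead of rendering a blank screen."""
    try:
        user = parse_user(payload)
    except ContractViolation as exc:
        logger.error("schema drift detected: %s", exc)  # hook an alert here
        return "error: profile unavailable"
    return f"profile for user {user['user_id']}"
```

The strictness is not the problem in this sketch. The silence is: the same check that produced a blank screen becomes a drift detector once its failure path reports somewhere.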

This is the Bowden-Tabor geometry on an API surface. Forty-seven endpoints, hundreds of fields, a spec that runs to thousands of lines: the apparent contact area is enormous. The real contact area, the place where production traffic actually loads the integration, is a handful of fields. One of those fields was user_id, and that single junction accounted for the entire blast radius of the mobile failure.

The same audit estimated the cost in concrete terms: roughly two schema-drift incidents per month, averaging 3.5 days per incident to resolve. That is about seven working days a month spent on remediation of asperity-junction failures, roughly a third of one engineer’s time if a single person handles each incident, and closer to a full-time engineer once an incident pulls in several people. A separate 2026 analysis from API documentation vendor Theneo found that unmanaged API changes account for around 40% of integration failures and 15–20 hours per incident in emergency fixes.

The proportions echo Bowden and Tabor’s: a small fraction of the surface carries nearly all of the load. The cost of believing in apparent area is the same in both domains: effort spent polishing a surface that isn’t where the contact is.

Junction Growth, Without a Software-Engineering Name

Tabor’s 1959 follow-up work added a wrinkle with no analogue in standard software engineering vocabulary. When normal load is held constant and a tangential force is applied, the asperity junctions plastically deform and grow in real area without any additional normal load. Friction self-amplifies through the geometry of the contact itself.
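Junction growth has a compact textbook form. The relation below is the one usually attributed to Tabor's junction-growth analysis; the symbols are introduced here for illustration (W normal load, F tangential force, A real area, p_0 the flow pressure under normal load alone, α an empirical constant), and no value of α is asserted.

```latex
\begin{align}
p^{2} + \alpha s^{2} &= p_{0}^{2},
  \qquad p = \frac{W}{A}, \quad s = \frac{F}{A} \\
\left(\frac{W}{A}\right)^{2} + \alpha \left(\frac{F}{A}\right)^{2} &= p_{0}^{2} \\
A &= A_{0}\sqrt{1 + \alpha \left(\frac{F}{W}\right)^{2}},
  \qquad A_{0} = \frac{W}{p_{0}}
\end{align}
```

With W fixed, A grows with F alone: the tangential stress enlarges the contact that resists it.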

The integration version is familiar to anyone who has nursed a long-running API dependency through several quarters. Operational stress arrives: a surge in request rate, a tightened timeout, a new error response that suddenly appears in 0.1% of calls. The integration’s real contact area with the API expands. Error handlers proliferate. Retry logic grows more complex. The coupling surface grows beyond the designed interface, the same way physical junctions grow beyond the Hertzian prediction under tangential stress. Software engineering files all of this under “technical debt,” which is a categorical bucket, not a geometry.

Four Wear Mechanisms, Four Failure Modes

Tribology recognizes four primary wear mechanisms. Each of them has a structural twin in API integration. The mapping is not metaphorical decoration — it is the same geometric story playing out on different materials.

Adhesive wear is the cold-welding of asperities. Two contacting peaks bond at the metal-on-metal interface, and when the surfaces move, material transfers from one body to the other. In integration code, adhesive wear is the consequence of a consumer that parses the entire response body — including fields it was never contracted to use — and quietly absorbs them into its own data model. Postel’s law (“be liberal in what you accept”) was meant to prevent friction. Instead, it created adhesive wear: consumers cold-weld to undocumented fields, and the fields cannot be removed by the provider without dragging the consumer’s data model along. The cost of separation is material transfer — refactoring the consumer.

Abrasive wear is plowing: a hard asperity gouges a softer surface. The asymmetry matters. In APIs, the hard surface is a dominant provider with rapid release cadence; the soft surface is a consumer whose deployment cycle is measured in weeks. Stripe is a deliberate case study in plowing control. Their versioning system pins each consumer to a specific API version; breaking changes are concentrated into a small number of named releases (the “2024-09-30.acacia” pattern in their public versioning documentation), with most monthly releases non-breaking. This is engineering with the geometry in mind: the provider releases continuously, but the consumer’s contact surface stays quiet until it explicitly pins to a new version. Surface coatings perform the same function in mechanical systems — DLC (diamond-like carbon) layers are not the base material; they are a controlled surface that determines the contact mechanics.

Fatigue wear is cyclic stress that nucleates subsurface cracks. After enough cycles, a piece spalls off — not because of the latest load, but because of accumulated damage from thousands of prior cycles that each looked harmless. The audit data is fatigue wear in action. Six months of passing tests; six months of minor non-breaking version bumps that each looked clean; 23 endpoints quietly accumulating subsurface damage. The spalling event hadn’t happened yet. The stress concentrators were already in place.

Tribochemical wear, seen most vividly in fretting, is the most insidious mechanism, and the most useful mapping. Fretting occurs at clamped joints under small-amplitude oscillatory loading: a few microns of motion, repeated millions of times. On steel, the result is a distinctive red-brown oxide debris that engineers in the field call “cocoa” (Hutchings & Shipway, Tribology: Friction and Wear of Engineering Materials, 2nd ed., 2017). The cocoa is harder than the original steel. It works as an abrasive. Trapped in the joint, it accelerates the very wear that produced it.

Two services that share an API contract but never undergo a “breaking change” are nonetheless vibrating against each other. Request rates fluctuate. Timeouts tighten. Retry policies drift across versions. Rate limits change. Error codes nobody used to return start showing up in 0.01% of responses. Each adjustment is below the threshold of a formal contract change. None individually breaks anything. What accumulates around the integration — workaround code, special-case handlers, compatibility shims, retry loops nobody documented — is the cocoa. It is harder than the original codebase. Future engineers cannot remove it without breaking something. It is actively making the system worse.

This is the wear pattern that experienced engineers feel in their bones when they say a stable integration is “fragile.” They are not wrong. They are looking at oxidized debris.

The Diagnostic Problem

Bowden and Tabor’s central methodological move was to stop measuring bulk friction and start measuring junctions directly. The atomic force microscope, introduced in 1986, made this physically possible at the single-asperity scale within a year (Mate et al., “Atomic-scale friction of a tungsten tip on a graphite surface,” Phys. Rev. Lett. 59:1942, 1987). Before AFM, tribologists measured the macro behavior and inferred the microstructure. After AFM, they could see the asperities one at a time.

API integration is at the equivalent of pre-AFM tribology when it relies on spec-conformance testing. A test suite that exercises every documented field measures apparent area. It cannot, by construction, find the asperity junctions, because the asperity junctions are the fields whose behavior diverges from the spec under load — fields the spec is not describing accurately.

Two diagnostic instruments break out of this trap.

Consumer-driven contract testing (Pact, originating around 2013, and successors) inverts the testing geometry. Instead of asserting against the full spec, the consumer publishes the specific contract it requires (only the fields it actually uses, with the types and constraints it actually depends on), and the provider verifies its current implementation against every consumer’s contract. The Pact Broker’s “can-i-deploy” check, also offered through the hosted PactFlow service, queries the broker for verification results and fails the pipeline when a deployment would violate any consumer’s real contact surface. This is the AFM move applied to APIs: stop measuring the bulk surface, measure the load-bearing junctions one at a time.
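The inversion is easier to see in code. The sketch below is a hand-rolled illustration in plain Python, not Pact's API: the consumer names, the contract table, and both functions are hypothetical, and a real setup would publish contracts to a broker rather than hold them in a dict. The `can_deploy` name borrows only the idea of the broker check.

```python
from typing import Any

# Hypothetical consumer contracts: each consumer lists only the fields it
# actually reads, with the JSON types it depends on.
CONSUMER_CONTRACTS: dict[str, dict[str, type]] = {
    "mobile-client": {"user_id": int, "email": str},
    "billing-service": {"user_id": int, "status": str},
}


def verify_provider(sample_response: dict[str, Any]) -> dict[str, list[str]]:
    """Check one provider response against every consumer's contract.

    Returns consumer name -> list of violations. Empty means no violations.
    """
    failures: dict[str, list[str]] = {}
    for consumer, contract in CONSUMER_CONTRACTS.items():
        problems = []
        for field, expected in contract.items():
            if field not in sample_response:
                problems.append(f"{field}: missing")
            elif type(sample_response[field]) is not expected:
                actual = type(sample_response[field]).__name__
                problems.append(f"{field}: expected {expected.__name__}, got {actual}")
        if problems:
            failures[consumer] = problems
    return failures


def can_deploy(sample_response: dict[str, Any]) -> bool:
    """Deploy gate: block if any consumer's contract is violated."""
    return not verify_provider(sample_response)
```

Fields that no consumer names are never checked, which is the point: the provider stays free to change them, and the test surface shrinks to the junctions.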

Traffic capture and replay is the profilometry equivalent. A surface profilometer maps a surface before a wear test and after; the difference is the wear scar. Tools like GoReplay and Speedscale, and increasingly eBPF-based interceptors that capture at the kernel level with overhead typically reported under 1% CPU, record real production requests and replay them against staging. The diff between recorded and replayed responses is the schema drift — the wear scar of API evolution made visible. eBPF maturation across 2024–2025 has made this approach broadly deployable without modifying application code, much the way AFM measures friction without modifying the surface.
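The diff step of capture-and-replay can be sketched independently of any capture tool. The code below is a minimal illustration under stated assumptions: it takes two already-parsed JSON bodies (one recorded in production, one replayed against staging), reduces each to a map of field path to type, and reports the differences. The function names and the path notation are invented for this example and do not come from GoReplay or Speedscale.

```python
from typing import Any


def shape(value: Any, path: str = "$") -> dict[str, str]:
    """Flatten a JSON value into {path: type name}, discarding concrete values."""
    out: dict[str, str] = {}
    if isinstance(value, dict):
        for key, child in value.items():
            out.update(shape(child, f"{path}.{key}"))
    elif isinstance(value, list):
        # All elements share one path; with mixed element types the last one wins.
        for child in value:
            out.update(shape(child, f"{path}[]"))
    else:
        out[path] = "null" if value is None else type(value).__name__
    return out  # empty dicts and lists contribute no paths in this sketch


def drift(recorded: Any, replayed: Any) -> list[str]:
    """Report structural differences between a recorded and a replayed body."""
    before, after = shape(recorded), shape(replayed)
    report = []
    for path in sorted(before.keys() | after.keys()):
        if path not in after:
            report.append(f"removed {path} ({before[path]})")
        elif path not in before:
            report.append(f"added {path} ({after[path]})")
        elif before[path] != after[path]:
            report.append(f"changed {path}: {before[path]} -> {after[path]}")
    return report
```

Comparing shapes rather than values keeps the report quiet about data that legitimately differs between environments and loud about type changes, new nulls, and added or removed fields.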

The two instruments work in complementary directions. Contract testing names the junctions ahead of time and asserts on them. Traffic replay measures the junctions as they exist in the wild, including the ones nobody knew were load-bearing.

Where the Analogy Breaks

API fields are designed; surface asperities are accidents of grinding and machining. The analogy is not airtight, and it is worth saying where.

A fair concession: API fields are intentional, but the load-bearing status of specific fields is emergent, not designed. Nobody specifies user_id to be the field whose type change crashes 30% of the user base. That status emerges from production usage patterns — the same way asperity junctions emerge from contact patterns even though every peak on the surface was, in some sense, “produced” by the manufacturing process. The geometry of which peaks bear load is not designed in either case.

A second concession: surfaces cannot be versioned, but APIs can. This is true at the level of the original material. It is less true once you consider surface coatings, which serve precisely the function of controlled versioning — a layer applied over the base substrate to determine the contact mechanics without altering what’s underneath. Stripe’s version pinning is the DLC of integration.

The load-bearing test for any cross-domain analogy is whether it predicts something the standard framing does not. The asperity-junction view predicts four things that “good API documentation” does not: that measuring the spec surface is systematically misleading; that failure concentration at a few junctions is geometrically inevitable rather than a fixable bug; that diagnostic effort should be directed at measuring real contact rather than expanding documented contact; and that the four wear mechanisms require four distinct engineering responses, not one generic “better testing.” Those predictions hold. The framing earns its keep.

What To Do On Monday

The practical move is small and procedural, and it cuts across the four wear mechanisms.

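For adhesive wear: enforce strict consumer parsing. Read only the fields the consumer’s contract names and keep everything else out of the data model. Schema validation at the consumer boundary does this in one of two ways: strip properties the contract does not name before they reach application code, or reject them outright (additionalProperties: false in JSON Schema, or the equivalent in your validator). Rejection is the stricter option, and it turns every additive provider change into a consumer failure, so reserve it for integrations where that is the intended behavior. Either way, cold-welding is prevented before it forms. The cost is a few lines of validator config per integration. The benefit is that the consumer’s data model never accidentally contains the provider’s internal schema.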
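A boundary of this kind can be sketched without a schema library. The example below is a minimal illustration in standard-library Python: the contract table and the `project` function are hypothetical, and the `reject_unknown` flag stands in for the effect of additionalProperties: false. By default it drops unnamed fields; with the flag set it refuses the payload instead.

```python
from typing import Any

# Hypothetical contract: the only fields this consumer is allowed to see.
CONTRACT_FIELDS: dict[str, type] = {"user_id": int, "email": str}


def project(
    response: dict[str, Any],
    contract: dict[str, type] = CONTRACT_FIELDS,
    reject_unknown: bool = False,
) -> dict[str, Any]:
    """Copy only contracted fields into the consumer's model.

    Unknown fields are dropped by default, or rejected when reject_unknown
    is True (the strict mode, comparable to additionalProperties: false).
    """
    unknown = sorted(set(response) - set(contract))
    if unknown and reject_unknown:
        raise ValueError(f"unexpected properties: {unknown}")

    model: dict[str, Any] = {}
    for field, expected in contract.items():
        if field not in response:
            raise ValueError(f"{field}: missing")
        value = response[field]
        if type(value) is not expected:
            raise ValueError(
                f"{field}: expected {expected.__name__}, got {type(value).__name__}"
            )
        model[field] = value
    return model
```

Either mode keeps the provider's internal fields out of the consumer's data model. The difference is who pays when the provider adds a field: nobody in the default mode, the consumer in the strict one.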

For abrasive wear: pin versions explicitly. If the provider supports version headers, set them and watch deprecation notices. If the provider doesn’t, build a compatibility shim at the integration boundary so a single team owns the upgrade path rather than every consumer absorbing it independently.
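Both halves of that advice fit in a short sketch. The first function pins a version through a request header; Stripe-Version is the header Stripe documents for per-request versioning, and the version string is the one quoted earlier in this piece. The second half is a hypothetical shim for a provider without version headers: the version labels "v1" and "v2", the adapter functions, and the assumption that v2 sends the id as a string are all invented for illustration.

```python
from typing import Any, Callable

PINNED_VERSION = "2024-09-30.acacia"  # upgrade deliberately, never implicitly


def pinned_headers(api_key: str) -> dict[str, str]:
    """Headers for a provider that supports explicit version pinning."""
    return {
        "Authorization": f"Bearer {api_key}",
        "Stripe-Version": PINNED_VERSION,
    }


def _adapt_v1(raw: dict[str, Any]) -> dict[str, Any]:
    return {"user_id": raw["user_id"]}


def _adapt_v2(raw: dict[str, Any]) -> dict[str, Any]:
    # Hypothetical: v2 of the provider sends the id as a numeric string.
    return {"user_id": int(raw["user_id"])}


# One owned table of adapters, so the upgrade path lives in a single place.
ADAPTERS: dict[str, Callable[[dict[str, Any]], dict[str, Any]]] = {
    "v1": _adapt_v1,
    "v2": _adapt_v2,
}


def to_internal(raw: dict[str, Any], provider_version: str) -> dict[str, Any]:
    """Translate a provider payload into the internal model, or fail loudly."""
    try:
        adapter = ADAPTERS[provider_version]
    except KeyError:
        raise ValueError(f"no adapter for provider version {provider_version!r}") from None
    return adapter(raw)
```

An unknown provider version raises instead of falling through to a best-effort parse, so a provider upgrade shows up as one failing adapter lookup rather than as drift spread across every consumer.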

For fatigue wear: deploy contract tests that assert on the specific fields the consumer uses. A Pact contract can be a handful of lines per field, and the broker becomes the system of record for “where this integration actually touches the API.” Run the contract verification in the provider’s CI; treat a contract failure as a deploy block, not a warning.

For tribochemical wear: measure the cocoa. Inventory the workaround code around each integration on a quarterly cadence. Compatibility shims, special-case error handlers, retry loops, magic numbers in timeout configs — these are wear products. Track their growth. When the wear products outweigh the original integration code, the integration is failing whether or not anything is “broken.”
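A first-pass inventory can be as crude as counting marker patterns in the integration's source. The sketch below is a heuristic under loud assumptions: the marker names and regular expressions are invented for this example, they will miss wear products that carry no telltale keyword, and they will flag some innocent lines. It is meant for tracking a trend quarter over quarter, not for producing an absolute measure.

```python
import re

# Hypothetical markers of accumulated workaround code. Tune these per codebase.
WEAR_MARKERS: dict[str, re.Pattern[str]] = {
    "retry": re.compile(r"\bretr(y|ies)\b", re.IGNORECASE),
    "workaround": re.compile(r"\b(workaround|hack|shim)\b", re.IGNORECASE),
    "special_case": re.compile(r"\bspecial[- ]case\b", re.IGNORECASE),
    "magic_timeout": re.compile(r"\btimeout\s*=\s*\d+(\.\d+)?"),
}


def wear_inventory(source: str) -> dict[str, int]:
    """Count the lines that match each wear marker."""
    counts = {name: 0 for name in WEAR_MARKERS}
    for line in source.splitlines():
        for name, pattern in WEAR_MARKERS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


def wear_ratio(source: str) -> float:
    """Fraction of non-blank lines that match at least one wear marker."""
    lines = [line for line in source.splitlines() if line.strip()]
    if not lines:
        return 0.0
    flagged = sum(
        1 for line in lines if any(p.search(line) for p in WEAR_MARKERS.values())
    )
    return flagged / len(lines)
```

Run it over the same integration directory each quarter and record the ratio. A rising number with no change in the provider's documented contract is the fretting signature.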

The Greenwood-Williamson statistical model of surface contact (1966) made one observation that ports cleanly: asperities are not sharp spikes. The radius of a typical asperity is orders of magnitude larger than its height. Asperities look like normal surface, until you load them. The dangerous fields in an API look the same way: user_id, timestamp, status, email. Broad, shallow, normal-looking. Their dangerousness is not visible in the spec. It only appears when production traffic loads them.

Bowden and Tabor’s 1939 experiment ended with a number — twice the conductance of a sphere-on-plane contact — and a conclusion that what people had been calling “the contact area” was wrong. The right contact area was a few asperity junctions, and friction was their shear strength times their real area. Everything that looked like flat surface was, in the language of metal-on-metal contact, not touching.

The next time you read an API spec, do the same experiment. Pass current through it — production traffic, real consumer code, captured and replayed. Measure what conducts. The fields that light up are the junctions. The rest of the spec is a flat surface. Beautifully smooth. Carrying no load.


A profilometer for the contract surface

Contract testing names the junctions. Traffic replay finds the ones nobody named. The third instrument is provenance: a hash-linked record of which contract version, which consumer, and which payload shape was active when the request was served. When the spalling event arrives six months in, you don’t infer the wear path from logs — you read it.

Chain of Consciousness is that record. Append-only, periodically anchored, structurally non-fabricable. The audit trail your CI suite isn’t writing.

pip install chain-of-consciousness  or  npm install chain-of-consciousness
