S&P cut the United States from AAA to AA+ in 2011, and Treasury yields fell. The grade was the last thing to move. Here is how to read the parts that move first.
On Friday, August 5, 2011, after the markets had closed, Standard & Poor's did something no one had ever done before. It stripped the United States of its AAA credit rating, the top grade the country had carried for seventy years, dropping it to AA+. This was the safest borrower on Earth being told, officially, that it was a notch less safe. Every intuition says the same thing should follow: investors demand more yield to hold riskier debt, Treasury prices fall, borrowing costs jump.
The opposite happened. In the days after the downgrade, investors bought Treasuries, and the yield on the ten-year note fell. It would keep falling for weeks, toward record lows. The most dramatic credit-rating action in modern history, and the market it was about moved the wrong way.
That is not a paradox, and it is not a story about the market being irrational. It is a story about what a rating grade actually is, and the answer is genuinely counterintuitive. By the time the letter changed, it carried almost no information the market hadn't already absorbed and priced months earlier. The grade was the last thing to move, not the first. Which leads to a claim that sounds backwards and is, in fact, precise: the least informative part of a rating action is the rating.
I want to convince you of that, because the same structure governs something you probably stare at every day: the alert, the SLO, the risk score, the dashboard. Reading it wrong is the difference between catching a problem early and getting paged about it after it is over.
Here is the part almost nobody outside the field knows: the agencies make the grade lag on purpose. The major credit-rating agencies rate through-the-cycle. They deliberately smooth out the ups and downs of the business cycle and change a rating only when they are fairly sure the change won't have to be reversed soon. The explicit goal is to avoid what the literature calls the "rating bounce," the credibility-destroying spectacle of an agency downgrading a company and then upgrading it again three months later. Edward Altman and Herbert Rijken titled a well-known paper on exactly this "Avoiding the Rating Bounce: Why Rating Agencies Are Slow to React to New Information." The slowness is not a bug they are embarrassed about. It is the design spec.
And the design has a documented price, named plainly in the research. An IMF working paper by John Kiff and Michael Kisser on through-the-cycle rating found that such ratings "relatively seldom" change, "exhibit serial dependence, and lag changes in the issuers' default risk," that their "stability is relatively high, while their default prediction power is low." Read that again, because it is the whole essay in one sentence: the agencies bought stability, and they paid for it in predictive power. The grade is a deliberately low-pass-filtered signal.
Once you see it as a filter, the rest follows by physics. A low-pass filter, by construction, removes the high-frequency content: the fast wiggles, the sudden moves, the timing. That is what it is for. So complaining that "the downgrade came too late" fundamentally misunderstands the instrument. A through-the-cycle rating's job is to not react to the delta. Which means the grade cannot carry the delta's information. The very smoothing that makes it stable is what strips the timing out of it. If you want to know what just changed and how fast, the grade is the one place in the entire document guaranteed not to tell you. You have to read the parts that aren't smoothed.
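To make the filter claim concrete, here is a minimal sketch. It assumes an exponential moving average as a stand-in for through-the-cycle smoothing, which is an illustration and not how any agency actually computes a grade. The series, the `alpha` value, and the 0.5 threshold are all invented for the example.

```python
def ema(values, alpha):
    """Exponential moving average: a simple first-order low-pass filter."""
    smoothed = []
    state = values[0]
    for v in values:
        state = alpha * v + (1 - alpha) * state
        smoothed.append(state)
    return smoothed


def first_crossing(values, threshold):
    """Index of the first sample at or above the threshold, or None."""
    for i, v in enumerate(values):
        if v >= threshold:
            return i
    return None


# Hypothetical underlying risk: flat at 0, then a sudden jump to 1 at step 10.
raw = [0.0] * 10 + [1.0] * 40

# The "grade": the same series after heavy smoothing (alpha is an assumption).
grade = ema(raw, alpha=0.1)

raw_cross = first_crossing(raw, 0.5)      # when the underlying risk moved
grade_cross = first_crossing(grade, 0.5)  # when the smoothed grade admits it
lag = grade_cross - raw_cross

print(f"raw crossed at step {raw_cross}, grade at step {grade_cross}, lag {lag}")
```

With these made-up numbers the smoothed series crosses six steps after the raw one. Lowering `alpha` makes the grade more stable and the lag longer, which is the tradeoff the paragraph above describes.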
A rating action is not a number. It is a small package of components, and a credit analyst reads them in almost the exact reverse of the order a layperson does. The layperson reads the letter first and often stops there. The analyst reads the letter last.
At the top of the analyst's attention is the reasoning, the paragraph that says which specific factor moved. Leverage crossed a threshold. A refinancing wall appeared. A covenant tightened. Liquidity got thinner. This is the highest-information part of the action, because it is an attribution: not "things got worse" but "this is the thing that got worse." Next come the outlook and, more urgently, the CreditWatch: the directional signals, the agency's own statement about rate of change. These genuinely lead. S&P's own criteria make a CreditWatch a stronger near-term warning than an Outlook, and academic work on ratings has shown that the change in a rating carries information the level simply does not. The level is a lagging state; the action is the signal. Only after all of that does the analyst glance at the grade itself, the lagging synthesis of everything already read.
And the sharpest signal of all isn't inside the rating action at all. It is in the market price of credit risk, the credit default swap spread. The canonical study here is Hull, Predescu, and White, in the Journal of Banking & Finance in 2004, which found that CDS spreads anticipate the agencies: reviews for downgrade, and the downgrades themselves, are partially priced into spreads before the announcement lands. Later work confirmed the CDS market leads price discovery, and the question is still live enough that the U.S. Office of Financial Research published a 2024 paper titled, almost cheekily, "Do Credit Default Swaps Still Lead?" The spread is the first derivative the market can see in real time; the grade is the slow integral that catches up weeks later. So the full hierarchy, from most informative to least, runs: the market's live spread, then the reasoning (which factor), then the outlook and watch (which direction, how fast), and finally, trailing all of it, the letter. Exactly upside down from how the headline presents it.
Now the part that should make this more than a finance lecture. Roughly forty years later and in an entirely different industry, site reliability engineering independently arrived at the identical architecture, and at the identical critique of the naive version. Nobody in Google's reliability org was reading Altman and Rijken. They reached the same conclusion by getting paged too many times.
The naive alert is the ops equivalent of the letter grade: CPU > 90%, error rate > X, a static threshold crossing on a smoothed level. Google's SRE practice is blunt that this is the wrong trigger. Don't alert on the level, they say; alert on the burn rate, how fast you are consuming your error budget relative to the rate you can afford. The rate-of-change is the alert. That is precisely the CDS spread of operations: a first derivative that fires while there is still time to act, instead of a level that confirms the outage after your users have already felt it. They go further: alert on symptoms, what the user actually experiences, not raw backend numbers. That is the reasoning paragraph of ops: not "a metric is high" but "this specific thing the customer feels is breaking."
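A minimal sketch of the burn-rate arithmetic, assuming the usual definition: the observed error ratio divided by the error ratio the SLO allows. The function names, the 99.9% target, the 30-day window, and the request counts are assumptions chosen for illustration.

```python
def burn_rate(errors, requests, slo_target):
    """Observed error ratio divided by the error ratio the SLO allows.

    A burn rate of 1.0 spends the error budget exactly over the SLO window;
    anything above 1.0 spends it faster than you can afford.
    """
    if requests == 0:
        return 0.0
    error_budget = 1.0 - slo_target
    return (errors / requests) / error_budget


def days_until_budget_exhausted(rate, slo_window_days=30):
    """How long the whole budget lasts if the current burn rate persists."""
    if rate <= 0:
        return float("inf")
    return slo_window_days / rate


# Hypothetical service: 99.9% SLO, 50 failed requests out of 10,000.
rate = burn_rate(errors=50, requests=10_000, slo_target=0.999)
days = days_until_budget_exhausted(rate)

print(f"burn rate ~{rate:.1f}x, budget gone in ~{days:.1f} days")
```

An error ratio of 0.5% looks small as a level. As a burn rate of roughly 5x against a 0.1% budget, it says a 30-day budget would last about six days.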
And the most exact echo of all is a technique called multi-window, multi-burn-rate alerting, which exists for one reason: paging on a single line is broken. A single-window threshold either fires fast and cries wolf (a short window) or is precise but slow (a long one). You can't get both from one line. So the technique requires a fast window and a slow window to both confirm before it pages: the fast window says "we're burning hot right now," and the slow window says "and it's real, it's persisting." That is, bone for bone, the outlook-plus-CreditWatch logic, a directional signal confirmed by a second read that it won't immediately reverse. Two professions, four decades apart, reaching for the same two-part trigger.
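The two-window rule can be sketched in a few lines. The window pairs and thresholds below are the figures I recall from the SRE Workbook's example for a 30-day, 99.9% SLO; treat them as assumptions to tune, not a specification. The `Rule` class, the `evaluate` function, and the sample burn rates are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    long_window: str   # establishes that the burn is significant
    short_window: str  # confirms that it is still happening right now
    threshold: float   # burn rate both windows must exceed
    severity: str


# Assumed thresholds, recalled from the workbook's worked example.
RULES = [
    Rule("1h", "5m", 14.4, "page"),
    Rule("6h", "30m", 6.0, "page"),
    Rule("3d", "6h", 1.0, "ticket"),
]


def evaluate(burn_rates, rules=RULES):
    """Return the severity of the first rule whose long AND short windows
    both exceed its threshold, or None if no rule is satisfied."""
    for rule in rules:
        long_hot = burn_rates[rule.long_window] > rule.threshold
        short_hot = burn_rates[rule.short_window] > rule.threshold
        if long_hot and short_hot:
            return rule.severity
    return None


# Hypothetical burn rates per lookback window.
sustained = {"5m": 20.0, "30m": 18.0, "1h": 16.0, "6h": 4.0, "3d": 0.5}
blip = {"5m": 30.0, "30m": 5.0, "1h": 3.0, "6h": 0.8, "3d": 0.2}
recovered = {"5m": 0.1, "30m": 0.5, "1h": 15.0, "6h": 3.0, "3d": 0.4}

print(evaluate(sustained), evaluate(blip), evaluate(recovered))
```

The sustained burn pages. The blip does not, because the long window never confirms it. The recovered incident does not either, because the short window says the problem has already stopped.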
They were even fighting the same pathology. The credit agencies smooth to avoid the "rating bounce," because flip-flopping grades would train the market to ignore ratings. SREs fight alert fatigue for the identical reason: noisy threshold pages train responders to ignore the pager, and an ignored page is worse than no page. The disease is the same. A system that fires on the raw level loses the trust of the people it is supposed to warn. Credit solved it by smoothing the output and moving the real signal into the outlook and the spread. Ops solved it by smoothing the trigger and moving the real signal into the burn rate and the symptom. Same problem, same answer, opposite directions of approach.
Once you've seen it twice in fields that never spoke to each other, you can state it as a law for reading any number that is a smoothed synthesis of underlying factors, which is to say, almost every score you are handed. A fraud score. A clinical early-warning score like NEWS2 at a hospital bedside. A model-drift or data-quality metric. An ESG score, an internal risk score, a Net Promoter Score, the annual performance rating. In every single case the structure is the same: the headline number is the most-lagging, least-actionable part, and the actual signal is two things underneath it, which input moved (the reasoning) and how fast it is moving (the rate-of-change).
The universal mistake is to act on the level. It even has a recognizable shape: treating a deliberately-smoothed summary as if it were ground truth, and using it as a trigger when the very thing that makes it a good summary, its stability, is exactly what disqualifies it as a trigger. The people who waited for S&P to confirm what the CDS market had been screaming for months made that mistake. So does the engineer who pages on CPU instead of burn rate, the manager who waits for the annual review to learn an employee is leaving, the clinician who waits for the score to cross instead of watching the trend climb. They are all reading the integral and ignoring the derivative.
So here is the move, and it works identically on a rating action, a Datadog alert, a risk dashboard, or a quarterly review. Stop watching the number cross the line. Concretely, three changes:
First, trigger on the rate-of-change, not the level. A metric that has been flat and is now moving fast is worth more than a metric that is high but stable. Page on the burn rate, the spread, the slope, the first derivative. The absolute value is the slowest thing in the room.
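A small sketch of the difference, assuming a least-squares slope over the recent samples as the rate-of-change measure. That is one reasonable choice among several. The two series and both thresholds are invented for illustration.

```python
def slope(samples):
    """Least-squares slope of the samples against their index (units per sample)."""
    n = len(samples)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    num = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(samples))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


def level_trigger(samples, level_threshold):
    """The naive alert: fire when the latest value is above a fixed line."""
    return samples[-1] > level_threshold


def slope_trigger(samples, slope_threshold):
    """The rate-of-change alert: fire when the recent trend is steep."""
    return slope(samples) > slope_threshold


# Hypothetical metrics: one high but stable, one low but climbing fast.
high_but_stable = [88, 89, 88, 89, 88, 89]
low_but_rising = [20, 26, 33, 41, 50, 60]

for name, series in [("stable", high_but_stable), ("rising", low_but_rising)]:
    print(name, level_trigger(series, 85), slope_trigger(series, 5.0))
```

The level trigger fires on the stable series and stays silent on the rising one. The slope trigger does the reverse, which is the ordering this step argues for.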
Second, demand the attribution, and route that. A score with no "which factor moved" attached is a rumor. Build your alerts, your reviews, and your dashboards so the thing that reaches a human is the symptom and the cause, "checkout latency is spiking for mobile users," not "score = 73." The reasoning paragraph is the most valuable line in a rating action for a reason; give your own systems one.
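One way to attach a "which factor moved" line to a score, sketched under a strong assumption: the score is a weighted sum of named factors, so each factor's contribution to a change can be read off directly. The factor names, weights, and values are hypothetical, and a real score may not decompose this cleanly.

```python
def attribute_change(before, after, weights):
    """Rank factors by how much each one moved the weighted score.

    Returns (factor, contribution_delta) pairs, largest absolute impact first.
    """
    deltas = {
        name: weights[name] * (after[name] - before[name])
        for name in weights
    }
    return sorted(deltas.items(), key=lambda kv: abs(kv[1]), reverse=True)


def describe(before, after, weights):
    """Build the message a human should see: the move and its main driver."""
    ranked = attribute_change(before, after, weights)
    top_factor, top_delta = ranked[0]
    total = sum(delta for _, delta in ranked)
    direction = "up" if top_delta > 0 else "down"
    return (
        f"score moved {total:+.1f}; "
        f"largest driver: {top_factor} ({direction}, {top_delta:+.1f})"
    )


# Hypothetical factor weights and two snapshots of the inputs.
weights = {"checkout_latency": 0.5, "error_rate": 0.3, "queue_depth": 0.2}
before = {"checkout_latency": 40, "error_rate": 50, "queue_depth": 60}
after = {"checkout_latency": 70, "error_rate": 52, "queue_depth": 55}

ranked = attribute_change(before, after, weights)
message = describe(before, after, weights)
print(message)
```

The point of the design is the routing: the string that reaches a person names the factor and its direction, and the bare score rides along only as context.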
Third, treat the smoothed grade as confirmation, never as a trigger. By the time a through-the-cycle number crosses its threshold, the information that moved it is, by design, weeks old. Use the grade to close the loop and document what happened. Never use it to decide that something is happening. If your alerting fires when the smoothed level crosses a line, you have built an instrument whose stability guarantees you'll be last to know.
When S&P finally cut the United States in August 2011, the market had effectively known for months, which is why it shrugged, and why yields fell instead of rising. The grade wasn't leading anything. It was the slowest reader in the room, finally turning the page everyone else had already finished. Whatever number you're staring at right now, the only question that matters is whether you're reading the part that knows first, or the part that is just catching up.
Standard & Poor's downgrade of the United States from AAA to AA+ on August 5, 2011 (the first such downgrade of U.S. sovereign debt by a major agency), and the subsequent flight-to-safety rally in Treasuries in which ten-year yields fell rather than rose.
Deliberate lag / through-the-cycle (TTC) rating and the "rating bounce": Edward Altman & Herbert Rijken, "Avoiding the Rating Bounce: Why Rating Agencies Are Slow to React to New Information."
The stability-versus-accuracy tradeoff (TTC ratings "seldom change, exhibit serial dependence, and lag changes in the issuers' default risk… stability is relatively high, while their default prediction power is low"): John Kiff & Michael Kisser, IMF Working Paper WP/13/64 (2013), "Rating Through-the-Cycle: What Does the Concept Imply for Rating Stability and Accuracy?"
CDS spreads anticipating rating actions (price discovery leads the grade): Hull, Predescu & White, "The relationship between credit default swap spreads, bond yields, and credit rating announcements," Journal of Banking & Finance 28(11), 2004; and U.S. Office of Financial Research Working Paper 24-04 (2024), "Do Credit Default Swaps Still Lead? The Effects of Regulation on Price Discovery."
Within-action hierarchy: S&P Global Ratings, "Use Of CreditWatch And Outlooks" (a CreditWatch signals a higher near-term probability of a rating change than an Outlook); and the finding that rating changes carry information the rating level does not (Financial Management / University of Michigan, 2010).
SRE convergence: Google SRE Workbook, "Alerting on SLOs," alert on error-budget burn rate rather than static thresholds, alert on symptoms (user-facing impact) rather than raw backend numbers, and multi-window multi-burn-rate alerting (a fast and a slow window must both confirm); the failure mode being alert fatigue.
Generalization examples (smoothed synthesis scores read the same way): clinical early-warning scores such as NEWS2/MEWS, fraud and internal risk scores, model-drift metrics, ESG scores, NPS, and performance ratings. The synthesis, that the smoothing is deliberate and therefore the grade cannot carry the timing signal, that credit and SRE independently built the same "act on the derivative and the attribution, not the level" architecture, and the three-step prescription, is the essay's own argument.
An agent's rating is a lagging synthesis. Ship the reasoning and the trend with it.
A single letter or star count is the slowest part of any reputation signal. The Agent Rating Protocol carries what the grade can't: which factor moved, which direction, and how fast, so a consumer reads the rate-of-change and the attribution instead of waiting for the score to cross a line.
pip install agent-rating-protocol · npm install agent-rating-protocol