Dimensions Deep Dive · D8 · Weight 0.13

Reliability and Agent Readiness: Why Status Pages Score 13% (D8)

Of the nine Agent Readiness dimensions, D8 Reliability carries a 0.13 weight — second-highest, behind only D2 API Quality. The reason is simple: agents automate repeat actions. An unreliable endpoint breaks automation, breaks user trust, and gets routed around. Here is why reliability is load-bearing, what AgentHermes actually checks, and how dev infra companies turn it into a 65-plus score.

AgentHermes Research · April 15, 2026 · 12 min read

Why Reliability Is the Second-Highest Weighted Dimension

A human using a website forgives flakiness. They reload the page, wait for the spinner, or come back later. An agent completing a task on behalf of its user does not. A single 5xx response collapses a long-running workflow, burns inference budget, and surfaces an error to the user that the agent cannot recover from without human input.

AgentHermes weights the nine dimensions to reflect real agent behavior. D2 API Quality (0.15) comes first because without a usable API nothing else matters. D8 Reliability (0.13) is next because an API that exists but fails 2% of the time is effectively unusable for any workflow with more than 50 calls — and production agent workflows routinely make 500 calls per task.
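The compounding effect is easy to verify. A minimal sketch, using the 2% failure rate and call counts from the paragraph above and assuming calls fail independently:

```python
# Probability a multi-call workflow completes when each call
# independently fails with probability p_fail.
def workflow_success(p_fail: float, n_calls: int) -> float:
    return (1 - p_fail) ** n_calls

# A 2% per-call failure rate over a 50-call workflow:
print(round(workflow_success(0.02, 50), 3))   # ~0.364 -- fails most of the time
# Over a 500-call production task, success is effectively impossible:
print(workflow_success(0.02, 500))            # ~4e-05
```

At 2% per-call failure, a 50-call workflow completes barely a third of the time, which is what makes such an API "effectively unusable" at that scale.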

The evidence is in the scan data. Of the 500 businesses AgentHermes has scanned, the seven companies with the highest D8 subscores are all developer infrastructure providers — Statuspage, Vercel, Supabase, Stripe, GitHub, Grafana, and MongoDB. They cluster because reliability is existential to their business, and the discipline required to make an API reliable happens to be the exact discipline agents reward.

Key numbers:
- D8 weight: 0.13
- Rank: 2nd-highest of 9 dimensions
- Statuspage D8 ceiling score: 70
- Businesses scanned: 500

The Five Reliability Signals AgentHermes Scores

D8 is not a single check. It is a composite of five observable signals that together tell an agent whether this service will be there tomorrow, next month, and under load. Each signal contributes to the subscore, which is then multiplied by the 0.13 dimension weight.
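As a back-of-the-envelope illustration of how a composite like this behaves: the per-signal weights below are assumptions for the sketch, not AgentHermes's published internals; only the 0.13 dimension weight and the "all five signals saturate D8" behavior come from the article.

```python
# Illustrative only: these per-signal weights are assumptions,
# chosen to sum to 1.0 so a fully implemented set saturates D8.
SIGNAL_WEIGHTS = {
    "status_page": 0.30,        # largest single signal
    "health_endpoint": 0.25,    # agent-native signal
    "uptime_history": 0.15,
    "incident_discipline": 0.15,
    "published_sla": 0.15,
}
D8_DIMENSION_WEIGHT = 0.13

def d8_contribution(signals: dict[str, float]) -> float:
    """signals maps signal name -> detection score in [0, 1].
    Returns the points D8 contributes toward the total score."""
    subscore = 100 * sum(
        SIGNAL_WEIGHTS[name] * signals.get(name, 0.0)
        for name in SIGNAL_WEIGHTS
    )
    return subscore * D8_DIMENSION_WEIGHT

# All five signals fully implemented contribute ~13 points:
print(round(d8_contribution({k: 1.0 for k in SIGNAL_WEIGHTS}), 1))  # 13.0
```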

Public status page (largest single D8 signal). A dedicated status.yoursite.com or equivalent page that shows live component health. Agents check this before calling your API to avoid wasted attempts. Example: status.openai.com, status.vercel.com, statuspage.io.

/health or /status endpoint (high weight: this is the agent-native signal). A machine-readable endpoint that returns JSON with per-subsystem status, so agents can poll it directly instead of parsing a human status page. Example: GET /health returns { "status": "ok", "db": "ok", "cache": "ok", "checked_at": "2026-04-15T..." }

Uptime history (medium weight: trust signal). Historical SLA data published publicly: uptime over 30, 90, and 365 days. Agents weight routing decisions against history, not just current state. Example: 99.99% over the last 90 days, 99.97% over the last 365, published per region.

Incident response discipline (medium weight: qualitative signal). Post-mortems, postmortem templates, and visible RCAs tell agents you will not disappear when something breaks. Example: a public postmortem for every incident over 10 minutes, with mean time to recovery tracked.

Published SLA (signal plus a real guarantee). A formal SLA document with availability targets, response times, and credits for breaches makes reliability contractual. Example: 99.9% monthly uptime, 2-hour response on P1, prorated credits on a miss.

A business with all five signals fully implemented can saturate D8 and contribute roughly 13 points to the total score. A business with only a status page and no /health endpoint captures maybe 5 of those points. The gap between mediocre and excellent reliability is the difference between a Bronze and a Silver tier on total score.

Making Reliability Visible: The Top D8 Scorers

These seven companies set the reliability bar in our dataset. What they have in common is not better uptime — it is better visibility into that uptime. They made reliability a public artifact, not an internal metric.

Statuspage (70): Their product IS reliability infrastructure — public status, history, incident comms all bundled.
Vercel (70): status.vercel.com plus /status plus regional uptime history plus detailed postmortems.
Supabase (69): status.supabase.com, per-project health, public SLA, automated incident timelines.
Stripe (68): status.stripe.com with API latency percentiles per region, a 7-year track record of transparency.
GitHub (67): githubstatus.com, per-service breakdown, live incident feeds parsed by agents across the industry.
Grafana (65): status.grafana.com plus deep observability of their own cloud — reliability is the brand.
MongoDB (65): status.mongodb.com, Atlas multi-region SLA, replica set health exposed via API.

Notice the pattern: making reliability visible IS the signal. Your actual uptime matters less than whether agents can verify it. A company with 99.999% uptime and no public status page scores worse on D8 than a company with 99.9% uptime and a beautiful status.yoursite.com. The public artifact is what closes the loop.

This is also why Statuspage itself scored 70 on total Agent Readiness despite being a reliability product. They nail D8, but the other eight dimensions bring the total down. See our deeper analysis in Why Developer Tools Dominate Agent Readiness for the full breakdown of why this cluster wins.

How to Raise Your D8 Subscore

Five actions, ranked by leverage. If you do the first three in a week, you move from the bottom quartile of D8 to the top third. The last two take longer but compound.

1. Stand up a status page at status.yoursite.com. Free options: Upptime on GitHub Pages, Instatus free tier, or Better Stack. Paid: Atlassian Statuspage. Any of them earns full credit for the signal. Point it at your real monitoring — do not fake it.

2. Ship a /health endpoint that returns JSON. Top-level status, a map of subsystems, and a checked_at timestamp. Unauthenticated, sub-100ms, CORS-open. Link it from your docs and your llms.txt so agents find it.

3. Publish uptime history. 30-day, 90-day, and 365-day uptime on your status page. This is standard in every status platform — turn it on. Historical numbers move D8 more than a single current-state green light.

4. Post every incident RCA publicly. A short postmortem for any incident over 10 minutes: timeline, impact, root cause, remediation. Host it under /incidents or directly on the status page. A compounding trust signal.

5. Publish a real SLA. A 99.9% uptime target, response-time commitments by severity, and credits on a miss. Even a self-imposed SLA with no legal teeth raises D8 because it turns reliability from marketing into a contract.
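Step 2 of the list above can be sketched with nothing but the Python standard library. The subsystem probes here are placeholders you would replace with real database and cache checks; the port and payload shape are assumptions consistent with the article's example:

```python
import json
from datetime import datetime, timezone
from http.server import BaseHTTPRequestHandler, HTTPServer

def check_subsystems() -> dict:
    # Placeholder probes: replace with real pings (database, cache, queue).
    return {"db": "ok", "cache": "ok"}

def health_payload() -> dict:
    checks = check_subsystems()
    status = "ok" if all(v == "ok" for v in checks.values()) else "degraded"
    return {
        "status": status,
        "checks": checks,
        "checked_at": datetime.now(timezone.utc).isoformat(),
    }

class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/health":
            self.send_error(404)
            return
        body = json.dumps(health_payload()).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        # CORS-open so browser-based agents can poll it too.
        self.send_header("Access-Control-Allow-Origin", "*")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

# To serve: HTTPServer(("", 8080), HealthHandler).serve_forever()
```

Keeping the handler free of authentication and heavy work is what makes the sub-100ms, unauthenticated target achievable.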

Pro move: add a monitoring tool link to your agent-card.json so agents can subscribe to status updates programmatically. An agent that can check your status page before every 1,000-call workflow saves budget and reports failures accurately. This is the kind of agent-native thinking that separates D8 ceiling scorers from the rest of the Silver tier.
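On the agent side, the pre-flight check described above might look like this sketch; the URL and the defer_workflow helper are hypothetical:

```python
import json
import urllib.request

def service_is_healthy(health_url: str, timeout: float = 2.0) -> bool:
    """Pre-flight check: poll a service's /health endpoint before
    starting an expensive multi-call workflow."""
    try:
        with urllib.request.urlopen(health_url, timeout=timeout) as resp:
            payload = json.load(resp)
        return payload.get("status") == "ok"
    except (OSError, ValueError):
        # Unreachable or non-JSON: treat as unhealthy.
        return False

# if not service_is_healthy("https://api.example.com/health"):
#     defer_workflow()  # hypothetical: reschedule instead of burning budget
```

One cheap GET before a 1,000-call workflow is the trade the article is describing: a few milliseconds of polling against an entire run's worth of wasted inference budget.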

Why Agent Adoption Tracks Reliability Directly

When an agent operator builds a workflow on top of your API, the first failure mode they fear is silent unreliability — a flaky endpoint that works in testing and fails 1% of the time in production. That 1% compounds as calls chain: a three-call sequence fails roughly 3% of the time, and a thirty-call workflow fails more than a quarter of the time.

Agent developers pick the most reliable service in a category, not the cheapest. We see this play out in our adoption data: among the APIs being wrapped in community-built MCP servers today, the ones with public status pages and /health endpoints are picked three times as often as similar-priced competitors without them. The signal is the sort.

If you want your service to become an agent-native category winner, reliability visibility is the first investment to make. It is cheaper than a better API, faster to ship than an OpenAPI spec, and moves your total Agent Readiness Score more per hour of engineering than almost any other action.

Frequently Asked Questions

Why is reliability weighted so heavily in the Agent Readiness Score?

D8 Reliability carries a 0.13 weight — the second-highest of the nine dimensions, behind only D2 API Quality at 0.15. Agents automate repeat actions. A single unreliable endpoint turns a thousand-call-a-day workflow into a queue of retries, timeouts, and user-visible failures. Unreliable services are abandoned by agent developers faster than unreliable websites are abandoned by humans, because the cost of a failed call is borne by the agent operator, not the end user. Reliability is load-bearing.

What does AgentHermes actually check for D8?

We probe five signals: presence of a public status page (status.yoursite.com or equivalent), presence of a machine-readable /health or /status endpoint, published uptime history, visible incident response discipline, and a posted SLA. Each signal contributes to the D8 subscore, which is then multiplied by the 0.13 dimension weight. Caps apply — if your primary endpoints return 5xx during the scan, D8 is capped regardless of status-page quality.

Do I need a dedicated Statuspage or Atlassian Statuspage subscription?

No. A public status page built on any platform counts — Atlassian Statuspage, Better Stack, Instatus, or a self-hosted Upptime page on GitHub Pages. What matters is that the page is reachable, shows per-component status, and exposes historical uptime. AgentHermes does not check the vendor, only the signal. A GitHub Pages + Upptime setup gets the same D8 credit as a 200-dollar-a-month Statuspage subscription.

What should my /health endpoint return?

Return a JSON object with a top-level status field ("ok" or "degraded" or "down"), a map of subsystem statuses (db, cache, queue, upstream APIs), and a checked_at ISO timestamp. Keep the response under 100 milliseconds and do not require authentication. A minimal example: { "status": "ok", "version": "1.4.2", "checks": { "db": "ok", "redis": "ok" }, "checked_at": "2026-04-15T12:00:00Z" }. That one endpoint adds measurable points to D8 and signals that you built for agent-era clients.

Why did Statuspage itself only score 70, not higher?

Statuspage is Silver-tier (60-74) because while D8 Reliability scores at ceiling, other dimensions pull the total down. D2 API Quality is partial — they have an API but documentation is less structured than OpenAPI. D9 Agent Experience is limited because there is no MCP server or agent card. Even a reliability-as-a-service company hits the same pattern as the rest of the industry: great in one dimension, partial in others. The only Gold business in our 500-scan dataset is Resend at 75, and it took across-the-board discipline to get there.


See your D8 subscore

Run a free Agent Readiness scan to see exactly how reliability is scoring on your domain and which of the five signals are missing. Every report includes a prioritized fix list.

