
Agent Experience (D9): The Dimension That Actually Measures Agent Usability

D1 asks “can agents find you.” D2 asks “can they call you.” D9 asks the question that actually determines whether they will keep calling you: is your API pleasant for an agent to use? Seven signals separate the top-decile APIs from the ones that return HTML stack traces.

AgentHermes Research
April 15, 2026 · 12 min read

D9 Is the “Would a Human Developer Hate This API” Dimension

Most of the Agent Readiness Score measures capability. D1 checks whether agents can discover you. D2 checks whether your API exists. D5 checks whether an agent can pay. D7 checks whether auth works.

D9 measures something different: the quality of the agent developer experience. It is the dimension that overlaps most with how a senior engineer would evaluate a third-party API before committing to it. Good D9 is invisible: the agent makes a call, gets exactly what it expects, and moves on. Bad D9 is where a bug report takes three days to resolve because no one can tell which request failed.

D9 carries a 0.10 weight in the v4 scoring model. It is tied for fourth-highest with D6 Data Quality. Paired with D2 API Quality (15%) it represents 25% of the total score — the developer-experience surface of the full rubric.

D9 weight in v4 scoring: 10%
Signals checked: 7
Point lift from middleware fix: 3-5
Businesses in dataset: 500

The Seven Signals of Agent Experience

These are the exact signals AgentHermes checks during a D9 evaluation. Each one is weighted roughly equally, at 1.0 to 1.5 points out of 10, with a slight bias toward the trio that dominates the debugging experience: request IDs, structured errors, and rate-limit headers.

Request IDs in Responses

Every response carries a unique ID (header or body) the agent can log, include in bug reports, and correlate across retries. This is the single biggest quality-of-life upgrade for agents debugging prod issues.

Signal: X-Request-Id: req_01HX2Y7F9P3KQ8... also included in error body
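
A minimal sketch of how a server might mint an ID and attach it in both places. The helper names `new_request_id` and `tag_response` are hypothetical, and the `req_` prefix followed by random hex is an assumed format, not a reproduction of the sample ID above.

```python
import uuid

REQUEST_ID_HEADER = "X-Request-Id"


def new_request_id() -> str:
    """Mint a unique ID for one request (assumed format: 'req_' + 32 hex chars)."""
    return f"req_{uuid.uuid4().hex}"


def tag_response(headers: dict, body: dict, request_id: str) -> None:
    """Attach the ID to the response header, and to the error body if there is one."""
    headers[REQUEST_ID_HEADER] = request_id
    if isinstance(body.get("error"), dict):
        body["error"]["request_id"] = request_id
```

Putting the ID in the header covers every response; repeating it inside the error body means it survives even when a client logs only the payload.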

Structured Error Codes

Errors return machine-readable codes (not just HTTP status) so agents can branch on specific failures. "AUTH_TOKEN_EXPIRED" is actionable. "Unauthorized" requires guessing.

Signal: { "error": { "code": "RATE_LIMITED", "message": "..." } }
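
From the agent's side, a structured code turns error handling into a lookup instead of a guess. The sketch below uses the codes that appear in this article (`AUTH_TOKEN_EXPIRED`, `RATE_LIMITED`, `INTERNAL`); the `Action` enum and the mapping from code to action are illustrative assumptions, not a prescribed policy.

```python
from enum import Enum


class Action(Enum):
    REFRESH_TOKEN = "refresh_token"
    WAIT_AND_RETRY = "wait_and_retry"
    RETRY = "retry"
    GIVE_UP = "give_up"


def next_action(status: int, body: dict) -> Action:
    """Pick the agent's next step, preferring the error code over the HTTP status."""
    error = body.get("error")
    code = error.get("code") if isinstance(error, dict) else None

    if code == "AUTH_TOKEN_EXPIRED":
        return Action.REFRESH_TOKEN
    if code == "RATE_LIMITED":
        return Action.WAIT_AND_RETRY
    if code == "INTERNAL":
        return Action.RETRY

    # No machine-readable code: all that is left is guessing from the status.
    if status == 429:
        return Action.WAIT_AND_RETRY
    if status >= 500:
        return Action.RETRY
    return Action.GIVE_UP
```

Note what happens with a bare `401` and the body `"Unauthorized"`: the function cannot tell an expired token from a revoked one, so it gives up rather than refreshing.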

Consistent Response Envelopes

Every response wraps data the same way — success and error. Agents write one parser instead of a special case per endpoint.

Signal: { "data": {...}, "meta": {...} } or { "error": {...} }
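
With a consistent envelope, the "one parser" can be sketched in a few lines. `ApiError` and `unwrap` are hypothetical names, and the sketch assumes the two shapes shown in the signal above.

```python
from typing import Optional


class ApiError(Exception):
    """Raised when a response carries the error envelope."""

    def __init__(self, code: str, message: str = "", request_id: Optional[str] = None):
        super().__init__(f"{code}: {message}")
        self.code = code
        self.message = message
        self.request_id = request_id


def unwrap(payload: dict) -> tuple:
    """Return (data, meta) for a success envelope, raise ApiError for an error one."""
    if "error" in payload:
        err = payload["error"]
        raise ApiError(err.get("code", "UNKNOWN"), err.get("message", ""), err.get("request_id"))
    if "data" not in payload:
        # Neither shape matched, so the API is not honouring its own envelope.
        raise ValueError("response matches neither envelope shape")
    return payload["data"], payload.get("meta", {})
```

Every endpoint goes through the same `unwrap` call, so a new endpoint costs the agent no new parsing code.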

Rate-Limit Headers

X-RateLimit-Remaining, X-RateLimit-Reset, and Retry-After tell agents exactly how much budget they have and when it replenishes. Without these, agents hammer until they hit a 429 wall.

Signal: X-RateLimit-Remaining: 47, X-RateLimit-Reset: 1713200400, Retry-After: 30
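
A sketch of how an agent might turn these headers into a wait time. `seconds_to_wait` is a hypothetical helper. It assumes `X-RateLimit-Reset` is a Unix timestamp in seconds, as in the sample above (some APIs send seconds-until-reset instead), and it handles only the delta-seconds form of `Retry-After`.

```python
import time
from typing import Mapping, Optional


def seconds_to_wait(headers: Mapping[str, str], now: Optional[float] = None) -> float:
    """How long to pause before the next call, based on rate-limit headers."""
    now = time.time() if now is None else now
    # HTTP header names are case-insensitive, so normalise before looking up.
    lowered = {name.lower(): value.strip() for name, value in headers.items()}

    # Retry-After wins when present (HTTP-date form is ignored in this sketch).
    retry_after = lowered.get("retry-after", "")
    if retry_after.isdigit():
        return float(retry_after)

    remaining = lowered.get("x-ratelimit-remaining", "")
    reset = lowered.get("x-ratelimit-reset", "")
    if remaining == "0" and reset.isdigit():
        # Budget exhausted: wait until the window resets.
        return max(0.0, float(reset) - now)

    # Budget left, or no headers at all: nothing tells us to wait.
    return 0.0
```

With no headers the function returns zero, which is exactly the "hammer until the 429 wall" behaviour the headers are meant to prevent.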

Cursor-Based Pagination

Opaque cursors survive sorting changes and concurrent writes. Offset pagination silently duplicates or skips rows when data mutates mid-scan — exactly when agents paginate over large sets.

Signal: GET /items?cursor=eyJpZCI6MTIzfQ — response has next_cursor
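
The sample cursor above is unpadded base64url for `{"id":123}`, which suggests a simple keyset scheme: the cursor records the last ID the client saw, and the next page starts after it. The sketch below assumes positive integer IDs and an in-memory list; `encode_cursor`, `decode_cursor`, and `list_items` are hypothetical names. A production cursor would usually be signed or encrypted so clients cannot tamper with it.

```python
import base64
import json
from typing import Optional


def encode_cursor(last_id: int) -> str:
    """Pack the last seen ID into an unpadded base64url token."""
    raw = json.dumps({"id": last_id}, separators=(",", ":")).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii").rstrip("=")


def decode_cursor(cursor: str) -> int:
    """Restore the padding that encode_cursor stripped, then read the ID."""
    padded = cursor + "=" * (-len(cursor) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))["id"]


def list_items(rows: list, cursor: Optional[str] = None, limit: int = 2) -> dict:
    """Keyset pagination: return the rows whose ID is greater than the cursor's."""
    after_id = decode_cursor(cursor) if cursor else 0  # assumes IDs start at 1
    ordered = sorted((r for r in rows if r["id"] > after_id), key=lambda r: r["id"])
    page = ordered[:limit]
    has_more = len(ordered) > limit
    next_cursor = encode_cursor(page[-1]["id"]) if has_more else None
    return {"data": page, "meta": {"next_cursor": next_cursor}}
```

Because each page is defined by "IDs after N" rather than "rows N to N+limit", deleting or inserting earlier rows between requests does not shift the window.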

Idempotency Key Support

Agents retry on network hiccups. Idempotency keys let the server deduplicate — critical for POST /charges, POST /orders, any action that costs money or creates state.

Signal: Idempotency-Key: a4f8... — second POST returns first result
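
The server-side half can be sketched as a map from key to first response. `IdempotencyStore` is a hypothetical, in-memory illustration; a real deployment would more likely use a shared store with expiry and would also check that the retried request body matches the original.

```python
import threading
from typing import Callable


class IdempotencyStore:
    """In-memory map from idempotency key to the first response produced."""

    def __init__(self) -> None:
        self._responses: dict = {}
        self._lock = threading.Lock()

    def run_once(self, key: str, handler: Callable[[], dict]) -> dict:
        """Run handler for the first request with this key; replay its result after."""
        # Holding the lock while the handler runs keeps the sketch simple:
        # requests are serialised, so the same key can never execute twice.
        with self._lock:
            if key not in self._responses:
                # If the handler raises, nothing is stored and a retry runs it again.
                self._responses[key] = handler()
            return self._responses[key]
```

A route handler would read the `Idempotency-Key` header and wrap its side-effecting work in `run_once`, so a retried POST returns the stored response instead of creating a second order.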

OpenAPI with Realistic Examples

Examples in the OpenAPI spec that an agent can copy, modify, and send. Missing or placeholder examples ("foo", "example@example.com") force the agent to guess at shapes.

Signal: examples: { "charge_card": { "value": { "amount": 2000, ... }}}

What Bottom-Quartile APIs Return vs Top-Decile

Six real-world scenarios, side-by-side. The left column is what agents encounter at the 199 bottom-quartile sites in our dataset. The right column is the top-decile response — close to what Stripe, Resend, and Supabase actually ship.

Scenario: A 500 error
Bottom-quartile: <html><body><h1>500 Internal Server Error</h1>...
Top-decile: {"error":{"code":"INTERNAL","message":"...","request_id":"req_abc"}}

Scenario: A 401 from an expired token
Bottom-quartile: <html>Please log in</html>
Top-decile: {"error":{"code":"AUTH_TOKEN_EXPIRED","retry_after":0,"request_id":"req_xyz"}}

Scenario: Rate limit hit
Bottom-quartile: 429 Too Many Requests (no headers, no body)
Top-decile: 429 + Retry-After: 30 + {"error":{"code":"RATE_LIMITED","retry_after":30}}

Scenario: Pagination over 10k rows
Bottom-quartile: ?page=1, ?page=2, ... with silent duplicates on writes
Top-decile: ?cursor=... (stable, ordered, no duplicates)

Scenario: Network hiccup on POST /orders
Bottom-quartile: Order created twice because the retry had no idempotency key
Top-decile: Idempotency-Key ensures the second POST returns the first result

Scenario: Bug report from agent
Bottom-quartile: "Something went wrong, unclear which request"
Top-decile: "request_id req_01HX... at 14:22:01 UTC returned INTERNAL"

The Stack Trace Problem — Why Bottom Sites Return HTML on 500

The most common D9 failure pattern across the 500 scans is always the same: an endpoint that works on the happy path returns the framework's default HTML error page on failure. Flask, Django, Rails, ASP.NET: each ships a default HTML error page, and with debug mode left on that page leaks internals. Either way, it breaks several D9 signals at once.

For an agent, this is a dead end. The response content-type is text/html. The body contains a stack trace. There is no request ID to include in a retry or a support ticket. There is no structured error code to branch on. There is no indication of whether retrying will help or whether the error is permanent.

Worse: the stack trace often leaks sensitive implementation details — database schemas, file paths, library versions — which independently hurts D7 Security. One unhandled exception hits two dimensions at once.

The 20-line fix: Wrap every API route with middleware that catches exceptions and returns a JSON envelope with a request ID. Every major framework supports this. Express: one app.use call. FastAPI: one @app.exception_handler. Next.js: one try-catch in the route handler plus a shared errorResponse() helper. This single commit typically moves D9 up by 2-3 points.
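
As a framework-neutral illustration of that fix, here is a sketch written as plain WSGI middleware using only the Python standard library. It is not the Express, FastAPI, or Next.js code named above; in those frameworks the equivalent hook is the one mentioned in the paragraph. The name `json_errors`, the `INTERNAL` code, and the `req_` ID format are assumptions.

```python
import json
import sys
import uuid


def json_errors(app):
    """WSGI middleware: turn unhandled exceptions into a JSON error envelope."""

    def wrapped(environ, start_response):
        request_id = f"req_{uuid.uuid4().hex}"  # assumed ID format
        try:
            # Note: errors raised later, while a lazy response body is being
            # iterated, are not caught by this sketch.
            return app(environ, start_response)
        except Exception:
            # Log the traceback server-side here; never send it to the caller.
            body = json.dumps({
                "error": {
                    "code": "INTERNAL",
                    "message": "Unexpected server error.",
                    "request_id": request_id,
                }
            }).encode("utf-8")
            headers = [
                ("Content-Type", "application/json"),
                ("Content-Length", str(len(body))),
                ("X-Request-Id", request_id),
            ]
            # Passing exc_info lets the server replace headers the app may
            # already have set, as the WSGI spec requires.
            start_response("500 Internal Server Error", headers, sys.exc_info())
            return [body]

    return wrapped
```

Wrapping an existing application is one line, `app = json_errors(app)`, which is what makes this a middleware-only change. The sketch tags only failed responses; attaching the ID to successful responses as well would take a wrapper around `start_response`.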

The Canonical Error Shape

If you adopt one pattern from this article, adopt this one. It satisfies four D9 signals at once — structured errors, consistent envelope, request IDs, and rate-limit handling.

{
  "error": {
    "code": "RATE_LIMITED",
    "message": "Too many requests. Retry in 30 seconds.",
    "retry_after": 30,
    "request_id": "req_01HX2Y7F9P3KQ8VN0SGZWM4R1B",
    "doc_url": "https://example.com/docs/errors#rate_limited"
  }
}

Five fields. Each serves a specific agent need: code for branching, message for logging, retry_after for scheduling, request_id for support tickets, and doc_url for the agent to fetch remediation context. This closely follows the error shapes that APIs such as Stripe, GitHub, and Linear have converged on. There is no bonus for innovating here.

Related reading: Data Quality and Agent Readiness (D6) covers the structural side of responses — envelope shape, null handling, JSON-LD. D6 and D9 together account for 20% of the full score.

Frequently Asked Questions

What is the difference between D2 API Quality and D9 Agent Experience?

D2 measures whether your API exists and is documented — OpenAPI spec, REST/JSON, auth flow, endpoint coverage. D9 measures whether your API is pleasant to use programmatically — request IDs, structured errors, rate-limit headers, idempotency, pagination quality, realistic examples. D2 is "can an agent call you" (15% weight). D9 is "will the agent keep calling you after the first failure" (10% weight). Together they are 25% of the total score.

Why do stack traces hurt the D9 score so much?

Returning HTML stack traces to API callers fails three D9 signals at once. It breaks the consistent envelope (agents have to detect HTML mid-parse), it provides no structured code the agent can branch on, and it carries no request ID. It also leaks internal details, a security red flag that some scanners ding as well. A simple wrapper that catches unhandled errors and returns a JSON error envelope with a request ID moves D9 up by 2-3 points on its own. Most backend frameworks let you do this in about 20 lines.

Do I need all 7 signals to get full D9 credit?

No. D9 scores roughly 1.0-1.5 points per signal out of the 10 available, so hitting 5 of 7 lands around 7 out of 10. The highest-leverage combo is structured errors + request IDs + rate-limit headers. At up to 1.5 points each, those three together cover up to about 45% of the D9 weight, because they dominate the debugging and reliability experience. Idempotency keys and cursor pagination are the tie-breakers for top-quartile scores.

Does Stripe pass D9 fully?

Essentially yes — Stripe is a near-reference implementation for D9. They return request IDs on every response (both header and body), use structured error types ("api_error", "card_error"), publish rate-limit information, accept idempotency keys on every POST, and their OpenAPI has curl examples for every endpoint. Their D9 score is close to 10/10. The cap on their overall score (68) comes from D3 and D4, not D9.

Can I improve D9 without changing my existing APIs?

Partially. You can add request IDs, rate-limit headers, and wrap errors through middleware without touching route handlers — that alone typically gets you to ~6/10 on D9. Idempotency and cursor pagination usually require per-endpoint logic because they affect business semantics (what counts as a "duplicate", what field to order by). The fastest agent-experience win is middleware: one commit, 4-5 point D9 lift.


See your D9 breakdown in 60 seconds

AgentHermes scans every endpoint for the seven D9 signals and returns a per-signal pass/fail with the exact remediation path.

