Technical Guide | Executive

The CTO's Guide to Agent Readiness: Technical Decisions That Impact Your Score

Your Agent Readiness Score is not random. It is the direct result of 10 architectural decisions your engineering team made (or did not make). Each one maps to a specific dimension with a specific point impact. A CTO who reads this article can estimate their score before running a single scan.

AgentHermes Research
April 15, 2026 | 15 min read

Why Architecture Determines Agent Readiness

After scanning 500 businesses, one pattern is clear: agent readiness is an architecture outcome, not a marketing choice. The businesses that score Silver and Gold did not set out to be “agent-ready.” They made sound API architecture decisions that happen to be exactly what AI agents need.

Stripe scores 68. Resend scores 75. Vercel scores 70. None of them has an “agent readiness team.” They have engineering teams that chose API-first architecture, published OpenAPI specs, used Bearer auth, returned structured errors, and built status pages. These are CTO decisions.

Conversely, businesses that score under 20 made the opposite choices: website-first architecture, session cookies, HTML error pages, no API documentation. These are also CTO decisions — or decisions made by not having a CTO at all.

Here are the 10 decisions, each mapped to the dimension it impacts and the approximate points it contributes. Read through them, tally your answers, and you will have a reasonable estimate of your score before running a scan.

The 10 Decisions

#1: API-First vs Website-First Architecture

D2 API Quality|Weight: 15%|Impact: +15-25 pts
Wrong choice

Website-first: build HTML pages, add API later (or never)

Right choice

API-first: build the API, then build the website on top of it

API-first businesses score 45+ on D2 alone. Website-first businesses score 0-5. This is the single biggest architectural decision for agent readiness because D2 carries the highest weight of any dimension.

#2: OpenAPI Spec vs Ad-Hoc Documentation

D1 Discoverability + D2 API Quality|Weight: 27%|Impact: +10-18 pts
Wrong choice

Ad-hoc docs: Markdown pages, Notion wiki, or PDF guides

Right choice

OpenAPI 3.0+ spec: machine-readable, auto-discoverable, generates client SDKs

An OpenAPI spec at /openapi.json or /swagger.json is the single most impactful file you can publish. Agents parse it instantly to understand every endpoint, parameter, and response shape. Ad-hoc docs require LLMs to interpret natural language — slower, less reliable, and prone to hallucinated endpoints.
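As a rough illustration of the discovery step, the sketch below lists the two conventional spec locations named above and applies a loose sanity check to a response body. The helper names (`candidate_spec_urls`, `looks_like_openapi3`) and the example spec are hypothetical; this is not how the AgentHermes scanner is implemented.

```python
import json

# Conventional spec locations named in the text above.
SPEC_PATHS = ("/openapi.json", "/swagger.json")


def candidate_spec_urls(base_url: str) -> list:
    """Return the URLs an agent might probe for a machine-readable spec."""
    root = base_url.rstrip("/")
    return [root + path for path in SPEC_PATHS]


def looks_like_openapi3(raw: str) -> bool:
    """Loosely check that a response body is an OpenAPI 3.x document."""
    try:
        doc = json.loads(raw)
    except json.JSONDecodeError:
        return False  # e.g. an HTML docs page served at the same URL
    if not isinstance(doc, dict):
        return False
    version = doc.get("openapi")
    has_version = isinstance(version, str) and version.startswith("3.")
    return has_version and isinstance(doc.get("paths"), dict)


# Hypothetical minimal spec, for illustration only.
EXAMPLE_SPEC = json.dumps({
    "openapi": "3.0.3",
    "info": {"title": "Example API", "version": "1.0.0"},
    "paths": {
        "/v1/orders": {
            "get": {"responses": {"200": {"description": "OK"}}}
        }
    },
})
```

A real validator would check far more than two keys; the point is only that a JSON spec can be recognized mechanically, while a Markdown or PDF guide cannot.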

#3: Bearer Token Auth vs Session Cookies

D7 Security|Weight: 12%|Impact: +8-12 pts
Wrong choice

Session cookies: Set-Cookie header, CSRF tokens, browser-dependent state

Right choice

Bearer tokens: Authorization: Bearer <token> header, stateless, machine-friendly

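Most AI agents call your API from a plain HTTP client, not a browser. Cookie jars, CSRF tokens, and server-side session state are browser conventions that agents handle poorly or not at all. Bearer token authentication is the auth pattern that works most reliably for agent-to-API communication, and the OAuth 2.0 client credentials grant (grant_type=client_credentials) is the gold standard for issuing those tokens.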
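A minimal sketch of what the agent side of Bearer auth looks like, using only the Python standard library. The URL and token are placeholders, and `build_agent_request` is a hypothetical helper name.

```python
import urllib.request


def build_agent_request(url: str, token: str) -> urllib.request.Request:
    """Build a stateless GET request authenticated with a Bearer token."""
    return urllib.request.Request(
        url,
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/json",
        },
        method="GET",
    )


# Placeholder endpoint and token; nothing is sent until urlopen() is called.
req = build_agent_request("https://api.example.com/v1/orders", "example-token")
```

Everything needed to authenticate travels in one header on each request, so there is no cookie or session state to carry between calls.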

#4: Structured JSON Errors vs HTML Error Pages

D6 Data Quality + D9 Agent Experience|Weight: 20%|Impact: +6-10 pts
Wrong choice

HTML 500 page: pretty for humans, meaningless to agents

Right choice

JSON errors: { "error": "message", "code": "INVALID_PARAM", "request_id": "abc" }

When an agent hits an error, it needs three things: what went wrong (error), a machine-parseable code (code), and a way to reference the failure (request_id). HTML error pages provide none of these. Agents that receive HTML errors cannot self-correct — they either retry blindly or give up.
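The three-field error shape can be sketched from both sides: a server helper that produces it and an agent helper that rejects anything else. The function names are hypothetical, and the UUID-based `request_id` is an assumption; any unique identifier would do.

```python
import json
import uuid
from typing import Optional, Tuple


def error_response(status: int, code: str, message: str) -> Tuple[int, dict, str]:
    """Server side: return (status, headers, body) for a structured error."""
    body = {
        "error": message,                # what went wrong
        "code": code,                    # machine-parseable code
        "request_id": uuid.uuid4().hex,  # reference for the failure
    }
    return status, {"Content-Type": "application/json"}, json.dumps(body)


def parse_error(content_type: str, body: str) -> Optional[dict]:
    """Agent side: return the error fields, or None if the body is unusable."""
    if "application/json" not in content_type.lower():
        return None  # e.g. an HTML 500 page
    try:
        data = json.loads(body)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    if not {"error", "code", "request_id"} <= data.keys():
        return None
    return data
```

An agent that gets a dict back can branch on `code` and adjust its next request; an agent that gets `None` has nothing to act on.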

#5: Cursor Pagination vs Offset Pagination

D9 Agent Experience|Weight: 10%|Impact: +2-4 pts
Wrong choice

Offset: ?page=2&limit=20 — breaks when data changes between pages

Right choice

Cursor: ?after=abc123&limit=20 — stable, no skipped or duplicated records

Agents iterate through datasets automatically, often processing thousands of records. Offset pagination causes duplicate or skipped items when records are inserted or deleted during iteration. Cursor-based pagination is deterministic regardless of data changes. Agents trust cursor pagination; they work around offset pagination.
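The skipped-record problem is easy to reproduce in memory. The sketch below crawls a list of integer ids two at a time and deletes one record after the first page is read. The data, the page size, and the use of the last-seen id as the cursor are all illustrative assumptions.

```python
def crawl_offset(rows, limit, delete_id):
    """Crawl with page/limit; one row is deleted after page 1 is fetched."""
    rows = list(rows)
    seen, page = [], 1
    while True:
        batch = rows[(page - 1) * limit : page * limit]
        if not batch:
            return seen
        seen.extend(batch)
        if page == 1:
            rows.remove(delete_id)  # simulated concurrent delete
        page += 1


def crawl_cursor(rows, limit, delete_id):
    """Crawl with an 'after' cursor; the same delete happens after page 1."""
    rows = list(rows)
    seen, after, first_page = [], None, True
    while True:
        batch = [r for r in rows if after is None or r > after][:limit]
        if not batch:
            return seen
        seen.extend(batch)
        after = batch[-1]  # cursor is the last id returned
        if first_page:
            rows.remove(delete_id)  # simulated concurrent delete
            first_page = False


ids = [1, 2, 3, 4, 5, 6]
offset_result = crawl_offset(ids, limit=2, delete_id=1)
cursor_result = crawl_cursor(ids, limit=2, delete_id=1)
```

With offset pagination, deleting id 1 shifts every later row up by one position, so page 2 starts at id 4 and id 3 is never returned. The cursor crawl resumes from "after id 2" and still sees every remaining record.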

#6: Webhook Events vs Polling

D9 Agent Experience|Weight: 10%|Impact: +3-5 pts
Wrong choice

Polling: agents must call GET /resource every N seconds to check for changes

Right choice

Webhooks: POST to agent endpoint when state changes, with HMAC signing and retry

Polling burns agent compute budget and misses events between intervals. Webhooks push state changes in real time. AgentHermes checks for webhook documentation, an event catalog, HMAC signature verification, and retry logic. The top scorers (Stripe, GitHub, Slack) all publish comprehensive webhook systems.
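HMAC signing is the piece of a webhook system that is simplest to show in code. This is a minimal sketch assuming HMAC-SHA256 over the raw request body with a hex-encoded signature. Real providers differ in header names and often add a timestamp to the signed payload; none of that is modeled here.

```python
import hashlib
import hmac


def sign_payload(secret: bytes, payload: bytes) -> str:
    """Sender side: compute a hex HMAC-SHA256 signature over the raw body."""
    return hmac.new(secret, payload, hashlib.sha256).hexdigest()


def verify_signature(secret: bytes, payload: bytes, signature: str) -> bool:
    """Receiver side: recompute the signature and compare in constant time."""
    expected = sign_payload(secret, payload)
    return hmac.compare_digest(expected.encode(), signature.encode())


# Placeholder secret and event body, for illustration only.
secret = b"whsec_example"
body = b'{"event": "order.updated", "id": "evt_123"}'
signature = sign_payload(secret, body)
```

The receiver must verify against the exact bytes it received, before any JSON parsing or re-serialization, or the signature will not match.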

#7: Sandbox Mode vs Production-Only

D3 Onboarding|Weight: 8%|Impact: +4-7 pts
Wrong choice

Production-only: agents must use real data and real money to test integrations

Right choice

Sandbox mode: test credentials, fake data, same API surface, no real consequences

Agents will not risk real transactions while learning your API. Stripe test mode (sk_test_*) is the gold standard: identical API behavior with fake money. Without a sandbox, the integration cost for agents is too high: one wrong API call against real data can be unrecoverable.
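One way an agent can protect itself is to refuse state-changing calls unless it holds a test key. The sketch below assumes Stripe-style key prefixes (`sk_test_` and `sk_live_`); the function names are hypothetical, and other providers mark sandbox credentials differently.

```python
def key_mode(api_key: str) -> str:
    """Classify a key by prefix, assuming Stripe-style naming."""
    if api_key.startswith("sk_test_"):
        return "sandbox"
    if api_key.startswith("sk_live_"):
        return "production"
    return "unknown"


def guard_side_effect(api_key: str, allow_production: bool = False) -> None:
    """Raise before a state-changing call unless the key is a test key."""
    if key_mode(api_key) != "sandbox" and not allow_production:
        raise PermissionError(
            "refusing state-changing call without a sandbox key"
        )
```

The guard only works because the sandbox exposes the same API surface: the agent's code path is identical, and only the credential changes.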

#8: Public Status Page vs No Monitoring

D8 Reliability|Weight: 13%|Impact: +4-8 pts
Wrong choice

No public monitoring: agents discover outages by hitting errors

Right choice

Status page: status.domain.com or /status with uptime history and incident log

Before delegating work to your API, agents check if you are operational. A status page at a well-known URL (status.domain.com, /health, /status) lets agents make this check instantly. Without one, agents must infer reliability from error rates — and they learn quickly which APIs to avoid.
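The pre-flight check can be sketched as a probe over the well-known locations listed above. The HTTP call is injected as a function so the example stays self-contained; treating a 200 response as "operational" is a simplifying assumption, and the function names are hypothetical.

```python
from typing import Callable, List, Optional


def status_urls(domain: str) -> List[str]:
    """Well-known status locations named in the text, in probe order."""
    return [
        f"https://status.{domain}",
        f"https://{domain}/health",
        f"https://{domain}/status",
    ]


def first_reachable(
    domain: str, fetch_status: Callable[[str], Optional[int]]
) -> Optional[str]:
    """Return the first status URL that answers 200, or None.

    fetch_status is supplied by the caller: it takes a URL and returns
    the HTTP status code, or None if the request failed.
    """
    for url in status_urls(domain):
        if fetch_status(url) == 200:
            return url
    return None
```

In practice `fetch_status` would wrap a real HTTP client with a short timeout; a `None` result is the "no public monitoring" case described above.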

#9: Versioned APIs vs Breaking Changes

D8 Reliability + D9 Agent Experience|Weight: 23%|Impact: +3-6 pts
Wrong choice

Unversioned: endpoints change behavior without notice, breaking agent integrations

Right choice

Versioned: /v1/ prefix, Accept-Version header, 2-year backward compatibility

Agents hardcode API interaction patterns. A breaking change that renames a field or restructures a response causes agent failures that are never manually fixed — the agent just stops using your API. Stripe maintains backward compatibility for years. Unversioned APIs permanently lose agent traffic after the first breaking change.
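Both versioning signals mentioned above, the /v1/ path prefix and the Accept-Version header, can be resolved in one small function. This is a sketch under assumptions: the supported versions, the default, and the rule that the path prefix takes precedence over the header are all illustrative choices.

```python
import re
from typing import Mapping

SUPPORTED_VERSIONS = {"1", "2"}  # hypothetical
DEFAULT_VERSION = "1"            # hypothetical: oldest supported version


def resolve_version(path: str, headers: Mapping[str, str]) -> str:
    """Pick the API version from a /vN/ path prefix or Accept-Version."""
    match = re.match(r"^/v(\d+)(/|$)", path)
    if match:
        requested = match.group(1)
    else:
        # Header names are case-insensitive, so compare in lowercase.
        header = next(
            (v for k, v in headers.items() if k.lower() == "accept-version"),
            DEFAULT_VERSION,
        )
        requested = header.strip().lstrip("vV")
    if requested not in SUPPORTED_VERSIONS:
        raise ValueError(f"unsupported API version: {requested!r}")
    return requested
```

Defaulting unversioned requests to the oldest supported version, rather than the newest, is what keeps existing agent integrations working when a new version ships.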

#10: MCP Server + Agent Card vs Nothing

Agent-Native Bonus|Weight: 7%|Impact: +5-8 pts
Wrong choice

No agent-native infrastructure: rely entirely on OpenAPI discovery

Right choice

MCP server with tools + agent-card.json at /.well-known/ for A2A discovery

The agent-native bonus is the newest scoring dimension. It rewards businesses that go beyond APIs to provide MCP servers (tool-calling protocol for agents), agent-card.json (A2A discovery), and llms.txt (LLM-readable business summary). Only 2 of 500 businesses scanned have any of these. Shipping all three takes 2-4 hours of work for +5-8 points.
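Two of the three agent-native files are static and can be generated by a short script. The sketch below writes an agent-card.json under /.well-known/ and an llms.txt at the site root. The card fields (`name`, `description`, `url`, `version`) and the llms.txt layout are illustrative assumptions, not a complete rendering of either format, and the MCP server itself is out of scope here.

```python
import json
from pathlib import Path
from typing import List


def write_agent_files(
    root: Path, name: str, description: str, api_url: str
) -> List[Path]:
    """Write agent-card.json and llms.txt under a site root directory."""
    well_known = root / ".well-known"
    well_known.mkdir(parents=True, exist_ok=True)

    # Illustrative subset of fields; check the A2A spec for the full schema.
    card = {
        "name": name,
        "description": description,
        "url": api_url,
        "version": "1.0.0",
    }
    card_path = well_known / "agent-card.json"
    card_path.write_text(json.dumps(card, indent=2), encoding="utf-8")

    # Title, one-line summary, then a link to the machine-readable spec.
    llms_text = (
        f"# {name}\n\n"
        f"> {description}\n\n"
        "## Docs\n\n"
        f"- [OpenAPI spec]({api_url}/openapi.json)\n"
    )
    llms_path = root / "llms.txt"
    llms_path.write_text(llms_text, encoding="utf-8")
    return [card_path, llms_path]
```

Calling `write_agent_files(Path("public"), "Example Co", "Order management API", "https://api.example.com")` would produce `public/.well-known/agent-card.json` and `public/llms.txt`, ready to be served as static files.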

Estimate Your Score

Count how many of the 10 decisions you have made correctly, then find the band below whose description matches your architecture. Here is where you likely land:

0-19
ARL-0 Dark

Website-only, no API, session cookies, HTML errors. Completely invisible to agents.

20-39
ARL-1 Visible

Has an API but no spec. Basic auth. No sandbox, no status page, no versioning. Agents can find you but struggle to use you.

40-59
Bronze

OpenAPI spec published. Bearer auth. Some structured errors. Missing webhooks, sandbox, and agent-native files.

60-74
Silver

Full OpenAPI + Bearer + structured errors + status page + versioned API + webhooks. Missing MCP server and agent-card.json.

75-89
Gold

All 10 decisions made correctly. MCP server + agent-card + llms.txt + sandbox + cursor pagination + HMAC webhooks.

90-100
Platinum

Gold + x402 micropayments + sub-100ms p95 latency + automated onboarding + multi-protocol support. Nobody has achieved this yet.

The math: If you made 0-2 correct decisions, expect 0-25. Three to five correct decisions typically produce 30-50. Six to eight put you in 50-70. All 10 correct decisions push toward 75+. The exact score depends on implementation quality (not just presence), but the decisions themselves account for 70-80% of the variance across 500 scans.
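The tally can be written down as a lookup. The sketch below encodes only the bands stated in this section. The text gives no band for exactly 9 correct decisions, so the function returns None there rather than guessing, and the upper bound of 100 for "75+" is an assumption.

```python
from typing import Optional, Tuple

# Score bands and tier names as listed in this section.
TIERS = [
    (0, 19, "ARL-0 Dark"),
    (20, 39, "ARL-1 Visible"),
    (40, 59, "Bronze"),
    (60, 74, "Silver"),
    (75, 89, "Gold"),
    (90, 100, "Platinum"),
]


def expected_range(correct: int) -> Optional[Tuple[int, int]]:
    """Map a count of correct decisions to the expected score range."""
    if not 0 <= correct <= 10:
        raise ValueError("correct must be between 0 and 10")
    if correct <= 2:
        return (0, 25)
    if correct <= 5:
        return (30, 50)
    if correct <= 8:
        return (50, 70)
    if correct == 10:
        return (75, 100)  # "75+" in the text; 100 is the scale maximum
    return None  # 9 correct: no band is given in the text


def tier_for(score: int) -> str:
    """Return the tier name for a 0-100 score."""
    for low, high, name in TIERS:
        if low <= score <= high:
            return name
    raise ValueError("score must be between 0 and 100")
```

For example, `expected_range(4)` returns `(30, 50)`, which spans the ARL-1 Visible and Bronze tiers. The estimate is a range, not a point score, because implementation quality moves the result within the band.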

What the Top Scorers All Have in Common

Every business scoring Silver or above in our 500-business scan shares the same foundation: API-first architecture with a published OpenAPI spec, Bearer token auth, structured JSON errors, a status page, and versioned endpoints. That is decisions 1, 2, 3, 4, 8, and 9, which together touch dimensions carrying roughly 70% of the scoring weight.
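The 70% figure can be checked against the weights listed with each decision. The sketch below derives the two weights that are only given as combined totals (D1 and D6) and sums the six dimensions these decisions touch. It assumes the combined weights are simple sums of the individual dimension weights.

```python
# Weights in percent, as listed with each decision.
D2 = 15        # Decision 1: D2 API Quality
D1 = 27 - D2   # Decision 2 lists D1 + D2 = 27
D7 = 12        # Decision 3: D7 Security
D9 = 10        # Decisions 5 and 6: D9 Agent Experience
D6 = 20 - D9   # Decision 4 lists D6 + D9 = 20
D8 = 13        # Decision 8: D8 Reliability

# Decision 9 lists D8 + D9 = 23, which agrees with the values above.
decision_9_weight = D8 + D9

# Dimensions touched by decisions 1, 2, 3, 4, 8, and 9.
foundation = {"D1": D1, "D2": D2, "D6": D6, "D7": D7, "D8": D8, "D9": D9}
foundation_weight = sum(foundation.values())
```

Under that assumption the six dimensions total 72%, which is consistent with the rounded figure in the text.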

The top scorer (Resend, 75 Gold) has all 10. The next tier (Vercel 70, Supabase 69, Stripe 68) has 8-9 of 10. The difference between Silver and Gold is always the agent-native files: MCP server, agent-card.json, and llms.txt. These take an afternoon to ship and represent the easiest path from Silver to Gold.

Meanwhile, the Fortune 500 averages 37 — below Bronze. Not because they lack engineering resources, but because their architecture was built for human-first web experiences. The decision to be website-first (Decision 1 wrong) caps everything else.

Business | Score | Tier | Decisions
Resend | 75 | Gold | 10/10 decisions
Vercel | 70 | Silver | 9/10 decisions
Stripe | 68 | Silver | 9/10 decisions
Fortune 500 avg | 37 | Below Bronze | 3-4/10 decisions

The Priority Order: What to Ship First

If you are starting from zero, the implementation order matters. Here is the sequence that produces the fastest score improvement based on weight-per-effort:

P0 | API-first architecture (Decision 1) | Effort: 2-6 months | Foundation: nothing else works without this
P1 | OpenAPI spec (Decision 2) | Effort: 1-3 days | +10-18 pts across D1 and D2
P1 | Bearer token auth (Decision 3) | Effort: 1-2 days | +8-12 pts on D7
P2 | Structured JSON errors (Decision 4) | Effort: 2-4 hours | +6-10 pts across D6 and D9
P2 | Status page (Decision 8) | Effort: 1-2 hours | +4-8 pts on D8
P3 | Agent-native files (Decision 10) | Effort: 2-4 hours | +5-8 pts agent bonus

Frequently Asked Questions

How long does it take to implement all 10 decisions?

It depends on your starting point. If you already have an API, adding an OpenAPI spec (Decision 2), structured errors (Decision 4), and agent-card.json (Decision 10) takes a single sprint, roughly 2-3 days of engineering time. If you are website-only (Decision 1), the API-first migration is the foundation and takes 2-6 months depending on complexity. The good news: once the API exists, the remaining decisions are largely independent. You can ship them in any order and see incremental score improvements after each one.

Which decision should I prioritize first?

If you have no API: Decision 1 (API-first) is the prerequisite for everything else. If you have an API but no spec: Decision 2 (OpenAPI) is the highest-leverage single file you can ship. If you already have an OpenAPI spec: Decision 10 (MCP + agent-card) is the fastest path to the agent-native bonus that separates Silver from Gold.

How accurate is the score estimate in this article?

The estimates are based on scoring 500 businesses and identifying the patterns that separate each tier. Your actual score also depends on factors like response latency, documentation quality, and pricing transparency that are not covered by these 10 decisions. However, these 10 decisions account for roughly 70-80% of the variance between high and low scorers. Run a free scan at /audit to see your exact score.

Do I need to be API-first to score well?

Technically no, but practically yes. The highest-scoring non-API-first business in our 500-business scan scored 38, which is below Bronze. API-first architecture is not just one decision; it is the foundation that makes every other decision possible. With no callable endpoints at all, there is nothing for agents to discover, authenticate against, or use, and the score caps at 29.


See your actual score

You have estimated it. Now verify it. Run a free Agent Readiness Scan and see exactly how your architecture maps to all 9 dimensions. 60 seconds. No signup.

