Technical Deep Dive · D8 Reliability

API Latency Benchmarks for Agent Readiness: What p95 Response Times Score Silver

AgentHermes D8 Reliability carries the highest single dimension weight at 0.13, and the biggest factor within D8 is how fast your API responds. Our scan data from 500+ businesses reveals clear thresholds: a p95 under 200ms puts you in the top latency tiers. Above 2 seconds, you land in the Failure Zone, where agent timeouts become likely. The good news: the most impactful fix costs zero dollars.

AgentHermes Research
April 15, 2026 · 13 min read

Why Milliseconds Matter in the Agent Economy

Human users tolerate slow websites. They wait for pages to load, watch spinners, and retry when things fail. AI agents do not. An agent executing a multi-step workflow typically makes 3-10 tool calls per interaction. If each call takes 2 seconds, a 10-call interaction takes 20 seconds, which exceeds the patience threshold of both the agent framework and the human waiting for results.

This is why AgentHermes weights D8 Reliability at 0.13 — the highest single dimension in the scoring model. And within D8, response latency is the most measurable and impactful signal. We measure time-to-first-byte on every scan, across every discoverable endpoint, and use the p95 (95th percentile) as the scoring input.
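To make the scoring input concrete, here is a minimal sketch of how a p95 can be computed from raw TTFB samples. It assumes the nearest-rank definition of a percentile; the article does not say which method AgentHermes uses, so `p95_ms` is a hypothetical helper, not the scanner's code.

```python
from typing import Sequence


def p95_ms(samples_ms: Sequence[float]) -> float:
    """Return the 95th percentile of TTFB samples in milliseconds.

    Nearest-rank method (an assumption): sort the samples and take the
    value at 1-based rank ceil(0.95 * n). Tools that interpolate between
    ranks can report slightly different numbers.
    """
    if not samples_ms:
        raise ValueError("need at least one TTFB sample")
    ordered = sorted(samples_ms)
    rank = -(-95 * len(ordered) // 100)  # ceil(0.95 * n) in exact integer arithmetic
    return ordered[rank - 1]


# Hypothetical TTFB samples from one scan, in milliseconds.
ttfb = [120, 95, 140, 110, 480, 130, 105, 98, 150, 125,
        115, 135, 101, 99, 160, 118, 122, 109, 131, 900]
print(p95_ms(ttfb))  # prints 480
```

With these twenty hypothetical samples the median is about 121ms, yet two slow responses are enough to push the p95 to 480ms. That sensitivity to the slow tail is why a p95, rather than an average, decides the tier.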

The data is unambiguous: fast APIs score high, and slow APIs score low. No business in our dataset that scores Silver (60+) or higher has a p95 over 200ms.

  • <200ms: top-tier p95 target
  • 0.13: D8 Reliability weight
  • 1.5s: median local business TTFB
  • 60-80%: typical p95 reduction from adding a CDN

The Five Latency Tiers

Based on our scan data from 500+ businesses, API response times cluster into five distinct performance tiers. Each tier has a direct impact on your D8 Reliability score.

Platinum

p95 < 100ms
Full D8 latency points (100%)

Edge-served responses. CDN cache hits, static JSON, or edge-computed endpoints. Only achievable with a CDN or edge runtime like Cloudflare Workers or Vercel Edge Functions.

Examples: Vercel, Cloudflare, Stripe API docs

Silver

p95 100-200ms
Near-full D8 latency points (90%)

Fast origin responses. Well-optimized servers with connection pooling, query caching, and efficient serialization. CDN-assisted but not fully edge-cached.

Examples: Supabase, GitHub API, well-tuned Next.js

Acceptable

p95 200-500ms
Partial D8 latency points (60-70%)

Standard server response times. Database queries without caching, moderate serialization overhead, shared hosting with decent specs. No CDN penalty but no bonus either.

Examples: Most SaaS APIs, managed WordPress

Penalty Zone

p95 500ms-2s
Reduced D8 latency points (30-50%)

Slow responses that degrade agent experience. Cold starts, unoptimized database queries, no connection pooling, and shared hosting with resource contention. Agents may deprioritize these endpoints or time out.

Examples: Cheap shared hosting, unoptimized WordPress

Failure Zone

p95 > 2s
Minimal or zero D8 latency points

Response times that put agent calls at risk of timing out. Most agent frameworks default to 5-10 second timeouts per tool call, so a single 2+ second response consumes 20-40% or more of that budget, and as latency climbs the agent may abandon the request entirely.

Examples: Overloaded shared hosting, no-cache CMS
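The tier boundaries above are straightforward to encode. The sketch below maps a measured p95 onto the five tiers and the share of D8 latency points each one earns, copied from the tier list. It is not AgentHermes' scoring code: `LatencyTier` and `classify_p95` are hypothetical names, and treating a p95 that lands exactly on a boundary as belonging to the slower tier is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LatencyTier:
    name: str
    upper_ms: float  # exclusive upper bound of the tier's p95 range
    d8_latency_points: str  # share of D8 latency points, as stated in the tier list


# Assumption: a p95 exactly on a boundary (100, 200, 500, 2000 ms)
# falls into the slower tier.
TIERS = (
    LatencyTier("Platinum", 100, "100%"),
    LatencyTier("Silver", 200, "90%"),
    LatencyTier("Acceptable", 500, "60-70%"),
    LatencyTier("Penalty Zone", 2000, "30-50%"),
    LatencyTier("Failure Zone", float("inf"), "minimal or zero"),
)


def classify_p95(p95_ms: float) -> LatencyTier:
    """Return the first tier whose upper bound is above the measured p95."""
    if not p95_ms >= 0:  # rejects negative values and NaN
        raise ValueError("p95 must be a non-negative number")
    for tier in TIERS:
        if p95_ms < tier.upper_ms:
            return tier
    raise AssertionError("unreachable: the last tier is unbounded")


print(classify_p95(180).name)   # Silver
print(classify_p95(1500).name)  # Penalty Zone
```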

What Our Scans Reveal

We analyzed time-to-first-byte data across every scan in our database. Four findings stand out.

Top scorers all respond sub-200ms

Every business in our dataset scoring Silver (60+) or above has a p95 response time under 200ms, without exception. In our data, sub-200ms latency is a consistent prerequisite for a high agent readiness score.

Most local businesses: 1-3 second response times

The median local business website takes 1.5 seconds for time-to-first-byte. These are shared hosting environments running WordPress with 20+ plugins, no CDN, no caching layer, and no API endpoints — just slow HTML page renders.

CDN adoption is the single biggest lever

Businesses that added Cloudflare (free tier) typically saw p95 drop by 60-80%. One site went from 1.8s to 350ms just by putting a CDN in front of it. No code changes, no server upgrades, no database optimization: just DNS routing.
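To check a before/after claim like this against your own numbers, compute the relative drop in p95. A minimal sketch; `p95_reduction` is a hypothetical helper name. Applied to the 1.8s-to-350ms example, it gives about 80.6%, just above the top of the 60-80% range.

```python
def p95_reduction(before_ms: float, after_ms: float) -> float:
    """Percentage drop in p95 latency from before_ms to after_ms."""
    if before_ms <= 0:
        raise ValueError("baseline latency must be positive")
    return (before_ms - after_ms) / before_ms * 100


# The example site: 1.8 s before the CDN, 350 ms after.
print(f"{p95_reduction(1800, 350):.1f}%")  # 80.6%
```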

API endpoints are 2-5x slower than page loads

For businesses that do have API endpoints, those endpoints are typically 2-5x slower than their static pages. Dynamic database queries, no response caching, no connection pooling. The API is an afterthought.

The Agent Timeout Problem

Most agent frameworks impose strict timeout budgets. LangChain defaults to 10 seconds per tool call. CrewAI uses 30 seconds for complex tools. Claude's tool calling has built-in timeout handling. When your API takes 3 seconds to respond, the agent has already used 30% of a 10-second timeout budget on a single call.

Worse, agents learn. Modern agent frameworks track tool reliability metrics. If your endpoint times out once, the agent retries. If it times out repeatedly, the agent deprioritizes your tool in favor of faster alternatives. This is not theoretical — it is how production agent systems handle unreliable tools.

The math is simple. An agent with a 10-second budget can make:

  • 100 calls at 100ms each
  • 50 calls at 200ms each
  • 5 calls at 2 seconds each

The business with 100ms responses gets rich agent interactions. The business with 2-second responses gets a handful of calls at best, and may lose out to a fallback web search.
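The arithmetic behind this list, and behind the 30% figure for a 3-second response, is plain division. The sketch below assumes calls run strictly one after another and ignores the agent's own reasoning time between calls, so real budgets stretch less far; both helper names are hypothetical.

```python
def sequential_calls_within(budget_ms: float, latency_ms: float) -> int:
    """How many back-to-back tool calls complete within a time budget."""
    if latency_ms <= 0:
        raise ValueError("latency must be positive")
    return int(budget_ms // latency_ms)


def budget_share(latency_ms: float, timeout_ms: float) -> float:
    """Fraction of a per-call timeout consumed by a single response."""
    return latency_ms / timeout_ms


for latency in (100, 200, 2000):
    print(latency, "ms ->", sequential_calls_within(10_000, latency), "calls")

# The 3-second response against a 10-second timeout.
print(f"{budget_share(3000, 10_000):.0%} of a 10 s timeout")  # 30% of a 10 s timeout
```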

Four Fixes Ranked by Impact per Dollar

Every fix below is ranked by the ratio of latency improvement to cost. Start with Cloudflare — it has the highest impact for zero cost.

Cloudflare Free Tier

60-80% p95 reduction · Cost: $0 · Effort: 15 minutes

Point your DNS to Cloudflare. They cache static assets, optimize TLS handshakes, and serve from 300+ edge locations. The single highest-ROI change for API latency. Works with any hosting provider.

Response Caching Headers

40-60% p95 reduction on repeat requests · Cost: $0 · Effort: 30 minutes

Add Cache-Control headers to API responses. Business info, pricing, and service catalogs rarely change, so cache them for 5-60 minutes. Once your CDN is set to cache these responses, agents making repeated calls get near-instant responses from the CDN edge.
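As a sketch of what these headers look like in practice, the stdlib-only Python server below serves a hypothetical business-info endpoint with `Cache-Control: public, max-age=300, s-maxage=3600`, mapping the 5-60 minute window onto browser and agent (`max-age`) and shared-cache (`s-maxage`) lifetimes. The path, payload, and helper names are assumptions. Also check your CDN's defaults: many CDNs do not cache JSON responses out of the box, and on Cloudflare you may need a Cache Rule before the edge stores them.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical payload; in practice this would come from your CMS or database.
BUSINESS_INFO = {"name": "Example Bakery", "hours": "Mon-Sat 07:00-18:00"}

# max-age: 5 minutes for browsers and agents; s-maxage: 60 minutes for shared caches (CDNs).
CACHEABLE = "public, max-age=300, s-maxage=3600"


def business_info_response(path: str) -> tuple[int, dict[str, str], bytes]:
    """Build status, headers, and body for a request path (no I/O, easy to test)."""
    if path != "/api/business":
        body = json.dumps({"error": "not found"}).encode("utf-8")
        cache = "no-store"  # keep error responses out of the CDN cache
        status = 404
    else:
        body = json.dumps(BUSINESS_INFO).encode("utf-8")
        cache = CACHEABLE
        status = 200
    headers = {
        "Content-Type": "application/json",
        "Content-Length": str(len(body)),
        "Cache-Control": cache,
    }
    return status, headers, body


class BusinessInfoHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        status, headers, body = business_info_response(self.path)
        self.send_response(status)
        for name, value in headers.items():
            self.send_header(name, value)
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Run the sketch locally; blocks until interrupted."""
    ThreadingHTTPServer(("127.0.0.1", port), BusinessInfoHandler).serve_forever()
```

Keeping the header logic in a pure function makes it easy to unit-test. Running `curl -I` against your real endpoint is a quick way to confirm the header survives your framework and CDN.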

Database Connection Pooling

30-50% p95 reduction on API calls · Cost: $0-20/mo · Effort: 1-2 hours

Each API request opening a new database connection adds 50-200ms. Connection pooling (PgBouncer, Supabase pooler, Prisma Accelerate) reuses connections. The fix is often a single environment variable change.
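Connection pooling is easy to illustrate in isolation. The stdlib-only sketch below uses a hypothetical `FakeConnection` that sleeps 50ms to stand in for a real handshake. In production you would use your driver's or platform's pooler (PgBouncer, the Supabase pooler, Prisma Accelerate) rather than rolling your own. What matters is the pattern: open a few connections once, then lend them out per request.

```python
import queue
import time
from collections.abc import Callable, Iterator
from contextlib import contextmanager


class FakeConnection:
    """Hypothetical stand-in for a database connection."""

    CONNECT_DELAY_S = 0.05  # assumed 50 ms handshake (low end of the 50-200 ms range)
    opened = 0  # how many connections have ever been created

    def __init__(self) -> None:
        time.sleep(self.CONNECT_DELAY_S)  # simulate TCP + TLS + auth handshake
        FakeConnection.opened += 1

    def query(self, sql: str) -> str:
        return f"result of {sql!r}"


class ConnectionPool:
    """Minimal fixed-size pool: connections are opened once, then lent out and returned."""

    def __init__(self, factory: Callable[[], FakeConnection], size: int) -> None:
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(factory())  # pay the connect cost once, up front

    @contextmanager
    def connection(self) -> Iterator[FakeConnection]:
        conn = self._idle.get()  # blocks while every connection is in use
        try:
            yield conn
        finally:
            self._idle.put(conn)  # hand it back instead of closing it


def handle_request_without_pool() -> str:
    conn = FakeConnection()  # every request pays the handshake
    return conn.query("SELECT 1")


def handle_request_with_pool(pool: ConnectionPool) -> str:
    with pool.connection() as conn:  # reuses an already-open connection
        return conn.query("SELECT 1")


if __name__ == "__main__":
    pool = ConnectionPool(FakeConnection, size=2)
    for label, handler in (("pooled", lambda: handle_request_with_pool(pool)),
                           ("unpooled", handle_request_without_pool)):
        start = time.perf_counter()
        for _ in range(20):
            handler()
        print(f"{label}: {(time.perf_counter() - start) * 1000:.0f} ms for 20 requests")
```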

Edge Functions

70-90% p95 reduction · Cost: $0-20/mo · Effort: 2-4 hours

Move read-heavy API endpoints to edge runtimes (Cloudflare Workers, Vercel Edge Functions, Deno Deploy). Code runs in 300+ locations worldwide, and sub-50ms responses become achievable, especially when the data itself is cached at the edge. Best for the structured data endpoints that agents call most.

The $0 first step toward Silver-tier latency: Cloudflare free tier plus response caching headers. These two changes alone can take a typical local business from a 1.5-second p95 to under 300ms, crossing into the Acceptable tier and substantially improving D8 Reliability scores. Total cost: $0. Total effort: about 45 minutes. Read our CDN and caching guide for step-by-step instructions.

How to Measure Your Own Latency

Before optimizing, measure. Here are three ways to check your current p95 response time.

1. Run an AgentHermes scan

Visit /audit and enter your URL. The scan report includes TTFB measurements and shows exactly where your latency stands relative to scoring thresholds. This is the fastest way to see your current D8 score.

2. Check browser DevTools

Open the Network tab in Chrome DevTools, reload your page, select the main document request, and read the TTFB figure (Waiting for server response) in its Timing tab. Do this 5-10 times and take the worst measurement as a rough approximation of your p95.
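The DevTools check can also be scripted. The sketch below uses only Python's standard library. It times how long a server takes to return the status line and headers after the request is sent, with connection setup excluded so the number resembles the DevTools waiting figure. `measure_ttfb_ms` is a hypothetical helper, and `example.com` below is a placeholder for your own host.

```python
import http.client
import time
from typing import Optional


def measure_ttfb_ms(host: str, path: str = "/", runs: int = 10,
                    port: Optional[int] = None, use_https: bool = True,
                    timeout: float = 10.0) -> list[float]:
    """Return one approximate TTFB sample (ms) per run, each on a fresh connection."""
    conn_cls = http.client.HTTPSConnection if use_https else http.client.HTTPConnection
    samples = []
    for _ in range(runs):
        conn = conn_cls(host, port, timeout=timeout)
        try:
            conn.connect()  # DNS, TCP and TLS setup happen here, before the timer starts
            start = time.perf_counter()
            conn.request("GET", path)
            response = conn.getresponse()  # returns once the status line and headers are read
            samples.append((time.perf_counter() - start) * 1000)
            response.read()  # drain the body before closing
        finally:
            conn.close()
    return samples
```

For example, `max(measure_ttfb_ms("example.com", runs=10))` returns the worst of ten runs, the same rough p95 proxy as the manual DevTools method. If you can, run it from more than one network location, since a single vantage point hides geographic variance.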

3. Use WebPageTest

WebPageTest (webpagetest.org) runs tests from multiple global locations and provides detailed TTFB breakdowns, including DNS, TCP, TLS, and server processing time. Test from 3+ locations to understand geographic variance.

The key insight from our data: latency improvements have diminishing returns for scoring but large returns for agent usability. Going from 2 seconds to 500ms is the most impactful change. Going from 500ms to under 200ms crosses the Silver threshold. Going from 200ms to 50ms has minimal scoring impact but makes your API a top-tier agent target. Learn more about HTTP/2 and HTTP/3 protocols for additional latency optimization techniques.

Frequently Asked Questions

What exactly does AgentHermes measure for latency?

AgentHermes measures time-to-first-byte (TTFB) on every scan request. This is the time from when the HTTP request is sent to when the first byte of the response arrives. We measure this across multiple endpoints — the homepage, any API endpoints discovered, and standard paths like /api, /.well-known/agent-card.json, and /health. The p95 (95th percentile) of these measurements feeds into the D8 Reliability dimension.

How much does latency affect my overall agent readiness score?

D8 Reliability carries a 0.13 weight — the highest single dimension weight in the scoring model. Latency is one component of D8 alongside uptime, HTTP/2 support, and error rate consistency. A site with 2+ second response times can lose up to 8-10 points on the overall 100-point score from latency alone. That is often the difference between Bronze and Not Scored.

Why do agent frameworks care about response time?

Agent frameworks like LangChain, CrewAI, and AutoGen execute multi-step tool chains. A typical agent interaction might call 3-5 tools sequentially. If each tool call takes 2 seconds, the total interaction takes 6-10 seconds, which strains most users' patience. Agents naturally prefer faster tools because they can complete more complex workflows within reasonable time budgets. A 100ms API lets an agent make 50 calls in 5 seconds. A 2-second API allows only 2.

Is Cloudflare really free?

Yes. Cloudflare's free tier includes CDN caching, DDoS protection, SSL, HTTP/2, and basic analytics. It handles the vast majority of use cases for small and medium businesses, and the free tier has no bandwidth limits. Some features, such as image optimization, advanced WAF rules, and Workers usage beyond the free allowance, are paid, but the CDN caching that can drop your p95 by 60-80% is completely free.

My API has fast response times but my website is slow. Does that matter?

For agent readiness, the API response time is what matters most. Agents do not render your website — they call your endpoints. However, AgentHermes scans both website and API endpoints during a scan, and the overall TTFB metric combines both. If your API is fast but your website is slow, you will still get partial credit. The recommended approach is to make everything fast — Cloudflare in front of your entire domain handles both.


How fast is your API?

Run a free Agent Readiness Scan and see your TTFB measurements, D8 Reliability score, and exactly where your latency falls in our benchmark tiers.

