Token Counting and Agent Readiness: Why Response Size Matters for AI Agent Costs
Every AI agent pays for every token it processes. When an agent calls your API, the response is fed to a language model that bills per token, both input and output. A Stripe API response costs an agent about $0.0006 to process. A legacy XML response costs about $0.006, ten times more for the same information. The leaner your API, the cheaper it is for agents to use, and the cheaper it is to use, the more agent traffic it is likely to attract.
The Token Economics of Agent API Calls
When a human visits an API documentation page, the page size matters little: humans scan visually and ignore irrelevant content. When an AI agent processes an API response, every byte matters. The response is tokenized and fed through a language model. More tokens mean higher cost, longer processing time, and a greater chance of overflowing the context window.
This creates a new competitive dimension for APIs. Two services that return identical information but different response sizes have dramatically different costs for agent consumers. An agent making 10,000 API calls per day will prefer the service that costs $6 over the one that costs $60 — even if the underlying service is identical.
Cost estimates based on: Claude 3.5 Sonnet at $3/million input tokens. GPT-4o at $2.50/million. Actual costs vary by model and provider, but the ratio between lean and bloated responses remains constant: a 10x size difference is a 10x cost difference.
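The arithmetic behind these figures can be checked with a short sketch. The function name `cost_usd` is illustrative, and the token counts (200 and 2,000 per response) and the $3 per million rate are the article's own round numbers, not measurements.

```python
def cost_usd(tokens_per_response: int, calls: int, price_per_million: float) -> float:
    """Input-token cost in USD for `calls` responses of a given size."""
    return tokens_per_response * calls * price_per_million / 1_000_000

# Article figures: 10,000 calls per day at $3 per million input tokens.
lean_daily = cost_usd(200, 10_000, 3.00)       # lean JSON response
bloated_daily = cost_usd(2_000, 10_000, 3.00)  # bloated legacy response

print(f"lean: ${lean_daily:.2f}/day, bloated: ${bloated_daily:.2f}/day")
```

Because the price per token is flat, the cost ratio equals the size ratio regardless of which model's rate is plugged in.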
Five Response Bloat Patterns That Cost Agents Money
We analyzed API responses across 500+ businesses in the AgentHermes database. These five patterns account for 90% of unnecessary token consumption.
HTML error pages
When an API returns a 500 error as a full HTML page with navigation, footer, and CSS, the agent consumes 2000+ tokens to learn "something went wrong." A structured JSON error uses 30 tokens.
Fix
Return { "error": "message", "code": "ERROR_CODE", "request_id": "abc123" }
Savings
98% token reduction on errors
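A minimal sketch of building that error body, assuming a Python backend. The helper name `error_body` and the sample values are hypothetical; only the three field names come from the fix above.

```python
import json

def error_body(message: str, code: str, request_id: str) -> str:
    """Serialize a compact JSON error with the three fields from the fix above."""
    payload = {"error": message, "code": code, "request_id": request_id}
    # separators=(",", ":") drops the optional whitespace json.dumps adds by default.
    return json.dumps(payload, separators=(",", ":"))

body = error_body("Upstream timeout", "UPSTREAM_TIMEOUT", "abc123")
print(body)
```

Whatever framework serves the response, it should also send Content-Type: application/json and the matching HTTP status code so the agent does not have to infer either from the body.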
Marketing copy in responses
Some APIs include promotional text, upsell messages, or legal disclaimers in every response body. Agents cannot use this information. It is pure token waste.
Fix
Strip non-functional text from API responses. Marketing belongs on the website, not in the JSON.
Savings
20-40% token reduction per response
Deeply nested envelopes
Wrapping data in { "status": "ok", "data": { "response": { "results": { "items": [...] } } } } adds 50+ tokens of structure per response with zero information value.
Fix
Return data at the top level. { "items": [...], "total": 42 }
Savings
10-15% token reduction
Unnecessary fields by default
Returning 30 fields when the agent needs 5. User avatar URLs, internal timestamps, deprecated fields, and metadata the agent will never use.
Fix
Support field selection: ?fields=id,name,price,available. Or use sparse fieldsets (JSON:API style).
Savings
50-80% token reduction with field selection
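One way field selection might look on the server side is sketched below. The function `select_fields` and the sample record are hypothetical; the only thing taken from the text is the `?fields=` query parameter with comma-separated names.

```python
from urllib.parse import parse_qs

def select_fields(record: dict, query_string: str) -> dict:
    """Return only the keys named in fields=a,b,c; the full record if absent."""
    params = parse_qs(query_string)
    raw = params.get("fields", [""])[0]
    wanted = [name.strip() for name in raw.split(",") if name.strip()]
    if not wanted:
        return record  # no selection requested: keep the existing default
    # Unknown field names are silently skipped in this sketch.
    return {name: record[name] for name in wanted if name in record}

product = {
    "id": 7, "name": "Widget", "price": 19.99, "available": True,
    "avatar_url": "https://example.com/a.png", "internal_ts": 1700000000,
}
print(select_fields(product, "fields=id,name,price,available"))
```

Returning the full record when no `fields` parameter is present keeps existing consumers working unchanged.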
Verbose datetime formats
Returning "Wednesday, April 15th, 2026 at 3:00 PM Eastern Standard Time" instead of "2026-04-15T15:00:00-05:00". The human-readable version is 5x the tokens.
Fix
ISO 8601 always. Let the consuming application format for display.
Savings
3-5x reduction on datetime fields
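As a small illustration, Python's standard library emits and parses this format directly. The fixed UTC-05:00 offset below is chosen only to match the example timestamp in the text.

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC-05:00 offset, matching the example timestamp above.
offset = timezone(timedelta(hours=-5))
appointment = datetime(2026, 4, 15, 15, 0, 0, tzinfo=offset)

iso = appointment.isoformat()         # compact, machine-readable form
parsed = datetime.fromisoformat(iso)  # round-trips without loss

print(iso)
```

The consuming application can then format `parsed` for display in whatever locale it needs.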
How Token Efficiency Affects Your Agent Readiness Score
The AgentHermes 9-dimension scoring framework does not have a dedicated “token efficiency” dimension — but response quality affects four dimensions that collectively account for 48% of the total score. APIs that return concise, well-structured responses score higher across the board.
D6 Data Quality
Weight: 10%
Concise, well-structured responses score higher. Redundant fields, inconsistent types, and bloated payloads lower D6.
D9 Agent Experience
Weight: 10%
Agent-friendly response formats — minimal JSON, consistent schemas, proper error codes — directly reward lean APIs.
D8 Reliability
Weight: 13%
Smaller responses transfer faster, timeout less, and are less likely to be truncated. Reliability improves mechanically.
D2 API Quality
Weight: 15%
Clean API design naturally produces smaller responses. REST best practices and proper HTTP status codes reduce token waste.
For deeper analysis of the D6 Data Quality dimension and the D9 Agent Experience dimension, see our dedicated deep dives. Both dimensions reward the same outcome: structured, minimal, consistent responses that agents can process efficiently.
The Stripe Benchmark: What Lean Looks Like
Stripe is the gold standard for token-efficient API responses. A typical Stripe charge response contains about 200 tokens of clean JSON: the charge ID, amount, currency, status, and payment method. No HTML. No marketing copy. No unnecessary nesting. Every field serves a purpose.
Compare this to a legacy payment gateway that returns the same charge information wrapped in XML with namespaces, envelope elements, status messages, and verbose field names. The same information — “charge succeeded, here is the ID” — costs 8-10x more tokens to process.
This is not theoretical. Stripe scores 68 on the AgentHermes scale, among the highest we have measured. Their API was not designed for AI agents — it was designed for developers. But the same qualities that make an API developer-friendly make it agent-friendly: clean structure, minimal responses, consistent schemas, and proper error handling.
Stripe-style response (~200 tokens)
{
  "id": "ch_1abc",
  "amount": 2000,
  "currency": "usd",
  "status": "succeeded",
  "payment_method": "pm_card"
}
Legacy-style response (~1500 tokens)
<soap:Envelope xmlns:soap="...">
  <soap:Body>
    <ChargeResponse>
      <Status>
        <Code>200</Code>
        <Message>OK</Message>
        <Description>
          Transaction processed
          successfully...
        </Description>
      </Status>
      <!-- 40 more lines -->
    </ChargeResponse>
  </soap:Body>
</soap:Envelope>
Content Negotiation: Let Agents Choose Their Format
The most agent-friendly approach is content negotiation. Let the caller specify what format they want via the Accept header. An agent sends Accept: application/json and gets minimal JSON. A browser sends Accept: text/html and gets a rendered page. Same endpoint, right format for each consumer.
This also applies to error responses. When an agent receives a 404, it should get { "error": "not_found", "message": "Resource does not exist" } (about 15 tokens), not a full HTML 404 page with navigation, search box, and footer (2000+ tokens). The information content is identical. The token cost differs by more than 100x.
Implementation tip: Check the User-Agent and Accept headers on incoming requests. If the caller identifies as an AI agent or requests application/json, return minimal JSON. This is cheap to implement and immediately makes your API cheaper for agents to use.
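A framework-independent sketch of the Accept-header check might look like the following. The helper names `quality`, `wants_json`, and `render` are hypothetical. The parser is deliberately simplified: it matches exact media types only, ignores wildcards such as */*, and does not inspect User-Agent.

```python
import json
from html import escape

def quality(accept_header: str, media_type: str) -> float:
    """Highest q-value the header assigns to an exact media type (0.0 if absent)."""
    best = 0.0
    for part in accept_header.split(","):
        fields = [f.strip().lower() for f in part.split(";")]
        if fields[0] != media_type:
            continue
        q = 1.0  # q defaults to 1 when omitted
        for param in fields[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0
        best = max(best, q)
    return best

def wants_json(accept_header: str) -> bool:
    """True if application/json is listed and ranked at least as high as text/html."""
    json_q = quality(accept_header, "application/json")
    return json_q > 0 and json_q >= quality(accept_header, "text/html")

def render(accept_header: str, data: dict) -> tuple:
    """Return (content_type, body) for the format the caller asked for."""
    if wants_json(accept_header):
        return "application/json", json.dumps(data, separators=(",", ":"))
    rows = "".join(f"<li>{escape(str(k))}: {escape(str(v))}</li>" for k, v in data.items())
    return "text/html", f"<ul>{rows}</ul>"

print(render("application/json", {"id": "ch_1abc"}))
```

In a real service the web framework's own negotiation support is usually a better choice than hand-parsing; the point here is only that the same data can be serialized two ways from one endpoint.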
Frequently Asked Questions
How much does a token cost for AI agents?
Token pricing varies by model. GPT-4o charges about $2.50 per million input tokens; Claude 3.5 Sonnet charges about $3 per million input tokens. These costs add up: an agent making 1,000 API calls per day at 2,000 tokens per response processes 2 million input tokens and spends $5-6/day on them alone. Reducing response size to 200 tokens drops that to $0.50-0.60/day, a 10x saving that makes your API considerably more attractive to agent developers.
Does response size actually affect agent readiness scores?
Yes. The AgentHermes scoring framework rewards concise, well-structured data through D6 Data Quality and D9 Agent Experience. APIs that return minimal JSON with consistent schemas score higher than APIs that return bloated responses with unnecessary fields. The effect is indirect but measurable — lean APIs typically score 5-10 points higher across these dimensions.
Should I remove fields from my existing API?
Do not remove fields — that breaks existing integrations. Instead, support field selection: let callers specify which fields they want via a query parameter like ?fields=id,name,price. This gives agents the option to request minimal responses while keeping backward compatibility for existing consumers.
What about pagination? Does that affect token costs?
Pagination is critical. An API that returns 1,000 results in one response when the agent only needs the first 10 wastes 99% of tokens. Implement cursor-based pagination with a configurable page size. Default to small pages (10-25 items) and let callers increase if needed. This is both a token optimization and a reliability improvement.
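A minimal sketch of cursor-based pagination over an in-memory list follows. It assumes rows are sorted by a unique ascending `id`. The names `paginate`, `encode_cursor`, `decode_cursor`, and the `MAX_PAGE_SIZE` cap are hypothetical; a real service would apply the same idea in its database query.

```python
import base64
import json
from typing import Optional

DEFAULT_PAGE_SIZE = 10  # small default, per the guidance above
MAX_PAGE_SIZE = 100     # hypothetical upper bound on caller-supplied limits

def encode_cursor(last_id: int) -> str:
    """Opaque cursor carrying the id of the last item already returned."""
    return base64.urlsafe_b64encode(json.dumps({"last_id": last_id}).encode()).decode()

def decode_cursor(cursor: str) -> int:
    return json.loads(base64.urlsafe_b64decode(cursor.encode()))["last_id"]

def paginate(rows: list, cursor: Optional[str] = None, limit: int = DEFAULT_PAGE_SIZE) -> dict:
    """Return one page of rows after the cursor, plus a cursor for the next page."""
    limit = max(1, min(limit, MAX_PAGE_SIZE))
    last_id = decode_cursor(cursor) if cursor else None
    remaining = [r for r in rows if last_id is None or r["id"] > last_id]
    page = remaining[:limit]
    has_more = len(remaining) > limit
    next_cursor = encode_cursor(page[-1]["id"]) if has_more else None
    return {"items": page, "next_cursor": next_cursor}

rows = [{"id": i} for i in range(1, 26)]
first = paginate(rows)
second = paginate(rows, first["next_cursor"])
third = paginate(rows, second["next_cursor"])
```

Keying the cursor on the last seen id rather than a numeric offset keeps pages stable when rows are inserted between requests.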
How do I measure my API response token count?
Use a tokenizer library that matches the model you are targeting (tiktoken for OpenAI models, for example), because token counts differ between tokenizers. Measure your typical endpoint responses. Stripe responses average about 200 tokens. If your equivalent endpoint returns 800+, you have bloat. AgentHermes scans measure response structure and flag common bloat patterns automatically.
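The measurement can be sketched as below. The function `count_tokens` is hypothetical. Without a tokenizer it falls back to a rough rule of thumb of about four characters per token, which is only an approximation. For exact counts, pass a real tokenizer's encode function, for example `tiktoken.encoding_for_model("gpt-4o").encode` if tiktoken is installed.

```python
import json
from typing import Callable, Optional, Sequence

def count_tokens(text: str, encode: Optional[Callable[[str], Sequence]] = None) -> int:
    """Exact count if a tokenizer's encode function is supplied, else a rough estimate."""
    if encode is not None:
        return len(encode(text))
    # Rough heuristic only: about 4 characters per token, rounded up.
    return (len(text) + 3) // 4

response = {
    "id": "ch_1abc", "amount": 2000, "currency": "usd",
    "status": "succeeded", "payment_method": "pm_card",
}
compact = json.dumps(response, separators=(",", ":"))
pretty = json.dumps(response, indent=2)

print(count_tokens(compact), count_tokens(pretty))
```

Comparing compact and pretty-printed serializations of the same payload is a quick way to see how much of a response's size is formatting rather than data.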
How lean are your API responses?
Run a free Agent Readiness Scan to see how your API scores on data quality, response structure, and agent experience. Find out if your responses are costing agents unnecessary tokens.