
Token Counting and Agent Readiness: Why Response Size Matters for AI Agent Costs

Every AI agent pays for every token it processes. When an agent calls your API, the response is passed to a language model that charges per token, for input and output alike. A Stripe API response costs an agent about $0.0006 to process. A legacy SOAP/XML response carrying the same information costs about $0.0045, roughly 7.5 times more, and a full HTML error page costs $0.006 or more, ten times more. The leaner your API, the cheaper agents can use it. The cheaper you are to use, the more agent traffic you get.

AgentHermes Research
April 15, 2026 · 11 min read

The Token Economics of Agent API Calls

When a human visits an API documentation page, page size does not matter: humans scan visually and ignore irrelevant content. When an AI agent processes an API response, every byte counts. The response is tokenized and fed through a language model, and more tokens mean higher cost, longer processing time, and a greater chance of overflowing the context window.

This creates a new competitive dimension for APIs. Two services that return identical information in different response sizes have dramatically different costs for agent consumers. An agent making 10,000 API calls per day will prefer a service that costs $6 per day (about 200 tokens per response) over one that costs $60 per day (about 2,000 tokens per response), even if the underlying service is identical.

| Provider / type | Format | Tokens per response | Input cost per call (USD) |
|---|---|---|---|
| Stripe (create charge) | Minimal JSON | ~200 | $0.0006 |
| Modern REST API (typical) | Clean JSON | ~400 | $0.0012 |
| Legacy API with envelope | Wrapped JSON | ~800 | $0.0024 |
| SOAP/XML service | XML with namespaces | ~1,500 | $0.0045 |
| HTML error page | Full HTML document | ~2,000+ | $0.006+ |

Cost estimates assume Claude 3.5 Sonnet input pricing of $3 per million tokens; GPT-4o is about $2.50 per million. Actual costs vary by model and provider, but because pricing is linear in tokens, the ratio between lean and bloated responses stays the same: a 10x token difference is a 10x cost difference.
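The per-call figures in the table follow directly from linear per-token pricing. A minimal Python sketch that reproduces them, assuming the $3-per-million input price above; the function names are illustrative, not part of any SDK:

```python
# Reproduce the per-call input costs from the table above.
# Assumption: cost is linear in input tokens at a fixed price per million tokens.

PRICE_PER_MILLION_USD = 3.00  # Claude 3.5 Sonnet input price used in the table


def input_cost_usd(tokens: int, price_per_million: float = PRICE_PER_MILLION_USD) -> float:
    """Cost in USD for an agent to read `tokens` input tokens."""
    return tokens * price_per_million / 1_000_000


def daily_cost_usd(tokens_per_response: int, calls_per_day: int,
                   price_per_million: float = PRICE_PER_MILLION_USD) -> float:
    """Input-token spend per day for an agent that calls one endpoint repeatedly."""
    return calls_per_day * input_cost_usd(tokens_per_response, price_per_million)


if __name__ == "__main__":
    table = {
        "Stripe (create charge)": 200,
        "Modern REST API (typical)": 400,
        "Legacy API with envelope": 800,
        "SOAP/XML service": 1500,
        "HTML error page": 2000,
    }
    for name, tokens in table.items():
        print(f"{name:<28} {tokens:>5} tokens  ${input_cost_usd(tokens):.4f}")

    # 10,000 calls per day: lean (200-token) vs bloated (2,000-token) responses.
    lean = daily_cost_usd(200, 10_000)
    bloated = daily_cost_usd(2000, 10_000)
    print(f"${lean:.2f} vs ${bloated:.2f} per day")
```

Passing `price_per_million=2.50` gives the GPT-4o figures; every row scales by the same factor, so the 10x gap between the first and last rows is unchanged.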

Five Response Bloat Patterns That Cost Agents Money

We analyzed API responses across 500+ businesses in the AgentHermes database. These five patterns account for 90% of unnecessary token consumption.

1

HTML error pages

When an API returns a 500 error as a full HTML page with navigation, footer, and CSS, the agent consumes 2000+ tokens to learn "something went wrong." A structured JSON error uses 30 tokens.

Fix

Return { "error": "message", "code": "ERROR_CODE", "request_id": "abc123" }

Savings

98% token reduction on errors
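A minimal sketch of this fix in plain Python, assuming a service that builds its own response tuples rather than relying on a specific web framework. The `json_error` helper, the example error codes, and the 12-character hex `request_id` are hypothetical choices for illustration:

```python
import json
import uuid


def json_error(status: int, code: str, message: str) -> tuple[int, dict[str, str], bytes]:
    """Build a structured JSON error response as (HTTP status, headers, body)."""
    payload = {
        "error": message,
        "code": code,  # stable, machine-readable code an agent can branch on
        # Hypothetical request ID scheme: a random hex string the caller can quote to support.
        "request_id": uuid.uuid4().hex[:12],
    }
    body = json.dumps(payload).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    return status, headers, body


if __name__ == "__main__":
    status, headers, body = json_error(500, "UPSTREAM_TIMEOUT", "Payment processor did not respond")
    print(status, headers["Content-Type"], body.decode("utf-8"))
```

Whatever the framework, the parts that matter are the non-2xx status code, the `application/json` content type, and a machine-readable `code` field.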

2

Marketing copy in responses

Some APIs include promotional text, upsell messages, or legal disclaimers in every response body. Agents cannot use this information. It is pure token waste.

Fix

Strip non-functional text from API responses. Marketing belongs on the website, not in the JSON.

Savings

20-40% token reduction per response

3

Deeply nested envelopes

Wrapping data in { "status": "ok", "data": { "response": { "results": { "items": [...] } } } } adds 50+ tokens of structure per response with zero information value.

Fix

Return data at the top level. { "items": [...], "total": 42 }

Savings

10-15% token reduction
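If a backend already produces the nested shape, the flat form can be derived at the API boundary without touching internal code. A sketch assuming exactly the nesting shown above; `flatten_envelope` and the sample items are hypothetical, and character counts serve only as a rough stand-in for tokens:

```python
import json


def flatten_envelope(wrapped: dict) -> dict:
    """Turn {"status", "data": {"response": {"results": {"items": [...]}}}} into {"items", "total"}."""
    items = wrapped["data"]["response"]["results"]["items"]
    # Assumption: "total" is the number of items returned here; a paginated API
    # would report the overall result count instead.
    return {"items": items, "total": len(items)}


wrapped = {
    "status": "ok",
    "data": {"response": {"results": {"items": [{"id": 1}, {"id": 2}]}}},
}
flat = flatten_envelope(wrapped)

# The nesting adds characters (and therefore tokens) without adding information.
print(len(json.dumps(wrapped)), "->", len(json.dumps(flat)))
```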

4

Unnecessary fields by default

Returning 30 fields when the agent needs 5: user avatar URLs, internal timestamps, deprecated fields, and metadata the agent will never use.

Fix

Support field selection, such as ?fields=id,name,price,available, or JSON:API-style sparse fieldsets (?fields[products]=id,name,price).

Savings

50-80% token reduction with field selection
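A sketch of `?fields=` handling using only Python's standard library. The product record and the `select_fields` helper are hypothetical; the key design choice is that omitting `fields` returns the full record, so existing integrations keep working:

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical product record with more fields than an agent usually needs.
PRODUCT = {
    "id": "prod_123",
    "name": "Desk Lamp",
    "price": 4900,
    "available": True,
    "avatar_url": "https://example.com/img/lamp.png",
    "internal_sku_note": "restock via vendor B",
    "legacy_code": "DL-01",
}


def select_fields(record: dict, url: str) -> dict:
    """Apply ?fields=a,b,c from the request URL; without it, return the full record."""
    query = parse_qs(urlsplit(url).query)
    if "fields" not in query:
        return record  # backward compatible: existing clients still get every field
    wanted = {f.strip() for value in query["fields"] for f in value.split(",") if f.strip()}
    # Unknown field names are silently ignored rather than rejected.
    return {k: v for k, v in record.items() if k in wanted}


if __name__ == "__main__":
    print(select_fields(PRODUCT, "/products/prod_123?fields=id,name,price,available"))
```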

5

Verbose datetime formats

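Returning "Wednesday, April 15th, 2026 at 3:00 PM Eastern Daylight Time" instead of "2026-04-15T15:00:00-04:00". The human-readable version is more than twice as long (60 characters versus 25), and the agent has to parse it back into a timestamp before it can compare or sort it.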

Fix

ISO 8601 always. Let the consuming application format for display.

Savings

Datetime fields roughly half as long, and unambiguous to parse
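A standard-library sketch of serializing a timestamp as ISO 8601. It assumes a fixed UTC-4 offset because US Eastern time is on daylight time in mid-April; a real service would more likely use `zoneinfo.ZoneInfo("America/New_York")` so daylight-saving transitions are handled automatically:

```python
from datetime import datetime, timedelta, timezone

# Assumption: 3:00 PM US Eastern on April 15, 2026, when daylight time (UTC-4) applies.
eastern_daylight = timezone(timedelta(hours=-4))
event = datetime(2026, 4, 15, 15, 0, tzinfo=eastern_daylight)

# Serialize for the API: ISO 8601 with an explicit UTC offset.
wire_value = event.isoformat()
print(wire_value)  # 2026-04-15T15:00:00-04:00

# A consumer parses it back without guessing the locale or time zone,
# then formats it for display however it likes.
parsed = datetime.fromisoformat(wire_value)
print(parsed.astimezone(timezone.utc).isoformat())
```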

How Token Efficiency Affects Your Agent Readiness Score

The AgentHermes 9-dimension scoring framework does not have a dedicated “token efficiency” dimension — but response quality affects four dimensions that collectively account for 48% of the total score. APIs that return concise, well-structured responses score higher across the board.

D6 Data Quality

Weight: 10%

Concise, well-structured responses score higher. Redundant fields, inconsistent types, and bloated payloads lower D6.

D9 Agent Experience

Weight: 10%

Agent-friendly response formats — minimal JSON, consistent schemas, proper error codes — directly reward lean APIs.

D8 Reliability

Weight: 13%

Smaller responses transfer faster, time out less often, and are less likely to be truncated. Reliability improves mechanically.

D2 API Quality

Weight: 15%

Clean API design naturally produces smaller responses. REST best practices and proper HTTP status codes reduce token waste.

For deeper analysis of the D6 Data Quality dimension and the D9 Agent Experience dimension, see our dedicated deep dives. Both dimensions reward the same outcome: structured, minimal, consistent responses that agents can process efficiently.

The Stripe Benchmark: What Lean Looks Like

Stripe is the gold standard for token-efficient API responses. A typical Stripe charge response contains about 200 tokens of clean JSON: the charge ID, amount, currency, status, and payment method. No HTML. No marketing copy. No unnecessary nesting. Every field serves a purpose.

Compare this to a legacy payment gateway that returns the same charge information wrapped in XML with namespaces, envelope elements, status messages, and verbose field names. The same information (“charge succeeded, here is the ID”) costs roughly 7-8x more tokens to process: about 1,500 tokens versus 200.

This is not theoretical. Stripe scores 68 on the AgentHermes scale, among the highest we have measured. Their API was not designed for AI agents — it was designed for developers. But the same qualities that make an API developer-friendly make it agent-friendly: clean structure, minimal responses, consistent schemas, and proper error handling.

Stripe-style response (~200 tokens; key fields shown)

```json
{
  "id": "ch_1abc",
  "amount": 2000,
  "currency": "usd",
  "status": "succeeded",
  "payment_method": "pm_card"
}
```

Legacy-style response (~1500 tokens; abridged)

```xml
<soap:Envelope xmlns:soap="...">
  <soap:Body>
    <ChargeResponse>
      <Status>
        <Code>200</Code>
        <Message>OK</Message>
        <Description>
          Transaction processed
          successfully...
        </Description>
      </Status>
      <!-- 40 more lines -->
    </ChargeResponse>
  </soap:Body>
</soap:Envelope>
```

Content Negotiation: Let Agents Choose Their Format

The most agent-friendly approach is content negotiation. Let the caller specify what format they want via the Accept header. An agent sends Accept: application/json and gets minimal JSON. A browser sends Accept: text/html and gets a rendered page. Same endpoint, right format for each consumer.

This also applies to error responses. When an agent receives a 404, it should get { "error": "not_found", "message": "Resource does not exist" } (about 15 tokens), not a full HTML 404 page with navigation, search box, and footer (2,000+ tokens). The information content is identical, but the token cost differs by more than 100x.

Implementation tip: Check the Accept and User-Agent headers on incoming requests. If the caller requests application/json or identifies itself as an AI agent, return minimal JSON. This takes little effort to implement and immediately makes your API cheaper for agents to use.
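A framework-free sketch of Accept-based negotiation in Python. The function names are hypothetical, q-value handling is simplified (media-type parameters other than `q` are ignored), and treating `*/*` or a missing header as a request for JSON is a design choice rather than a rule. User-Agent checks are left out because there is no single standard string that identifies AI agents:

```python
import json
from typing import Optional


def parse_accept(header: str) -> list[tuple[str, float]]:
    """Split an Accept header into (media_type, q) pairs, most preferred first."""
    entries = []
    for part in header.split(","):
        pieces = [p.strip() for p in part.split(";")]
        media_type = pieces[0].lower()
        if not media_type:
            continue
        q = 1.0
        for param in pieces[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        entries.append((media_type, q))
    # sorted() is stable, so types with equal q keep the client's order.
    return sorted(entries, key=lambda entry: entry[1], reverse=True)


def choose_format(accept: Optional[str]) -> str:
    """Return "json" or "html"; JSON is the default when the client expresses no preference."""
    if not accept:
        return "json"
    for media_type, q in parse_accept(accept):
        if q <= 0:
            continue
        if media_type in ("application/json", "application/*", "*/*"):
            return "json"
        if media_type in ("text/html", "text/*"):
            return "html"
    return "json"


def not_found(accept: Optional[str]) -> tuple[int, str, str]:
    """Hypothetical 404 handler: same information, format chosen per caller."""
    if choose_format(accept) == "json":
        body = json.dumps({"error": "not_found", "message": "Resource does not exist"})
        return 404, "application/json", body
    return 404, "text/html", "<!doctype html><title>Not found</title><p>Resource does not exist</p>"
```

A typical browser sends `text/html` first and `*/*` at a lower q-value, so it still gets the HTML page, while an agent sending `Accept: application/json` gets the compact body.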

Frequently Asked Questions

How much does a token cost for AI agents?

Token pricing varies by model. GPT-4o charges about $2.50 per million input tokens, and Claude 3.5 Sonnet charges about $3 per million. These costs add up: an agent making 1,000 API calls per day at 2,000 tokens per response reads 2 million tokens a day, or $5-6 per day in input costs alone. Reducing response size to 200 tokens drops that to $0.50-0.60 per day, a 10x savings that makes your API dramatically more attractive to agent developers.

Does response size actually affect agent readiness scores?

Yes. The AgentHermes scoring framework rewards concise, well-structured data through D6 Data Quality and D9 Agent Experience. APIs that return minimal JSON with consistent schemas score higher than APIs that return bloated responses with unnecessary fields. The effect is indirect but measurable — lean APIs typically score 5-10 points higher across these dimensions.

Should I remove fields from my existing API?

Do not remove fields — that breaks existing integrations. Instead, support field selection: let callers specify which fields they want via a query parameter like ?fields=id,name,price. This gives agents the option to request minimal responses while keeping backward compatibility for existing consumers.

What about pagination? Does that affect token costs?

Pagination is critical. An API that returns 1,000 results in one response when the agent only needs the first 10 wastes 99% of tokens. Implement cursor-based pagination with a configurable page size. Default to small pages (10-25 items) and let callers increase if needed. This is both a token optimization and a reliability improvement.
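A minimal sketch of cursor-based pagination over an in-memory list, assuming items are sorted by an ascending integer `id`. The page-size bounds (default 20, maximum 100), the base64-encoded cursor format, and the function names are all assumptions; against a database, the filter would become a `WHERE id > :after ORDER BY id LIMIT :size` query:

```python
import base64
import json
from typing import Optional

DEFAULT_PAGE_SIZE = 20  # within the 10-25 range suggested above
MAX_PAGE_SIZE = 100     # hypothetical upper bound a caller may request


def encode_cursor(last_id: int) -> str:
    """Opaque cursor: base64 of the last item's ID, so clients don't depend on its format."""
    return base64.urlsafe_b64encode(json.dumps({"after": last_id}).encode()).decode()


def decode_cursor(cursor: str) -> int:
    return json.loads(base64.urlsafe_b64decode(cursor.encode()))["after"]


def list_page(items: list[dict], cursor: Optional[str] = None, limit: Optional[int] = None) -> dict:
    """Return one page of `items` (assumed sorted by ascending integer "id")."""
    size = min(max(1, limit or DEFAULT_PAGE_SIZE), MAX_PAGE_SIZE)
    after = decode_cursor(cursor) if cursor else None
    # In-memory filter for illustration; a real store would seek directly to `after`.
    remaining = [it for it in items if after is None or it["id"] > after]
    page = remaining[:size]
    has_more = len(remaining) > size
    return {
        "items": page,
        "next_cursor": encode_cursor(page[-1]["id"]) if has_more else None,
    }
```

An agent that only needs the first few results makes one small request and never pays for the rest; it follows `next_cursor` only if it needs more.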

How do I measure my API response token count?

Use a tokenizer library such as tiktoken for OpenAI models; other model families use their own tokenizers, so counts differ somewhat between providers. Measure your typical endpoint responses. Stripe responses average about 200 tokens; if your equivalent endpoint returns 800 or more, you have bloat. AgentHermes scans measure response structure and flag common bloat patterns automatically.
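A sketch of measuring a response's token count in Python. It uses tiktoken's `o200k_base` encoding (the one GPT-4o uses) when the package is installed and its encoding file can be loaded, and otherwise falls back to the common rough estimate of about four characters per token for English text. Claude models use a different tokenizer, so treat either number as an approximation; the sample bodies are synthetic:

```python
import json


def count_tokens(text: str) -> int:
    """Count tokens with tiktoken if it is usable; otherwise fall back to a rough estimate."""
    try:
        import tiktoken  # optional third-party dependency: pip install tiktoken

        enc = tiktoken.get_encoding("o200k_base")
        return len(enc.encode(text))
    except Exception:
        # tiktoken is missing, or its encoding file could not be downloaded or loaded.
        # Rough rule of thumb for English text: about 4 characters per token.
        return max(1, round(len(text) / 4))


lean = json.dumps({"id": "ch_1abc", "amount": 2000, "currency": "usd",
                   "status": "succeeded", "payment_method": "pm_card"})
# Synthetic stand-in for an HTML error page with repeated navigation markup.
bloated = (
    "<!DOCTYPE html><html><head><title>Error</title></head><body>"
    + "<nav><a href='/'>Home</a></nav>" * 30
    + "<p>Something went wrong</p></body></html>"
)

if __name__ == "__main__":
    for label, body in (("lean JSON", lean), ("HTML page", bloated)):
        print(f"{label}: ~{count_tokens(body)} tokens")
```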


How lean are your API responses?

Run a free Agent Readiness Scan to see how your API scores on data quality, response structure, and agent experience. Find out if your responses are costing agents unnecessary tokens.

