Caching Strategies for Agent-Ready APIs: When to Cache, What to Invalidate
AI agents are the most repetitive API clients you will ever have. A single agent completing a task might call your get_services endpoint 10 times in one workflow — once for discovery, once for comparison, once for confirmation, and again every time it reconsiders. Without proper caching, each call hits your server and costs the agent tokens to parse the same response. With proper caching, most of those calls return instantly from cache with zero server load. This is not optimization — it is architecture for the agent economy.
Why Agents Are Different: The Repetitive Caller Problem
Human API clients are efficient. A mobile app fetches data once and renders it. A web dashboard loads on page visit and caches in the browser. Humans are lazy callers — they call your API only when they need to.
Agents are the opposite. An agent planning a task calls your API to understand what is available. Then it calls again to verify before making a decision. Then it calls again to confirm before executing. Then it calls again to validate after executing. A single agent completing a single booking might call get_services four times and check_availability six times.
Now multiply that by 100 agents serving 100 users in your area. Without caching, your server handles thousands of redundant requests for the same unchanged data. With proper caching, 90%+ of those calls never reach your origin. That gap can be the difference between a $50/month server and a $500/month server for the same business.
Four Caching Strategies Every Agent-Ready API Needs
There is no single caching strategy. Different endpoints need different approaches based on how often the data changes and how harmful staleness is. Here are the four strategies and when to use each one.
Cache-Control: max-age
Static data that changes infrequently — business info, service menus, team pages, FAQs.
Cache-Control: public, max-age=3600, s-maxage=86400
The agent caches for 1 hour. The CDN caches for 24 hours. The agent never calls your server for the same business info twice in an hour.
Savings: Up to 95% of redundant calls for catalog data
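To make the header concrete, here is a minimal Python sketch of how a cache might read it and decide whether a stored response is still fresh. The function names `parse_cache_control` and `is_fresh` are hypothetical, and the logic is deliberately simplified: it ignores the `Age` and `Expires` headers and heuristic freshness, so treat it as an illustration rather than a complete HTTP cache.

```python
def parse_cache_control(header: str) -> dict:
    """Split a Cache-Control value into a {directive: value-or-True} dict."""
    directives = {}
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, value = part.partition("=")
        directives[name.strip().lower()] = value.strip().strip('"') if value else True
    return directives


def is_fresh(header: str, age_seconds: int, shared_cache: bool = False) -> bool:
    """Return True if a response stored age_seconds ago can be reused as-is."""
    d = parse_cache_control(header)
    if "no-store" in d or "no-cache" in d:
        return False
    # Shared caches (CDNs) prefer s-maxage; an agent's private cache uses max-age.
    if shared_cache and "s-maxage" in d:
        lifetime = d["s-maxage"]
    else:
        lifetime = d.get("max-age")
    if lifetime is None or lifetime is True:
        return False
    return age_seconds < int(lifetime)


header = "public, max-age=3600, s-maxage=86400"
print(is_fresh(header, 1800))                     # agent cache, 30 minutes old
print(is_fresh(header, 7200))                     # agent cache, 2 hours old
print(is_fresh(header, 7200, shared_cache=True))  # CDN, 2 hours old
```

With this header, the agent reuses its copy for the first hour, while the CDN keeps answering from the edge for a full day.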
ETag + If-None-Match
Data that changes sometimes — pricing, product details, menus. Agent checks if data changed before re-downloading.
ETag: "abc123" → Agent sends If-None-Match: "abc123" → 304 Not Modified (zero body transfer)
The agent sends the ETag it received last time. If your data has not changed, you return 304 with no body. The agent keeps its cached copy and saves tokens on parsing.
Savings: a 304 response costs roughly 10 tokens to process versus 500+ tokens for a full response
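The exchange above can be sketched on the server side in a few lines of Python. This is an illustration under stated assumptions: `make_etag` and `conditional_get` are hypothetical helper names, the ETag is a truncated SHA-256 of the body (any stable content hash would do), and a real framework would wire this into its own request and response objects.

```python
import hashlib
from typing import Optional


def make_etag(body: bytes) -> str:
    """Derive a quoted ETag from a hash of the response body."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def conditional_get(body: bytes, if_none_match: Optional[str] = None):
    """Return (status, headers, body) for a GET, honoring If-None-Match."""
    etag = make_etag(body)
    headers = {"ETag": etag}
    if if_none_match is not None:
        # If-None-Match may list several tags; weak validators carry a W/ prefix.
        candidates = []
        for tag in if_none_match.split(","):
            tag = tag.strip()
            candidates.append(tag[2:] if tag.startswith("W/") else tag)
        if "*" in candidates or etag in candidates:
            return 304, headers, b""  # unchanged: send headers only
    return 200, headers, body


menu = b'{"services": ["haircut", "shave"]}'
first_status, first_headers, _ = conditional_get(menu)
second_status, _, second_body = conditional_get(menu, first_headers["ETag"])
print(first_status, second_status, len(second_body))  # 200 304 0
```

Because the tag is derived from the content, any change to the body produces a new ETag and the next conditional request receives a full 200 response again.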
stale-while-revalidate
Data that should be fresh but where a brief stale window is acceptable — availability slots, inventory counts.
Cache-Control: public, max-age=60, stale-while-revalidate=300
The agent gets cached data immediately (fast response) while the CDN revalidates in the background. The agent sees data at most 6 minutes old (60 seconds fresh plus a 300-second stale window) but never waits for the revalidation.
Savings: Latency: agent gets instant response instead of waiting for your origin server
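A rough way to picture the three phases of a stale-while-revalidate entry is the small Python sketch below. The function name `cache_state` and its return labels are hypothetical; the timing rule is the standard one, where the stale window starts when max-age ends and lasts for the stale-while-revalidate duration.

```python
def cache_state(age_seconds: int, max_age: int, stale_while_revalidate: int) -> str:
    """Classify a cached entry by its age under max-age + stale-while-revalidate."""
    if age_seconds < max_age:
        return "fresh"  # serve from cache, no origin contact
    if age_seconds < max_age + stale_while_revalidate:
        return "stale-serve-and-revalidate"  # serve now, refresh in the background
    return "expired"  # must fetch from origin before responding


for age in (30, 200, 400):
    print(age, cache_state(age, max_age=60, stale_while_revalidate=300))
# 30 fresh
# 200 stale-serve-and-revalidate
# 400 expired
```

With max-age=60 and stale-while-revalidate=300, the stale window runs from 60 to 360 seconds after the response was stored; past that point the cache has to go back to the origin before answering.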
Cache-Control: no-cache / no-store
Real-time data where staleness is unacceptable — live availability, current inventory, active orders, payment status.
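Cache-Control: no-store, must-revalidate
Never cache this. Every request hits your origin server. Use this for endpoints where showing stale data causes real harm, such as double-booking an appointment or selling out-of-stock items. Note the distinction between the two directives: no-store forbids storing the response at all, while no-cache allows storage but requires revalidation with the origin before every reuse.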
Savings: Nothing (that is the point) — freshness matters more than performance here
Endpoint Caching Matrix: What to Cache on Every MCP Tool
Here is a practical reference for common MCP tool endpoints. Copy this matrix and adapt it to your specific business. The principle: cache what does not change per-minute, invalidate what does.
The pattern is clear: read-heavy endpoints with slowly changing data get aggressive caching. Write endpoints and real-time availability get no caching. Everything in between uses conditional requests or stale-while-revalidate. This is the same pattern that powers the CDN layer of agent-ready infrastructure — the logic is identical whether caching happens at the CDN edge or in the agent client.
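One way to encode that pattern is a per-tool policy table, sketched below in Python. Everything in it is an assumption for illustration: only `get_services` and `check_availability` are tool names taken from this article, the other names are hypothetical, and the specific header values simply reuse the examples from the four strategies. Adapt the entries to your own endpoints.

```python
# Hypothetical policy table: tool name -> Cache-Control value.
CACHE_POLICIES = {
    # Read-heavy, slowly changing data: aggressive caching.
    "get_business_info": "public, max-age=3600, s-maxage=86400",
    "get_services": "public, max-age=3600, s-maxage=86400",
    # Changes sometimes: store it, but revalidate with an ETag before reuse.
    "get_pricing": "public, no-cache",
    # Fresh-ish data where a brief stale window is acceptable.
    "get_inventory": "public, max-age=60, stale-while-revalidate=300",
    # Real-time availability and writes: never cache.
    "check_availability": "no-store",
    "create_booking": "no-store",
}

# Unknown tools fall back to the safest choice.
DEFAULT_POLICY = "no-store"


def cache_control_for(tool_name: str) -> str:
    """Look up the Cache-Control value for a tool, defaulting to no-store."""
    return CACHE_POLICIES.get(tool_name, DEFAULT_POLICY)


for tool in ("get_services", "check_availability", "some_new_tool"):
    print(f"{tool}: {cache_control_for(tool)}")
```

Defaulting to no-store means a newly added tool is never cached by accident; you opt each endpoint into caching once you know how often its data changes.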
How Caching Impacts Your Agent Readiness Score
AgentHermes scans for caching infrastructure across two dimensions: D8 Reliability (does the API perform consistently under agent load?) and D9 Agent Experience (is the API designed with agent callers in mind?). Proper cache headers are one of the strongest signals of both.
Cache-Control headers present
D8 Reliability score +8-12 points
AgentHermes scans for proper cache headers on API responses. Their presence indicates a mature, agent-considerate API.
ETag support on GET endpoints
D8 Reliability score +5-8 points
Conditional request support tells AgentHermes the API is designed for efficient repeated access, which is exactly what agents do.
Consistent cache behavior
D9 Agent Experience score +3-5 points
When cache headers are consistent across endpoints (not random or missing on some), it signals intentional API design.
304 Not Modified responses
D8 Reliability score +3-5 points
Actually returning 304 when data has not changed (not just sending ETags) proves the caching layer works end-to-end.
Total impact: Proper caching can add 19-30 points to your Agent Readiness Score across D8 and D9. For a business scoring 40 (Bronze), adding cache headers alone could push them to 60+ (Silver). This is one of the highest-leverage improvements in the scoring framework — and one of the easiest to implement. Most API frameworks support cache headers with a single middleware addition.
Caching Anti-Patterns That Hurt Agent Readiness
Bad caching is worse than no caching. Here are the patterns we see in scans that actively hurt agent readiness — and what to do instead.
Cache-Control: no-cache on everything
Treating all endpoints as real-time when 80% of your data changes daily at most. This forces every agent call to your origin, multiplying server load by 10x unnecessarily.
Fix: Audit each endpoint. Only truly real-time data needs no-cache.
Missing ETag on large responses
Sending the full 50KB product catalog every time when nothing changed. The agent re-parses identical JSON, burning tokens and adding latency.
Fix: Add ETag based on content hash. Return 304 when unchanged.
Inconsistent headers across endpoints
Cache-Control on /products but not on /services. ETags on /info but not on /pricing. Inconsistency signals unintentional caching rather than designed caching.
Fix: Add caching middleware at the framework level. Every GET endpoint gets a strategy.
Caching personalized responses
Caching responses that include user-specific data (account details, order history) with public cache headers. This leaks data between agent sessions.
Fix: Use Cache-Control: private for authenticated responses. Public only for public data.
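Two of the fixes above (a framework-level default and private caching for authenticated responses) can be combined in a single piece of middleware. The sketch below uses plain WSGI so it has no dependencies; the name `caching_middleware`, the default policy of `public, max-age=300`, and the 60-second private lifetime are all assumptions chosen for illustration, not recommended values.

```python
def caching_middleware(app, default_policy="public, max-age=300"):
    """Wrap a WSGI app so every response carries a Cache-Control header."""

    def wrapped(environ, start_response):
        def patched_start(status, headers, exc_info=None):
            names = {name.lower() for name, _ in headers}
            if "cache-control" not in names:  # never override an explicit choice
                if environ.get("REQUEST_METHOD") != "GET":
                    policy = "no-store"  # writes are never cached
                elif "HTTP_AUTHORIZATION" in environ:
                    policy = "private, max-age=60"  # per-user data stays out of CDNs
                else:
                    policy = default_policy
                headers = list(headers) + [("Cache-Control", policy)]
            return start_response(status, headers, exc_info)

        return app(environ, patched_start)

    return wrapped


def demo_app(environ, start_response):
    """Tiny stand-in app that sets no caching headers of its own."""
    start_response("200 OK", [("Content-Type", "application/json")])
    return [b"{}"]


def call(app, method="GET", authorization=None):
    """Invoke a WSGI app once and return the Cache-Control header it sent."""
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured.update(headers)

    environ = {"REQUEST_METHOD": method}
    if authorization:
        environ["HTTP_AUTHORIZATION"] = authorization
    b"".join(app(environ, start_response))
    return captured["Cache-Control"]


app = caching_middleware(demo_app)
print(call(app))                              # public, max-age=300
print(call(app, authorization="Bearer abc"))  # private, max-age=60
print(call(app, method="POST"))               # no-store
```

Endpoints that set their own Cache-Control keep it, so the middleware only fills gaps and gives every GET endpoint a strategy by default.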
The Cost Math: What Proper Caching Saves
Consider a local restaurant with an MCP server handling 1,000 agent interactions per day. Each interaction calls an average of 8 endpoints: business info, menu, hours, availability (x2), pricing, booking, and confirmation. That is 8,000 API calls per day.
Without caching: all 8,000 calls hit your origin server. At 200ms per call, that is about 27 minutes of compute time per day (8,000 × 0.2 s = 1,600 s). Your server needs to handle 5-6 concurrent requests during peak hours.
With proper caching: business info, menu, hours, and pricing are cached at the CDN. Only availability and booking hit your origin — about 2,500 calls per day, a 69% reduction. The cached calls respond in under 10ms from the CDN edge. Your server handles peak loads that are one-third the size.
For the agent side, each cached response saves the agent from parsing the same JSON again. A 304 Not Modified response is roughly 10 tokens to process versus 500+ tokens for a full response. Across 5,500 cached calls per day, that is roughly 2.7 million tokens saved per day (5,500 × 490), which is real money for agents billed per token.
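The arithmetic in this section is easy to rerun with your own traffic numbers. The Python sketch below uses only the figures quoted above (1,000 interactions, 8 calls each, 200ms per origin call, about 2,500 origin calls with caching, 10 versus 500 tokens); the variable names are hypothetical.

```python
interactions_per_day = 1_000
calls_per_interaction = 8
origin_latency_s = 0.2           # 200ms per origin call
origin_calls_with_cache = 2_500  # calls that still reach the origin
tokens_full, tokens_304 = 500, 10

total_calls = interactions_per_day * calls_per_interaction
compute_minutes = total_calls * origin_latency_s / 60
cached_calls = total_calls - origin_calls_with_cache
reduction = cached_calls / total_calls
tokens_saved = cached_calls * (tokens_full - tokens_304)

print(f"{total_calls:,} calls/day, {compute_minutes:.1f} minutes of origin time uncached")
print(f"{cached_calls:,} cached calls/day, {reduction:.0%} fewer origin calls")
print(f"{tokens_saved:,} tokens saved per day")
```

Swapping in your own call counts and response sizes gives a quick estimate of what a caching layer is worth before you build it.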
Frequently Asked Questions
Do AI agents actually respect Cache-Control headers?
The leading AI agent frameworks (LangChain, AutoGPT, CrewAI) are increasingly implementing HTTP caching. Even when agents themselves do not cache, the CDN layer and any proxy between the agent and your API will. Proper cache headers benefit the entire stack, not just the agent client. Additionally, agent orchestration platforms cache tool responses to avoid redundant calls within a single task — your headers tell them how long that cache is valid.
Should I cache differently for agents versus human API clients?
No. Good caching strategy is good caching strategy regardless of the client. The difference is that agents are more aggressive callers — they may hit the same endpoint 10 times in a single task flow where a human would call it once. This makes proper caching more impactful for agent traffic but the headers and strategies are identical. Do not try to detect agent user-agents and serve different cache headers.
What about Vary headers for agent-specific caching?
Use Vary: Accept to cache different representations (JSON vs HTML) separately. Use Vary: Authorization when responses differ per API key. Do not use Vary: User-Agent — this destroys cache hit rates because every agent framework sends a different user-agent string. The goal is to maximize cache hits, and User-Agent variation works against that.
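To see why Vary: User-Agent hurts, it helps to look at how a cache builds its lookup key. The Python sketch below is a simplified model with a hypothetical `cache_key` function and made-up user-agent strings; real caches normalize header values more carefully, but the fragmentation effect is the same.

```python
def cache_key(method, url, request_headers, vary):
    """Build a cache key from the method, URL, and the request headers named in Vary."""
    normalized = {k.lower(): v for k, v in request_headers.items()}
    varied = tuple(
        (name.strip().lower(), normalized.get(name.strip().lower(), ""))
        for name in vary.split(",")
        if name.strip()
    )
    return (method.upper(), url, varied)


# Three agent frameworks asking for the same JSON resource.
requests = [
    {"Accept": "application/json", "User-Agent": "framework-a/1.0"},
    {"Accept": "application/json", "User-Agent": "framework-b/2.3"},
    {"Accept": "application/json", "User-Agent": "framework-c/0.9"},
]

for vary in ("Accept", "Accept, User-Agent"):
    keys = {cache_key("GET", "/services", headers, vary) for headers in requests}
    print(f"Vary: {vary} -> {len(keys)} cache entries")
# Vary: Accept -> 1 cache entries
# Vary: Accept, User-Agent -> 3 cache entries
```

With Vary: Accept all three agents share one cached copy; adding User-Agent gives each framework its own entry, so each one pays for a separate origin fetch.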
How does caching interact with rate limiting?
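They are complementary. Caching reduces the number of requests that hit your rate limiter. A request answered from the agent's own cache or from a CDN edge never reaches your API gateway, so it does not count against rate limits. A conditional request that your origin answers with 304 still passes through the gateway and usually does count, though it is much cheaper to serve. Either way, proper caching effectively increases your rate limit capacity without changing the limit: agents can do more work with fewer real requests.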
My data changes every few seconds. Can I still use caching?
Yes, with stale-while-revalidate. Even a 10-second max-age with a 60-second stale-while-revalidate window means agents get instant responses for the common case while your origin only gets hit once per 10 seconds per unique resource. For truly real-time data (live auction prices, active order tracking), use no-store and consider WebSocket or SSE instead of polling.
Check your API caching score
See how your API scores on D8 Reliability and D9 Agent Experience. AgentHermes scans for Cache-Control headers, ETag support, and response consistency — free.