## Inspecting Cache Usage
Cache usage is reflected in the `usage` object of every response:
| Field | Description |
|---|---|
| `prompt_tokens_details.cached_tokens` | Tokens read from the cache (cache hit; billed at a discount) |
| `prompt_tokens_details.cache_write_tokens` | Tokens written to the cache on this request (one-time write cost) |
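The fields above can be read directly off a parsed response. A minimal sketch (the sample `usage` payload and the `cache_stats` helper are illustrative, not part of any SDK):

```python
def cache_stats(usage: dict) -> dict:
    """Extract cache-related token counts from a response's usage object."""
    details = usage.get("prompt_tokens_details") or {}
    cached = details.get("cached_tokens", 0)
    return {
        "cached_tokens": cached,
        "cache_write_tokens": details.get("cache_write_tokens", 0),
        # Prompt tokens that were processed fresh, i.e. not served from cache.
        "uncached_prompt_tokens": usage.get("prompt_tokens", 0) - cached,
    }

# Illustrative usage payload, shaped like the table above.
usage = {
    "prompt_tokens": 2048,
    "completion_tokens": 120,
    "prompt_tokens_details": {"cached_tokens": 1024, "cache_write_tokens": 0},
}
print(cache_stats(usage))
```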
## OpenAI Automatic Caching
OpenAI caches prompt prefixes automatically; no special request configuration is needed. How it works:

- Caching happens server-side at OpenAI, triggered automatically when prompts are long enough
- Minimum prompt length: 1,024 tokens
- Cache entries expire after ~1 hour of inactivity
- Cached tokens are charged at a reduced rate (typically 50% discount)
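Because automatic caching keys on an exact prompt prefix, keeping the static part of the prompt first across requests maximizes cache hits. A minimal sketch of that ordering (the prompt text and `build_messages` helper are illustrative):

```python
# A long, stable prefix: identical bytes on every request.
STATIC_SYSTEM_PROMPT = "You are a support assistant for Acme Corp. " * 50

def build_messages(user_question: str) -> list[dict]:
    # The stable system prompt comes first so every request shares the same
    # prefix; only the trailing user message varies between requests.
    return [
        {"role": "system", "content": STATIC_SYSTEM_PROMPT},
        {"role": "user", "content": user_question},
    ]

a = build_messages("How do I reset my password?")
b = build_messages("What is your refund policy?")

# The shared prefix (the system message) is byte-identical across requests,
# so once it has been cached, later requests can read it from cache.
assert a[0] == b[0]
```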
## Anthropic Claude Prompt Caching
Anthropic supports two caching modes:

- Automatic caching (default): Claude caches the system prompt automatically. Minimum 1,024 tokens.
- Explicit caching (`cache_control`): you mark specific content blocks with `"cache_control": {"type": "ephemeral"}` to control exactly what gets cached.
### Cache TTL
| Cache Type | TTL |
|---|---|
| Automatic | 5 minutes |
| Explicit (ephemeral) | 1 hour (Claude 3.5+) or 5 minutes (Claude 3) |
### Supported Models
| Model | Min Tokens (text) | Min Tokens (images) |
|---|---|---|
| `anthropic/claude-sonnet-4.6` | 1,024 | 1,024 |
| `anthropic/claude-opus-4.5` | 1,024 | 1,024 |
| `anthropic/claude-haiku-3.5` | 2,048 | 2,048 |
| `anthropic/claude-3-5-sonnet` | 1,024 | 1,024 |
### Explicit Caching Example
Mark content with `cache_control` to control caching at the content-block level:
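A minimal sketch of such a request body, assuming an OpenAI-compatible chat endpoint that forwards Anthropic's content-block format (the model name and prompt text are illustrative):

```python
# A large, stable prompt worth caching (illustrative text).
LONG_SYSTEM_PROMPT = "You are a legal assistant. Reference text: ... " * 100

# The system prompt is a content block carrying cache_control, so exactly
# that block is written to (and on later requests read from) the cache.
request_body = {
    "model": "anthropic/claude-3-5-sonnet",
    "messages": [
        {
            "role": "system",
            "content": [
                {
                    "type": "text",
                    "text": LONG_SYSTEM_PROMPT,
                    "cache_control": {"type": "ephemeral"},
                }
            ],
        },
        {"role": "user", "content": "Summarize the reference text."},
    ],
}
```

With the OpenAI Python SDK pointed at such a gateway, this body would be passed as keyword arguments to `client.chat.completions.create(**request_body)`; the variable user turn stays outside the cached block, so only the stable prefix incurs the one-time write cost.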
## DeepSeek Automatic Caching

DeepSeek caches prompt prefixes automatically, similar to OpenAI. No configuration is needed.

## Google Gemini Prompt Caching
Gemini supports both implicit (automatic) and explicit caching.

### Implicit Caching
Gemini 2.5 Flash and Pro cache large contexts automatically at no extra cost. Cache hits are visible in the response usage.

### Explicit Caching via Native Gemini API
For fine-grained control, use the native Gemini `cachedContents` API. You create a cache object, which returns a `name` field (e.g., `cachedContents/abc123`) that you reference in subsequent requests:
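A minimal sketch of the two request bodies involved, assuming the REST form of the Gemini API (the model name, document text, and cache name are illustrative placeholders):

```python
# Step 1: body for POST .../v1beta/cachedContents — creates the cache entry.
create_cache_body = {
    "model": "models/gemini-2.5-flash",
    "contents": [
        {"role": "user", "parts": [{"text": "LARGE_DOCUMENT_TEXT ..."}]}
    ],
    "ttl": "3600s",  # how long the cache entry lives before expiring
}

# The create call returns an object whose "name" identifies the cache,
# e.g. "cachedContents/abc123" (illustrative value).
cache_name = "cachedContents/abc123"

# Step 2: body for POST .../models/gemini-2.5-flash:generateContent,
# referencing the cache instead of resending the large document.
generate_body = {
    "cachedContent": cache_name,
    "contents": [
        {"role": "user", "parts": [{"text": "Summarize the document."}]}
    ],
}
```

Only the short follow-up question travels with the second request; the large document is read from the cache server-side until the TTL expires.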