The service_tier parameter lets you express a preference for how ARouter and the upstream provider should balance cost and latency for a request.

Usage

```json
{
  "model": "openai/gpt-5.4",
  "messages": [{"role": "user", "content": "Hello"}],
  "service_tier": "auto"
}
```
| Value | Description |
| --- | --- |
| `"auto"` (default) | Provider picks the appropriate tier based on availability |
| `"default"` | Standard tier: best cost-performance balance |
| `"flex"` | Reduced cost, best-effort latency; ideal for batch workloads |

Provider Support

Service tier is currently supported by:
| Provider | Supported values | Notes |
| --- | --- | --- |
| OpenAI | `"auto"`, `"default"`, `"flex"` | `"flex"` offers reduced pricing for batch-like workloads |
| Others | Ignored | Passed through but has no effect |

Response

The service_tier used is echoed back in the response:
```json
{
  "id": "gen-...",
  "model": "openai/gpt-5.4",
  "service_tier": "default",
  "choices": [...],
  "usage": {...}
}
```
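Because providers that ignore the parameter may omit the field from the response entirely, it is worth reading it defensively. A small sketch (the `tier_used` helper and the fallback to `"default"` are our assumptions, not router-specified behavior):

```python
def tier_used(response: dict) -> str:
    """Return the service tier echoed in a response, falling back to "default"
    when the provider omits or nulls the field."""
    return response.get("service_tier") or "default"

# Response shapes as shown above
resp = {"id": "gen-123", "model": "openai/gpt-5.4", "service_tier": "flex"}
tier_used(resp)         # "flex"
tier_used({"id": "x"})  # "default"
```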

Use Cases

Batch processing (cost-optimized):
```python
# Process large volumes at reduced cost.
# Assumes `client` is an OpenAI-compatible client configured for the router,
# and `documents` is an iterable of strings.
for document in documents:
    response = client.chat.completions.create(
        model="openai/gpt-5.4",
        messages=[{"role": "user", "content": f"Summarize: {document}"}],
        extra_body={"service_tier": "flex"},
    )
```
Interactive applications (latency-optimized):
```python
# Real-time user-facing responses on the standard tier.
# Assumes `client` and `user_message` are defined as in the example above.
response = client.chat.completions.create(
    model="openai/gpt-5.4",
    messages=[{"role": "user", "content": user_message}],
    extra_body={"service_tier": "default"},
)
```
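Because `"flex"` is best-effort, a request on that tier may fail under load. One possible pattern, sketched here under our own assumptions (the helper name, the catch-all error handling, and the retry-on-default policy are ours, not mandated by the router), is to fall back to the standard tier:

```python
def create_with_fallback(create_fn, **kwargs):
    """Try the flex tier first; on any error, retry once on the default tier.

    `create_fn` stands in for client.chat.completions.create. Catching the
    SDK's specific exception types would be preferable in real code.
    """
    try:
        return create_fn(extra_body={"service_tier": "flex"}, **kwargs)
    except Exception:
        return create_fn(extra_body={"service_tier": "default"}, **kwargs)
```

This keeps batch jobs cheap when capacity is available without stalling them when it is not.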