Documentation Index
Fetch the complete documentation index at: https://docs.arouter.ai/llms.txt
Use this file to discover all available pages before exploring further.
Model Fallbacks let you specify multiple candidate models in priority order. If the first model returns an error, times out, or is unavailable, ARouter automatically retries with the next model — transparently, within the same request.
Quick Start
Pass a models array with your priority order:
{
"model": "openai/gpt-5.4-pro",
"models": [
"openai/gpt-5.4-pro",
"openai/gpt-5.4",
"anthropic/claude-sonnet-4.6"
],
"messages": [{"role": "user", "content": "Hello"}]
}
The first model in models is always attempted first. If it fails, ARouter tries the next, and so on.
from openai import OpenAI
client = OpenAI(
base_url="https://api.arouter.ai/v1",
api_key="lr_live_xxxx",
)
response = client.chat.completions.create(
model="openai/gpt-5.4-pro",
messages=[{"role": "user", "content": "Explain quantum entanglement simply."}],
extra_body={
"models": [
"openai/gpt-5.4-pro",
"openai/gpt-5.4",
"anthropic/claude-sonnet-4.6",
"google/gemini-2.5-pro",
]
},
)
print(f"Model used: {response.model}")
print(response.choices[0].message.content)
import OpenAI from "openai";
const client = new OpenAI({
baseURL: "https://api.arouter.ai/v1",
apiKey: "lr_live_xxxx",
});
const response = await client.chat.completions.create({
model: "openai/gpt-5.4-pro",
messages: [{ role: "user", content: "Explain quantum entanglement simply." }],
// @ts-expect-error ARouter extension
models: [
"openai/gpt-5.4-pro",
"openai/gpt-5.4",
"anthropic/claude-sonnet-4.6",
"google/gemini-2.5-pro",
],
});
console.log(`Model used: ${response.model}`);
console.log(response.choices[0].message.content);
curl https://api.arouter.ai/v1/chat/completions \
-H "Authorization: Bearer lr_live_xxxx" \
-H "Content-Type: application/json" \
-d '{
"model": "openai/gpt-5.4-pro",
"models": [
"openai/gpt-5.4-pro",
"openai/gpt-5.4",
"anthropic/claude-sonnet-4.6"
],
"messages": [{"role": "user", "content": "Explain quantum entanglement simply."}]
}'
Fallback Behavior
| Scenario | Action |
|---|
Primary model returns 5xx | Retry with next model in models |
Primary model is rate-limited (429) | Retry with next model |
| Primary model is unavailable | Retry with next model |
Primary model returns 4xx (bad request) | Return error immediately (do not retry — the request itself is invalid) |
| All models fail | Return error from last attempted model |
Identifying Which Model Responded
The model field in the response always reflects the model that actually generated the response:
{
"model": "anthropic/claude-sonnet-4.6",
"provider": "Anthropic",
"choices": [...]
}
If model in the response differs from your primary model in the request, a fallback occurred.
Pricing
You are billed for the model that actually responds. If your primary model fails and a fallback model responds, you pay the fallback model’s rate.
{
"usage": {
"prompt_tokens": 25,
"completion_tokens": 180,
"total_tokens": 205,
"cost": 0.00041
}
}
Failed attempts (models that returned errors before a successful fallback) are not charged.
Combining with Provider Fallbacks
models[] (model-level fallbacks) and provider.allow_fallbacks (provider-level fallbacks) operate independently:
{
"model": "openai/gpt-5.4-pro",
"models": ["openai/gpt-5.4-pro", "openai/gpt-5.4"],
"provider": {
"order": ["Azure", "OpenAI"],
"allow_fallbacks": true
}
}
Here, ARouter first tries gpt-5.4-pro via Azure, then gpt-5.4-pro via OpenAI, then gpt-5.4 via Azure, then gpt-5.4 via OpenAI.
Use Cases
High-availability production:
{
"models": [
"anthropic/claude-opus-4.5",
"anthropic/claude-sonnet-4.6",
"openai/gpt-5.4-pro"
]
}
Cost optimization with quality floor:
{
"models": [
"openai/gpt-5.4:free",
"meta-llama/llama-4-maverick:free",
"openai/gpt-5.4"
]
}
Tries free variants first, falls back to paid if needed.