Model Fallbacks let you specify multiple candidate models in priority order. If the first model returns an error, times out, or is unavailable, ARouter automatically retries with the next model — transparently, within the same request.

Quick Start

Pass a models array with your priority order:
{
  "model": "openai/gpt-5.4-pro",
  "models": [
    "openai/gpt-5.4-pro",
    "openai/gpt-5.4",
    "anthropic/claude-sonnet-4.6"
  ],
  "messages": [{"role": "user", "content": "Hello"}]
}
The first model in models is always attempted first. If it fails, ARouter tries the next, and so on. The same request through the OpenAI Python SDK:
from openai import OpenAI

client = OpenAI(
    base_url="https://api.arouter.ai/v1",
    api_key="lr_live_xxxx",
)

response = client.chat.completions.create(
    model="openai/gpt-5.4-pro",
    messages=[{"role": "user", "content": "Explain quantum entanglement simply."}],
    extra_body={
        "models": [
            "openai/gpt-5.4-pro",
            "openai/gpt-5.4",
            "anthropic/claude-sonnet-4.6",
            "google/gemini-2.5-pro",
        ]
    },
)

print(f"Model used: {response.model}")
print(response.choices[0].message.content)

Fallback Behavior

Scenario | Action
--- | ---
Primary model returns 5xx | Retry with next model in models
Primary model is rate-limited (429) | Retry with next model
Primary model is unavailable | Retry with next model
Primary model returns 4xx (bad request) | Return error immediately (the request itself is invalid, so no retry)
All models fail | Return error from last attempted model

Identifying Which Model Responded

The model field in the response always reflects the model that actually generated the response:
{
  "model": "anthropic/claude-sonnet-4.6",
  "provider": "Anthropic",
  "choices": [...]
}
If model in the response differs from your primary model in the request, a fallback occurred.
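That comparison is the simplest way to detect a fallback in client code. A minimal sketch (the helper name is illustrative; in practice `response_model` comes from `response.model`):

```python
def fallback_occurred(requested: str, response_model: str) -> bool:
    """True if the response came from a model other than the requested primary."""
    return response_model != requested

# With the OpenAI SDK example above, this would be:
#   fallback_occurred("openai/gpt-5.4-pro", response.model)
print(fallback_occurred("openai/gpt-5.4-pro", "anthropic/claude-sonnet-4.6"))
```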

Pricing

You are billed for the model that actually responds. If your primary model fails and a fallback model responds, you pay the fallback model’s rate.
{
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 180,
    "total_tokens": 205,
    "cost": 0.00041
  }
}
Failed attempts (models that returned errors before a successful fallback) are not charged.
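Since billing follows the responding model, the cost of a request depends on which entry in models actually answered. A sketch of that calculation; the per-token rates below are made-up placeholders, not ARouter pricing:

```python
# Hypothetical (prompt, completion) USD-per-token rates for illustration only.
RATES = {
    "openai/gpt-5.4-pro": (2.0e-6, 8.0e-6),
    "anthropic/claude-sonnet-4.6": (1.0e-6, 2.0e-6),
}

def estimate_cost(responding_model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Bill only the model that responded; failed attempts cost nothing."""
    prompt_rate, completion_rate = RATES[responding_model]
    return prompt_tokens * prompt_rate + completion_tokens * completion_rate

# If the primary failed and claude-sonnet-4.6 responded, you pay its rate:
print(estimate_cost("anthropic/claude-sonnet-4.6", 25, 180))
```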

Combining with Provider Fallbacks

models[] (model-level fallbacks) and provider.allow_fallbacks (provider-level fallbacks) operate independently:
{
  "model": "openai/gpt-5.4-pro",
  "models": ["openai/gpt-5.4-pro", "openai/gpt-5.4"],
  "provider": {
    "order": ["Azure", "OpenAI"],
    "allow_fallbacks": true
  }
}
Here, ARouter first tries gpt-5.4-pro via Azure, then gpt-5.4-pro via OpenAI, then gpt-5.4 via Azure, then gpt-5.4 via OpenAI.
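The attempt order described above is model-major: every provider is exhausted for one model before moving to the next model. Assuming that ordering, it can be enumerated like this:

```python
from itertools import product

def attempt_order(models: list[str], providers: list[str]) -> list[tuple[str, str]]:
    """Enumerate (model, provider) attempts: all providers per model, in order."""
    return [(model, provider) for model, provider in product(models, providers)]

for model, provider in attempt_order(
    ["openai/gpt-5.4-pro", "openai/gpt-5.4"], ["Azure", "OpenAI"]
):
    print(f"try {model} via {provider}")
```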

Use Cases

High-availability production:
{
  "models": [
    "anthropic/claude-opus-4.5",
    "anthropic/claude-sonnet-4.6",
    "openai/gpt-5.4-pro"
  ]
}
Cost optimization with quality floor:
{
  "models": [
    "openai/gpt-5.4:free",
    "meta-llama/llama-4-maverick:free",
    "openai/gpt-5.4"
  ]
}
Tries free variants first, falls back to paid if needed.
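Both patterns boil down to assembling the same request shape shown in the Quick Start. A small helper for that (the function name is illustrative, not part of any SDK):

```python
def fallback_request(models: list[str], messages: list[dict]) -> dict:
    """Build a chat-completions body with model fallbacks in priority order."""
    return {
        "model": models[0],   # primary model, attempted first
        "models": models,     # full fallback chain
        "messages": messages,
    }

body = fallback_request(
    ["openai/gpt-5.4:free", "meta-llama/llama-4-maverick:free", "openai/gpt-5.4"],
    [{"role": "user", "content": "Hello"}],
)
print(body["model"])  # the free variant is attempted first
```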