Model Fallbacks

Model Fallbacks let you specify multiple candidate models in priority order. If the first model returns an error, times out, or is unavailable, ARouter automatically retries with the next model — transparently, within the same request.

Quick Start

Pass a models array with your priority order:

{
  "model": "openai/gpt-5.4-pro",
  "models": [
    "openai/gpt-5.4-pro",
    "openai/gpt-5.4",
    "anthropic/claude-sonnet-4.6"
  ],
  "messages": [{"role": "user", "content": "Hello"}]
}

The first model in models is always attempted first. If it fails, ARouter tries the next, and so on.

Python
TypeScript
cURL

from openai import OpenAI

client = OpenAI(
    base_url="https://api.arouter.ai/v1",
    api_key="lr_live_xxxx",
)

response = client.chat.completions.create(
    model="openai/gpt-5.4-pro",
    messages=[{"role": "user", "content": "Explain quantum entanglement simply."}],
    extra_body={
        "models": [
            "openai/gpt-5.4-pro",
            "openai/gpt-5.4",
            "anthropic/claude-sonnet-4.6",
            "google/gemini-2.5-pro",
        ]
    },
)

print(f"Model used: {response.model}")
print(response.choices[0].message.content)

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.arouter.ai/v1",
  apiKey: "lr_live_xxxx",
});

const response = await client.chat.completions.create({
  model: "openai/gpt-5.4-pro",
  messages: [{ role: "user", content: "Explain quantum entanglement simply." }],
  // @ts-expect-error ARouter extension
  models: [
    "openai/gpt-5.4-pro",
    "openai/gpt-5.4",
    "anthropic/claude-sonnet-4.6",
    "google/gemini-2.5-pro",
  ],
});

console.log(`Model used: ${response.model}`);
console.log(response.choices[0].message.content);

curl https://api.arouter.ai/v1/chat/completions \
  -H "Authorization: Bearer lr_live_xxxx" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-5.4-pro",
    "models": [
      "openai/gpt-5.4-pro",
      "openai/gpt-5.4",
      "anthropic/claude-sonnet-4.6"
    ],
    "messages": [{"role": "user", "content": "Explain quantum entanglement simply."}]
  }'

Fallback Behavior

Scenario	Action
Primary model returns `5xx`	Retry with next model in `models`
Primary model is rate-limited (`429`)	Retry with next model
Primary model is unavailable	Retry with next model
Primary model returns `4xx` (bad request)	Return error immediately (do not retry — the request itself is invalid)
All models fail	Return error from last attempted model

Identifying Which Model Responded

The model field in the response always reflects the model that actually generated the response:

{
  "model": "anthropic/claude-sonnet-4.6",
  "provider": "Anthropic",
  "choices": [...]
}

If model in the response differs from your primary model in the request, a fallback occurred.

Pricing

You are billed for the model that actually responds. If your primary model fails and a fallback model responds, you pay the fallback model’s rate.

{
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 180,
    "total_tokens": 205,
    "cost": 0.00041
  }
}

Failed attempts (models that returned errors before a successful fallback) are not charged.

Combining with Provider Fallbacks

models[] (model-level fallbacks) and provider.allow_fallbacks (provider-level fallbacks) operate independently:

{
  "model": "openai/gpt-5.4-pro",
  "models": ["openai/gpt-5.4-pro", "openai/gpt-5.4"],
  "provider": {
    "order": ["Azure", "OpenAI"],
    "allow_fallbacks": true
  }
}

Here, ARouter first tries gpt-5.4-pro via Azure, then gpt-5.4-pro via OpenAI, then gpt-5.4 via Azure, then gpt-5.4 via OpenAI.

Use Cases

High-availability production:

{
  "models": [
    "anthropic/claude-opus-4.5",
    "anthropic/claude-sonnet-4.6",
    "openai/gpt-5.4-pro"
  ]
}

Cost optimization with quality floor:

{
  "models": [
    "openai/gpt-5.4:free",
    "meta-llama/llama-4-maverick:free",
    "openai/gpt-5.4"
  ]
}

Tries free variants first, falls back to paid if needed.

Provider Routing — Provider-level fallbacks with allow_fallbacks
Model Routing — How ARouter selects models
Uptime Optimization — Best practices for reliability

​Quick Start

​Fallback Behavior

​Identifying Which Model Responded

​Pricing

​Combining with Provider Fallbacks

​Use Cases

​Related