## Quick Start
Pass a `models` array with your priority order:
The first model in `models` is always attempted first. If it fails, ARouter tries the next, and so on.
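As a sketch, a request body with a fallback list might be built like this. The message format is an assumption modeled on typical chat-completion APIs; check ARouter's API reference for the exact schema.

```python
import json

payload = {
    # First entry is the primary model; the rest are tried in order on failure.
    "models": ["gpt-5.4-pro", "gpt-5.4"],
    "messages": [{"role": "user", "content": "Hello!"}],
}

# Serialize for the request body (sending it over HTTP is omitted here).
body = json.dumps(payload)
print(body)
```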
## Fallback Behavior
| Scenario | Action |
|---|---|
| Primary model returns 5xx | Retry with next model in `models` |
| Primary model is rate-limited (429) | Retry with next model |
| Primary model is unavailable | Retry with next model |
| Primary model returns 4xx (bad request) | Return error immediately (do not retry; the request itself is invalid) |
| All models fail | Return error from last attempted model |
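The policy in the table can be sketched as client-side logic. `call_model` is a hypothetical stand-in for a single attempt against one model; ARouter applies these rules server-side.

```python
def route(models, call_model):
    """Try each model in order, applying the fallback rules from the table."""
    last_error = None
    for m in models:
        status, body = call_model(m)
        if status < 400:
            return body  # success: stop here
        if 400 <= status < 500 and status != 429:
            # 4xx other than 429: the request itself is invalid, so do not retry.
            raise ValueError(f"bad request against {m}: {status}")
        # 5xx, 429, or unavailable: fall through to the next model.
        last_error = (m, status)
    raise RuntimeError(f"all models failed; last error: {last_error}")

# Demo with a stand-in that fails over once:
responses = {"gpt-5.4-pro": (503, None), "gpt-5.4": (200, "Hello!")}
result = route(["gpt-5.4-pro", "gpt-5.4"], lambda m: responses[m])
print(result)  # → Hello!
```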
## Identifying Which Model Responded
The `model` field in the response always reflects the model that actually generated the response.

If `model` in the response differs from your primary model in the request, a fallback occurred.
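This check can be sketched as a comparison between the response's `model` field and the first (primary) entry of your `models` array. The response dict shape here is an illustrative assumption.

```python
def fallback_occurred(requested_models, response):
    # The first entry in `models` is the primary model.
    return response["model"] != requested_models[0]

requested = ["gpt-5.4-pro", "gpt-5.4"]
response = {"model": "gpt-5.4"}  # a fallback model answered this request
print(fallback_occurred(requested, response))  # → True
```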
## Pricing
You are billed for the model that actually responds. If your primary model fails and a fallback model responds, you pay the fallback model's rate.

## Combining with Provider Fallbacks
`models` (model-level fallbacks) and `provider.allow_fallbacks` (provider-level fallbacks) operate independently:
With two providers available for each model, ARouter tries `gpt-5.4-pro` via Azure, then `gpt-5.4-pro` via OpenAI, then `gpt-5.4` via Azure, then `gpt-5.4` via OpenAI.
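The combined attempt order above amounts to a model-major loop: provider-level fallbacks are exhausted for each model before the next model is tried. A sketch, where the provider list and its order are illustrative assumptions (ARouter controls the actual provider order):

```python
from itertools import product

models = ["gpt-5.4-pro", "gpt-5.4"]  # model-level fallback list
providers = ["Azure", "OpenAI"]      # provider options per model (assumed)

# Model-level fallback is the outer loop; provider-level is the inner loop.
attempt_order = list(product(models, providers))
for model, provider in attempt_order:
    print(f"{model} via {provider}")
```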
## Use Cases
High-availability production:

## Related
- Provider Routing: provider-level fallbacks with `allow_fallbacks`
- Model Routing: how ARouter selects models
- Uptime Optimization: best practices for reliability