ARouter routes each request to the optimal upstream provider based on model availability, provider health, and cost efficiency. This happens automatically — no configuration required for most use cases.
For advanced control, pass a provider object in the request body to customize how routing decisions are made.
## The provider Object
Include a provider object in any /v1/chat/completions request to override routing defaults:

```json
{
  "model": "openai/gpt-5.4",
  "messages": [{ "role": "user", "content": "Hello!" }],
  "provider": {
    "sort": "throughput",
    "allow_fallbacks": true
  }
}
```
## Full Field Reference
| Field | Type | Default | Description |
|---|---|---|---|
| order | string[] | — | List of provider slugs to try in order, e.g. ["openai", "azure"] |
| allow_fallbacks | boolean | true | Whether to allow backup providers when the primary is unavailable |
| require_parameters | boolean | false | Only use providers that support all parameters in your request |
| data_collection | "allow" \| "deny" | "allow" | Control whether to use providers that may store request data |
| zdr | boolean | — | Restrict routing to Zero Data Retention endpoints only |
| only | string[] | — | List of provider slugs to allow for this request |
| ignore | string[] | — | List of provider slugs to skip for this request |
| quantizations | string[] | — | Filter by quantization level, e.g. ["int4", "int8"] |
| sort | string \| object | — | Sort providers by "price", "throughput", or "latency" |
| preferred_min_throughput | number \| object | — | Preferred minimum throughput in tokens/sec |
| preferred_max_latency | number \| object | — | Preferred maximum latency in seconds |
| max_price | object | — | Maximum price you are willing to pay per token |
## Default Strategy: Cost-Based Load Balancing
By default, ARouter load balances requests across healthy providers, prioritizing cost. The algorithm:
- Exclude providers with significant outages in the last 30 seconds
- Among stable providers, weight selection by the inverse square of the price
- Use remaining providers as automatic fallbacks
Example: if Provider A costs $1/M tokens, Provider B costs $2/M, Provider C costs $3/M, and Provider B has recently been degraded:
- Provider A is 9× more likely to be chosen than Provider C (inverse square weighting: 1/1² = 1 vs 1/3² ≈ 0.11)
- If Provider A fails, Provider C is tried next
- Provider B, excluded from initial selection because of its recent degradation, is tried last
If you set sort or order, load balancing is disabled and providers are tried in strict order.
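The weighting above can be sketched in a few lines. This is an illustration of the inverse-square strategy as described, with made-up prices; it is not ARouter's actual implementation:

```python
import random

def choose_provider(prices: dict[str, float]) -> str:
    # Weight each provider by the inverse square of its price per token,
    # then sample one provider according to those weights.
    names = list(prices)
    weights = [1 / prices[n] ** 2 for n in names]
    return random.choices(names, weights=weights, k=1)[0]

# $1/M, $2/M, $3/M -> weights 1 : 0.25 : ~0.11
prices = {"A": 1.0, "B": 2.0, "C": 3.0}
weights = {n: 1 / p ** 2 for n, p in prices.items()}
print(round(weights["A"] / weights["C"], 1))  # 9.0 -> A is 9x as likely as C
```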
## Provider Sorting
Use the sort field to explicitly prioritize a provider attribute. Load balancing is disabled and providers are tried in order.
Available sort values:
- "price" — prioritize lowest cost per token
- "throughput" — prioritize highest tokens/sec
- "latency" — prioritize lowest time-to-first-token

```typescript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.arouter.ai/v1",
  apiKey: "lr_live_xxxx",
});

const response = await client.chat.completions.create({
  model: "openai/gpt-5.4",
  messages: [{ role: "user", content: "Hello" }],
  // @ts-ignore
  provider: { sort: "throughput" },
});
```
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.arouter.ai/v1",
    api_key="lr_live_xxxx",
)

response = client.chat.completions.create(
    model="openai/gpt-5.4",
    messages=[{"role": "user", "content": "Hello"}],
    extra_body={"provider": {"sort": "throughput"}},
)
```

```shell
curl https://api.arouter.ai/v1/chat/completions \
  -H "Authorization: Bearer lr_live_xxxx" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-5.4",
    "messages": [{"role": "user", "content": "Hello"}],
    "provider": {"sort": "throughput"}
  }'
```
## :nitro and :floor Shortcuts
Append a suffix to the model slug as a shorthand for sorting:
| Suffix | Equivalent to |
|---|---|
| :nitro | provider.sort = "throughput" |
| :floor | provider.sort = "price" |

```jsonc
{"model": "openai/gpt-5.4:nitro"} // sort by throughput
{"model": "openai/gpt-5.4:floor"} // sort by price
```
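The suffixes are a plain string transform on the model slug. The helper below is our own illustration of that equivalence; expand_suffix is a hypothetical name, not part of any SDK:

```python
# Hypothetical helper: expand a :nitro/:floor suffix into the
# equivalent provider.sort value. Illustrative only.
SUFFIX_SORT = {"nitro": "throughput", "floor": "price"}

def expand_suffix(model: str) -> tuple[str, dict]:
    base, _, suffix = model.partition(":")
    if suffix in SUFFIX_SORT:
        return base, {"sort": SUFFIX_SORT[suffix]}
    return model, {}

print(expand_suffix("openai/gpt-5.4:nitro"))
# ('openai/gpt-5.4', {'sort': 'throughput'})
```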
## Advanced Sorting with Partition
When using candidate model lists (models[]), the sort field can be an object with a partition option to control how endpoints are sorted across models.
| Field | Type | Default | Description |
|---|---|---|---|
| sort.by | string | — | "price", "throughput", or "latency" |
| sort.partition | string | "model" | "model" (try primary model first) or "none" (sort globally) |
By default (partition: "model"), endpoints are grouped by model — the first model’s endpoints are always tried before the second model’s. Setting partition: "none" removes this grouping, allowing global sorting across all candidate models.
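The difference is easiest to see as two sort keys over the same endpoint list. This sketch uses made-up endpoint data and throughput numbers purely to illustrate the ordering:

```python
# Illustrative sketch: how partition affects the order in which
# endpoints are tried. Providers and throughputs are made up.
endpoints = [
    {"model": "anthropic/claude-sonnet-4.6", "provider": "p1", "tps": 60},
    {"model": "anthropic/claude-sonnet-4.6", "provider": "p2", "tps": 40},
    {"model": "openai/gpt-5.4", "provider": "p3", "tps": 90},
]
models = ["anthropic/claude-sonnet-4.6", "openai/gpt-5.4"]

# partition="model": group by model position first, then sort by throughput
by_model = sorted(endpoints, key=lambda e: (models.index(e["model"]), -e["tps"]))

# partition="none": sort globally by throughput across all candidate models
global_sort = sorted(endpoints, key=lambda e: -e["tps"])

print([e["provider"] for e in by_model])     # ['p1', 'p2', 'p3']
print([e["provider"] for e in global_sort])  # ['p3', 'p1', 'p2']
```

Note that under partition: "model", the faster p3 endpoint is still tried last because it serves the second model in the list.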
### Use Case 1: Route to Highest Throughput Across Multiple Models
When you have multiple acceptable models and want whichever is fastest right now:

```typescript
const response = await client.chat.completions.create({
  // @ts-ignore
  models: [
    "anthropic/claude-sonnet-4.6",
    "openai/gpt-5.4",
    "google/gemini-2.5-flash",
  ],
  messages: [{ role: "user", content: "Hello" }],
  provider: {
    sort: { by: "throughput", partition: "none" },
  },
});
```
```python
response = client.chat.completions.create(
    model="anthropic/claude-sonnet-4.6",
    messages=[{"role": "user", "content": "Hello"}],
    extra_body={
        "models": [
            "anthropic/claude-sonnet-4.6",
            "openai/gpt-5.4",
            "google/gemini-2.5-flash",
        ],
        "provider": {
            "sort": {"by": "throughput", "partition": "none"},
        },
    },
)
```

```shell
curl https://api.arouter.ai/v1/chat/completions \
  -H "Authorization: Bearer lr_live_xxxx" \
  -H "Content-Type: application/json" \
  -d '{
    "models": [
      "anthropic/claude-sonnet-4.6",
      "openai/gpt-5.4",
      "google/gemini-2.5-flash"
    ],
    "messages": [{"role": "user", "content": "Hello"}],
    "provider": {
      "sort": {"by": "throughput", "partition": "none"}
    }
  }'
```
### Use Case 2: Lowest Cost That Still Meets Your SLA
Combine partition: "none" with performance thresholds to find the lowest-cost option that still meets your SLA:

```typescript
const response = await client.chat.completions.create({
  // @ts-ignore
  models: [
    "anthropic/claude-sonnet-4.6",
    "openai/gpt-5.4",
    "google/gemini-2.5-flash",
  ],
  messages: [{ role: "user", content: "Hello" }],
  provider: {
    sort: { by: "price", partition: "none" },
    preferred_min_throughput: { p90: 50 },
  },
});
```
```python
response = client.chat.completions.create(
    model="anthropic/claude-sonnet-4.6",
    messages=[{"role": "user", "content": "Hello"}],
    extra_body={
        "models": [
            "anthropic/claude-sonnet-4.6",
            "openai/gpt-5.4",
            "google/gemini-2.5-flash",
        ],
        "provider": {
            "sort": {"by": "price", "partition": "none"},
            "preferred_min_throughput": {"p90": 50},
        },
    },
)
```

```shell
curl https://api.arouter.ai/v1/chat/completions \
  -H "Authorization: Bearer lr_live_xxxx" \
  -H "Content-Type: application/json" \
  -d '{
    "models": [
      "anthropic/claude-sonnet-4.6",
      "openai/gpt-5.4",
      "google/gemini-2.5-flash"
    ],
    "messages": [{"role": "user", "content": "Hello"}],
    "provider": {
      "sort": {"by": "price", "partition": "none"},
      "preferred_min_throughput": {"p90": 50}
    }
  }'
```
## Performance Thresholds
Set minimum throughput or maximum latency preferences to filter providers. Providers that don’t meet thresholds are deprioritized (moved to the end), not excluded entirely.
| Field | Description |
|---|---|
| preferred_min_throughput | Minimum tokens per second. Can be a number (applies to p50) or an object with percentile keys |
| preferred_max_latency | Maximum time-to-first-token in seconds. Can be a number or an object with percentile keys |
### How Percentiles Work
ARouter tracks provider performance over a rolling 5-minute window:
| Percentile | Meaning |
|---|---|
| p50 | 50% of requests perform better than this value (median) |
| p75 | 75% of requests perform better than this value |
| p90 | 90% of requests perform better than this value |
| p99 | 99% of requests perform better than this value |
Higher percentiles (p90/p99) give confidence about worst-case performance. All specified percentile cutoffs must be met for a provider to be in the preferred group.

```json
{
  "provider": {
    "preferred_min_throughput": {
      "p50": 100,
      "p90": 50
    },
    "preferred_max_latency": {
      "p99": 3.0
    }
  }
}
```
preferred_min_throughput and preferred_max_latency are soft preferences — they never prevent a request from being served. This is different from max_price, which is a hard limit.
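The deprioritize-rather-than-exclude behavior can be sketched as a reordering pass. The provider stats below are made up for illustration; this is not ARouter's internal code:

```python
# Illustrative: providers that miss any percentile cutoff are moved to
# the end of the candidate list, never dropped (soft preference).
cutoffs = {"p50": 100, "p90": 50}  # preferred_min_throughput

providers = [
    {"name": "fast", "throughput": {"p50": 120, "p90": 80}},
    {"name": "slow", "throughput": {"p50": 90, "p90": 60}},  # fails the p50 cutoff
]

def meets_cutoffs(stats: dict, cutoffs: dict) -> bool:
    # All specified percentile cutoffs must be met.
    return all(stats[p] >= v for p, v in cutoffs.items())

preferred = [p for p in providers if meets_cutoffs(p["throughput"], cutoffs)]
rest = [p for p in providers if not meets_cutoffs(p["throughput"], cutoffs)]
order = preferred + rest  # "slow" is deprioritized, not excluded
print([p["name"] for p in order])  # ['fast', 'slow']
```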
## Ordering Specific Providers
Use order to specify which providers to try and in what sequence. Load balancing is disabled when order is set.

```typescript
const response = await client.chat.completions.create({
  model: "openai/gpt-5.4",
  messages: [{ role: "user", content: "Hello" }],
  // @ts-ignore
  provider: {
    order: ["openai", "azure"],
  },
});
```

```python
response = client.chat.completions.create(
    model="openai/gpt-5.4",
    messages=[{"role": "user", "content": "Hello"}],
    extra_body={
        "provider": {
            "order": ["openai", "azure"],
        }
    },
)
```

```shell
curl https://api.arouter.ai/v1/chat/completions \
  -H "Authorization: Bearer lr_live_xxxx" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-5.4",
    "messages": [{"role": "user", "content": "Hello"}],
    "provider": {"order": ["openai", "azure"]}
  }'
```
## Allowing Only Specific Providers
Use only to restrict routing to a specific set of providers:

```json
{
  "model": "meta-llama/llama-4-maverick",
  "messages": [{ "role": "user", "content": "Hello" }],
  "provider": {
    "only": ["groq", "together"]
  }
}
```
## Ignoring Providers
Use ignore to skip specific providers for this request:

```json
{
  "model": "openai/gpt-5.4",
  "messages": [{ "role": "user", "content": "Hello" }],
  "provider": {
    "ignore": ["azure"]
  }
}
```
## Disabling Fallbacks
By default, ARouter falls back to alternative providers if the primary is unavailable. Set allow_fallbacks: false to require the exact provider:

```json
{
  "model": "openai/gpt-5.4",
  "messages": [{ "role": "user", "content": "Hello" }],
  "provider": {
    "order": ["openai"],
    "allow_fallbacks": false
  }
}
```
If the specified provider is unavailable, ARouter returns a 503 error rather than routing elsewhere.
## Requiring Parameter Support
Set require_parameters: true to only route to providers that support all parameters in your request. By default, ARouter may route to providers that ignore unsupported parameters.

```json
{
  "model": "openai/gpt-5.4",
  "messages": [{ "role": "user", "content": "Hello" }],
  "tools": [...],
  "provider": {
    "require_parameters": true
  }
}
```
## Quantization Filtering
Filter providers by the model quantization level they serve. Useful when you need specific precision/performance tradeoffs:

```json
{
  "model": "meta-llama/llama-4-maverick",
  "messages": [{ "role": "user", "content": "Hello" }],
  "provider": {
    "quantizations": ["fp16", "bf16"]
  }
}
```
Common quantization values: "fp32", "fp16", "bf16", "int8", "int4".
## Data Collection Policy
Control whether ARouter routes to providers that may store your request data:
| Value | Behavior |
|---|---|
| "allow" (default) | Route to any provider, including those that may store data |
| "deny" | Only route to providers that do not store request/response data |

```json
{
  "model": "anthropic/claude-sonnet-4.6",
  "messages": [{ "role": "user", "content": "Sensitive content" }],
  "provider": {
    "data_collection": "deny"
  }
}
```
## Zero Data Retention (ZDR)
For maximum privacy, restrict routing to providers with Zero Data Retention guarantees:

```json
{
  "model": "openai/gpt-5.4",
  "messages": [{ "role": "user", "content": "Hello" }],
  "provider": {
    "zdr": true
  }
}
```
ZDR providers do not log, store, or use request data for training. See Data Collection for more details.
## Maximum Price
Set a hard limit on how much you are willing to pay per token. Requests will fail rather than be routed to providers above this price:

```json
{
  "model": "openai/gpt-5.4",
  "messages": [{ "role": "user", "content": "Hello" }],
  "provider": {
    "max_price": {
      "prompt": "0.000010",
      "completion": "0.000030"
    }
  }
}
```
Unlike performance thresholds, max_price is a hard limit. If no provider meets the price requirement, the request returns an error.
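The hard-limit semantics can be sketched as a simple filter. Prices are per-token decimal strings as in the request above, compared with Decimal to avoid float rounding; the provider names and prices here are made up:

```python
from decimal import Decimal

# Illustrative: a hard max_price filter, not ARouter's internal code.
max_price = {"prompt": Decimal("0.000010"), "completion": Decimal("0.000030")}

providers = [
    {"name": "cheap", "prompt": Decimal("0.000008"), "completion": Decimal("0.000024")},
    {"name": "pricey", "prompt": Decimal("0.000015"), "completion": Decimal("0.000030")},
]

eligible = [
    p["name"] for p in providers
    if p["prompt"] <= max_price["prompt"] and p["completion"] <= max_price["completion"]
]
if not eligible:
    raise RuntimeError("no provider meets max_price")  # hard limit: the request fails
print(eligible)  # ['cheap']
```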
## Provider Health and Availability
ARouter continuously tracks provider health using a circuit-breaker mechanism:
| Status | Behavior |
|---|---|
| Healthy | Provider is accepting requests normally |
| Degraded | Recent failures detected; requests may be retried with a different key |
| Unavailable | All keys circuit-broken; ARouter returns 503 |
This is fully transparent — your application does not need to implement provider-level retry logic.
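For intuition, the three statuses in the table map onto a standard per-key circuit breaker. The sketch below is conceptual; the class, threshold, and field names are ours, not ARouter internals:

```python
# Conceptual circuit-breaker sketch matching the status table above.
class KeyCircuit:
    def __init__(self, failure_threshold: int = 3):
        self.failures = 0
        self.failure_threshold = failure_threshold

    @property
    def open(self) -> bool:
        # An open circuit means this key is temporarily unusable.
        return self.failures >= self.failure_threshold

    def record(self, ok: bool) -> None:
        self.failures = 0 if ok else self.failures + 1

def provider_status(keys: list[KeyCircuit]) -> str:
    if all(k.open for k in keys):
        return "unavailable"  # all keys circuit-broken -> 503
    if any(k.failures > 0 for k in keys):
        return "degraded"     # recent failures; retry with a different key
    return "healthy"

keys = [KeyCircuit(), KeyCircuit()]
keys[0].record(ok=False)
print(provider_status(keys))  # degraded
```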
## Specifying a Provider via Model Prefix
The primary way to control which provider handles your request is via the provider/model format:

```json
{
  "model": "openai/gpt-5.4",
  "messages": [{ "role": "user", "content": "Hello!" }]
}
```
See Model Routing for the full list of supported formats.
## Native Provider Proxy
For complete control, use the provider proxy endpoint /{provider}/{path} to bypass ARouter’s model-routing layer entirely:

```shell
# Direct to OpenAI
curl https://api.arouter.ai/openai/v1/chat/completions \
  -H "Authorization: Bearer lr_live_xxxx" \
  -d '{"model": "gpt-5.4", "messages": [...]}'

# Direct to Anthropic
curl https://api.arouter.ai/anthropic/v1/messages \
  -H "Authorization: Bearer lr_live_xxxx" \
  -d '{"model": "claude-sonnet-4.6", "messages": [...]}'
```
See Provider Proxy for the full reference.
## Supported Providers
| Provider | Prefix | Example Model |
|---|---|---|
| OpenAI | openai | openai/gpt-5.4 |
| Anthropic | anthropic | anthropic/claude-sonnet-4.6 |
| Google | google | google/gemini-2.5-flash |
| DeepSeek | deepseek | deepseek/deepseek-v3.2 |
| xAI | x-ai | x-ai/grok-4.20 |
| Mistral | mistralai | mistralai/mistral-large-2512 |
| Meta | meta-llama | meta-llama/llama-4-maverick |
| Qwen | qwen | qwen/qwen3-235b |
| MiniMax | minimax | minimax/minimax-m2.7 |
| Groq | groq | groq/llama-3.3-70b-versatile |
| Kimi | moonshotai | moonshotai/kimi-k2.5 |
| Dashscope | dashscope | dashscope/qwen-max |
See Providers for the full list with capabilities.