ARouter routes each request to the optimal upstream provider based on model availability, provider health, and cost efficiency. This happens automatically — no configuration required for most use cases. For advanced control, pass a provider object in the request body to customize how routing decisions are made.

The provider Object

Include a provider object in any /v1/chat/completions request to override routing defaults:
{
  "model": "openai/gpt-5.4",
  "messages": [{ "role": "user", "content": "Hello!" }],
  "provider": {
    "sort": "throughput",
    "allow_fallbacks": true
  }
}

Full Field Reference

| Field | Type | Default | Description |
|---|---|---|---|
| order | string[] | — | List of provider slugs to try in order, e.g. ["openai", "azure"] |
| allow_fallbacks | boolean | true | Whether to allow backup providers when the primary is unavailable |
| require_parameters | boolean | false | Only use providers that support all parameters in your request |
| data_collection | "allow" or "deny" | "allow" | Control whether to use providers that may store request data |
| zdr | boolean | — | Restrict routing to Zero Data Retention endpoints only |
| only | string[] | — | List of provider slugs to allow for this request |
| ignore | string[] | — | List of provider slugs to skip for this request |
| quantizations | string[] | — | Filter by quantization level, e.g. ["int4", "int8"] |
| sort | string or object | — | Sort providers by "price", "throughput", or "latency" |
| preferred_min_throughput | number or object | — | Preferred minimum throughput in tokens/sec |
| preferred_max_latency | number or object | — | Preferred maximum latency in seconds |
| max_price | object | — | Maximum price you are willing to pay per token |

Default Strategy: Cost-Based Load Balancing

By default, ARouter load balances requests across healthy providers, prioritizing cost. The algorithm:
  1. Exclude providers with significant outages in the last 30 seconds
  2. Among stable providers, weight selection by the inverse square of the price
  3. Use remaining providers as automatic fallbacks
Example: Suppose Provider A costs $1/M tokens, Provider B costs $2/M but was recently degraded, and Provider C costs $3/M:
  • Provider A is 9× more likely to be chosen than Provider C (inverse square weighting)
  • If Provider A fails, Provider C is tried next
  • Provider B (recently degraded) is tried last
If you set sort or order, load balancing is disabled and providers are tried in strict order.
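The inverse-square weighting described above can be sketched as follows. This is a simplified illustration of the stated rule, not ARouter's actual implementation; the selectionWeights function is a hypothetical name.

```typescript
// Simplified sketch of inverse-square price weighting (illustrative only).
// Given per-token prices, each provider's selection weight is 1 / price^2,
// normalized so the weights sum to 1.
function selectionWeights(prices: number[]): number[] {
  const raw = prices.map((p) => 1 / (p * p));
  const total = raw.reduce((a, b) => a + b, 0);
  return raw.map((w) => w / total);
}

// Example: prices of $1/M, $2/M, and $3/M tokens.
// Provider A's weight is (1/1) / (1/9) = 9x Provider C's.
const weights = selectionWeights([1, 2, 3]);
```

With these prices the normalized weights are roughly 0.73, 0.18, and 0.08, so the cheapest provider takes most of the traffic without monopolizing it.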

Provider Sorting

Use the sort field to explicitly prioritize a provider attribute. When sort is set, load balancing is disabled and providers are tried in sorted order. Available sort values:
  • "price" — prioritize lowest cost per token
  • "throughput" — prioritize highest tokens/sec
  • "latency" — prioritize lowest time-to-first-token
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.arouter.ai/v1",
  apiKey: "lr_live_xxxx",
});

const response = await client.chat.completions.create({
  model: "openai/gpt-5.4",
  messages: [{ role: "user", content: "Hello" }],
  // @ts-ignore
  provider: { sort: "throughput" },
});

:nitro and :floor Shortcuts

Append a suffix to the model slug as a shorthand for sorting:
| Suffix | Equivalent to |
|---|---|
| :nitro | provider.sort = "throughput" |
| :floor | provider.sort = "price" |
{"model": "openai/gpt-5.4:nitro"}  // sort by throughput
{"model": "openai/gpt-5.4:floor"}  // sort by price

Advanced Sorting with Partition

When using candidate model lists (models[]), the sort field can be an object with a partition option to control how endpoints are sorted across models.
| Field | Type | Default | Description |
|---|---|---|---|
| sort.by | string | — | "price", "throughput", or "latency" |
| sort.partition | string | "model" | "model" (try primary model first) or "none" (sort globally) |
By default (partition: "model"), endpoints are grouped by model — the first model’s endpoints are always tried before the second model’s. Setting partition: "none" removes this grouping, allowing global sorting across all candidate models.
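The difference between the two partition modes can be pictured with a small sketch. The Endpoint shape and sortEndpoints helper are illustrative names, not part of the API.

```typescript
// Illustrative sketch of partitioned vs. global endpoint sorting
// (not ARouter's code).
interface Endpoint {
  model: string;      // which candidate model this endpoint serves
  throughput: number; // tokens/sec, higher is better
}

// `models` is the ordered candidate list from the request body.
function sortEndpoints(
  endpoints: Endpoint[],
  models: string[],
  partition: "model" | "none",
): Endpoint[] {
  const byThroughput = (a: Endpoint, b: Endpoint) => b.throughput - a.throughput;
  if (partition === "none") {
    // Global sort: the fastest endpoint wins regardless of model order.
    return [...endpoints].sort(byThroughput);
  }
  // partition: "model" — preserve model order, sort within each model's group.
  return models.flatMap((m) =>
    endpoints.filter((e) => e.model === m).sort(byThroughput),
  );
}
```

Under partition: "model", a slow endpoint of the first model still beats a fast endpoint of the second; under partition: "none", only the sort metric matters.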

Use Case 1: Route to Highest Throughput Across Multiple Models

When you have multiple acceptable models and want whichever is fastest right now:
const response = await client.chat.completions.create({
  // @ts-ignore
  models: [
    "anthropic/claude-sonnet-4.6",
    "openai/gpt-5.4",
    "google/gemini-2.5-flash",
  ],
  messages: [{ role: "user", content: "Hello" }],
  provider: {
    sort: { by: "throughput", partition: "none" },
  },
});

Use Case 2: Cheapest Model That Meets Performance Requirements

Combine partition: "none" with performance thresholds to find the lowest-cost option that still meets your SLA:
const response = await client.chat.completions.create({
  // @ts-ignore
  models: [
    "anthropic/claude-sonnet-4.6",
    "openai/gpt-5.4",
    "google/gemini-2.5-flash",
  ],
  messages: [{ role: "user", content: "Hello" }],
  provider: {
    sort: { by: "price", partition: "none" },
    preferred_min_throughput: { p90: 50 },
  },
});

Performance Thresholds

Set minimum throughput or maximum latency preferences to filter providers. Providers that don’t meet thresholds are deprioritized (moved to the end), not excluded entirely.
| Field | Description |
|---|---|
| preferred_min_throughput | Minimum tokens per second. Can be a number (applies to p50) or an object with percentile keys |
| preferred_max_latency | Maximum time-to-first-token in seconds. Can be a number or an object with percentile keys |

How Percentiles Work

ARouter tracks provider performance over a rolling 5-minute window:
| Percentile | Meaning |
|---|---|
| p50 | 50% of requests perform better than this value (median) |
| p75 | 75% of requests perform better |
| p90 | 90% of requests perform better |
| p99 | 99% of requests perform better |
Higher percentiles (p90/p99) give confidence about worst-case performance. All specified percentile cutoffs must be met for a provider to be in the preferred group.
{
  "provider": {
    "preferred_min_throughput": {
      "p50": 100,
      "p90": 50
    },
    "preferred_max_latency": {
      "p99": 3.0
    }
  }
}
preferred_min_throughput and preferred_max_latency are soft preferences — they never prevent a request from being served. This is different from max_price, which is a hard limit.
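The "all specified cutoffs must be met" rule could be sketched like this. The function and stats shape are hypothetical; it only assumes a provider's tracked stats expose a measured value per percentile.

```typescript
// Sketch of the preferred-group throughput check (illustrative only).
type PercentileStats = { [percentile: string]: number }; // e.g. { p50: 120, p90: 60 }

function meetsThroughputThresholds(
  observed: PercentileStats,     // measured tokens/sec per percentile
  preferredMin: PercentileStats, // preferred_min_throughput from the request
): boolean {
  // Every percentile cutoff the request specifies must be met;
  // a single miss moves the provider out of the preferred group.
  return Object.entries(preferredMin).every(
    ([p, min]) => (observed[p] ?? 0) >= min,
  );
}
```

For example, a provider observing { p50: 120, p90: 60 } meets { p50: 100, p90: 50 }, while { p50: 120, p90: 40 } does not, because the p90 cutoff fails even though p50 passes.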

Ordering Specific Providers

Use order to specify which providers to try and in what sequence. Load balancing is disabled when order is set.
const response = await client.chat.completions.create({
  model: "openai/gpt-5.4",
  messages: [{ role: "user", content: "Hello" }],
  // @ts-ignore
  provider: {
    order: ["openai", "azure"],
  },
});

Allowing Only Specific Providers

Use only to restrict routing to a specific set of providers:
{
  "model": "meta-llama/llama-4-maverick",
  "messages": [{ "role": "user", "content": "Hello" }],
  "provider": {
    "only": ["groq", "together"]
  }
}

Ignoring Providers

Use ignore to skip specific providers for this request:
{
  "model": "openai/gpt-5.4",
  "messages": [{ "role": "user", "content": "Hello" }],
  "provider": {
    "ignore": ["azure"]
  }
}

Disabling Fallbacks

By default, ARouter falls back to alternative providers if the primary is unavailable. Set allow_fallbacks: false to require the exact provider:
{
  "model": "openai/gpt-5.4",
  "messages": [{ "role": "user", "content": "Hello" }],
  "provider": {
    "order": ["openai"],
    "allow_fallbacks": false
  }
}
If the specified provider is unavailable, ARouter returns a 503 error rather than routing elsewhere.

Requiring Parameter Support

Set require_parameters: true to only route to providers that support all parameters in your request. By default, ARouter may route to providers that ignore unsupported parameters.
{
  "model": "openai/gpt-5.4",
  "messages": [{ "role": "user", "content": "Hello" }],
  "tools": [...],
  "provider": {
    "require_parameters": true
  }
}

Quantization Filtering

Filter providers by the model quantization level they serve. Useful when you need specific precision/performance tradeoffs:
{
  "model": "meta-llama/llama-4-maverick",
  "messages": [{ "role": "user", "content": "Hello" }],
  "provider": {
    "quantizations": ["fp16", "bf16"]
  }
}
Common quantization values: "fp32", "fp16", "bf16", "int8", "int4".

Data Collection Policy

Control whether ARouter routes to providers that may store your request data:
| Value | Behavior |
|---|---|
| "allow" (default) | Route to any provider, including those that may store data |
| "deny" | Only route to providers that do not store request/response data |
{
  "model": "anthropic/claude-sonnet-4.6",
  "messages": [{ "role": "user", "content": "Sensitive content" }],
  "provider": {
    "data_collection": "deny"
  }
}

Zero Data Retention (ZDR)

For maximum privacy, restrict routing to providers with Zero Data Retention guarantees:
{
  "model": "openai/gpt-5.4",
  "messages": [{ "role": "user", "content": "Hello" }],
  "provider": {
    "zdr": true
  }
}
ZDR providers do not log, store, or use request data for training. See Data Collection for more details.

Maximum Price

Set a hard limit on how much you are willing to pay per token. Requests will fail rather than be routed to providers above this price:
{
  "model": "openai/gpt-5.4",
  "messages": [{ "role": "user", "content": "Hello" }],
  "provider": {
    "max_price": {
      "prompt": "0.000010",
      "completion": "0.000030"
    }
  }
}
Unlike performance thresholds, max_price is a hard limit. If no provider meets the price requirement, the request returns an error.

Provider Health and Availability

ARouter continuously tracks provider health using a circuit-breaker mechanism:
| Status | Behavior |
|---|---|
| Healthy | Provider is accepting requests normally |
| Degraded | Recent failures detected; requests may be retried with a different key |
| Unavailable | All keys circuit-broken; ARouter returns 503 |
This is fully transparent — your application does not need to implement provider-level retry logic.
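A minimal sketch of how the three states could be derived from per-key failure counts. This is purely illustrative; ARouter's actual thresholds and key-level bookkeeping are not documented here.

```typescript
// Toy circuit breaker over a provider's API keys (illustrative only).
type Status = "healthy" | "degraded" | "unavailable";

function providerStatus(keyFailureCounts: number[], tripThreshold = 3): Status {
  // A key is considered tripped once its recent failure count
  // reaches the (hypothetical) threshold.
  const tripped = keyFailureCounts.filter((n) => n >= tripThreshold).length;
  if (tripped === keyFailureCounts.length) return "unavailable"; // all keys tripped -> 503
  if (keyFailureCounts.some((n) => n > 0)) return "degraded";    // retry with another key
  return "healthy";
}
```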

Specifying a Provider via Model Prefix

The primary way to control which provider handles your request is via the provider/model format:
{
  "model": "openai/gpt-5.4",
  "messages": [{ "role": "user", "content": "Hello!" }]
}
See Model Routing for the full list of supported formats.

Native Provider Proxy

For complete control, use the provider proxy endpoint /{provider}/{path} to bypass ARouter’s model-routing layer entirely:
# Direct to OpenAI
curl https://api.arouter.ai/openai/v1/chat/completions \
  -H "Authorization: Bearer lr_live_xxxx" \
  -d '{"model": "gpt-5.4", "messages": [...]}'

# Direct to Anthropic
curl https://api.arouter.ai/anthropic/v1/messages \
  -H "Authorization: Bearer lr_live_xxxx" \
  -d '{"model": "claude-sonnet-4.6", "messages": [...]}'
See Provider Proxy for the full reference.

Supported Providers

| Provider | Prefix | Example Model |
|---|---|---|
| OpenAI | openai | openai/gpt-5.4 |
| Anthropic | anthropic | anthropic/claude-sonnet-4.6 |
| Google | google | google/gemini-2.5-flash |
| DeepSeek | deepseek | deepseek/deepseek-v3.2 |
| xAI | x-ai | x-ai/grok-4.20 |
| Mistral | mistralai | mistralai/mistral-large-2512 |
| Meta | meta-llama | meta-llama/llama-4-maverick |
| Qwen | qwen | qwen/qwen3-235b |
| MiniMax | minimax | minimax/minimax-m2.7 |
| Groq | groq | groq/llama-3.3-70b-versatile |
| Kimi | moonshotai | moonshotai/kimi-k2.5 |
| Dashscope | dashscope | dashscope/qwen-max |
See Providers for the full list with capabilities.