ARouter routes each request to the optimal upstream provider based on model availability, provider health, and cost efficiency. This happens automatically — no configuration required for most use cases.
For advanced control, pass a provider object in the request body to customize how routing decisions are made.
## The provider Object
Include a provider object in any /v1/chat/completions request to override routing defaults:

```json
{
  "model": "openai/gpt-5.4",
  "messages": [{ "role": "user", "content": "Hello!" }],
  "provider": {
    "sort": "throughput",
    "allow_fallbacks": true
  }
}
```
## Full Field Reference
| Field | Type | Default | Description |
|---|---|---|---|
| order | string[] | — | List of provider slugs to try in order, e.g. ["openai", "azure"] |
| allow_fallbacks | boolean | true | Whether to allow backup providers when the primary is unavailable |
| require_parameters | boolean | false | Only use providers that support all parameters in your request |
| data_collection | "allow" \| "deny" | "allow" | Control whether to use providers that may store request data |
| zdr | boolean | — | Restrict routing to Zero Data Retention endpoints only |
| only | string[] | — | List of provider slugs to allow for this request |
| ignore | string[] | — | List of provider slugs to skip for this request |
| quantizations | string[] | — | Filter by quantization level, e.g. ["int4", "int8"] |
| sort | string \| object | — | Sort providers by "price", "throughput", or "latency" |
| preferred_min_throughput | number \| object | — | Preferred minimum throughput in tokens/sec |
| preferred_max_latency | number \| object | — | Preferred maximum latency in seconds |
| max_price | object | — | Maximum price you are willing to pay per token |
## Default Strategy: Cost-Based Load Balancing
By default, ARouter load balances requests across healthy providers, prioritizing cost. The algorithm:
- Exclude providers with significant outages in the last 30 seconds
- Among stable providers, weight selection by the inverse square of the price
- Use remaining providers as automatic fallbacks
Example: if Provider A costs $1/M tokens, Provider B costs $2/M, Provider C costs $3/M, and Provider B has recently been degraded:
- Provider A is 9× more likely to be chosen than Provider C (inverse square weighting: 1/1² = 1 vs 1/3² ≈ 0.11)
- If Provider A fails, Provider C is tried next
- Provider B, excluded from initial selection because of its recent degradation, is tried last
If you set sort or order, load balancing is disabled and providers are tried in strict order.
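The weighting above can be sketched in a few lines. This is an illustration of the inverse-square strategy as described, with made-up prices; it is not ARouter's actual implementation:

```python
import random

def choose_provider(prices: dict[str, float]) -> str:
    # Weight each provider by the inverse square of its price per token,
    # then sample one provider according to those weights.
    names = list(prices)
    weights = [1 / prices[n] ** 2 for n in names]
    return random.choices(names, weights=weights, k=1)[0]

# $1/M, $2/M, $3/M -> weights 1 : 0.25 : ~0.11
prices = {"A": 1.0, "B": 2.0, "C": 3.0}
weights = {n: 1 / p ** 2 for n, p in prices.items()}
print(round(weights["A"] / weights["C"], 1))  # 9.0 -> A is 9x as likely as C
```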
## Provider Sorting
Use the sort field to explicitly prioritize a provider attribute. Load balancing is disabled and providers are tried in order.
Available sort values:
- "price" — prioritize lowest cost per token
- "throughput" — prioritize highest tokens/sec
- "latency" — prioritize lowest time-to-first-token

```typescript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.arouter.ai/v1",
  apiKey: "lr_live_xxxx",
});

const response = await client.chat.completions.create({
  model: "openai/gpt-5.4",
  messages: [{ role: "user", content: "Hello" }],
  // @ts-ignore
  provider: { sort: "throughput" },
});
```
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.arouter.ai/v1",
    api_key="lr_live_xxxx",
)

response = client.chat.completions.create(
    model="openai/gpt-5.4",
    messages=[{"role": "user", "content": "Hello"}],
    extra_body={"provider": {"sort": "throughput"}},
)
```

```shell
curl https://api.arouter.ai/v1/chat/completions \
  -H "Authorization: Bearer lr_live_xxxx" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-5.4",
    "messages": [{"role": "user", "content": "Hello"}],
    "provider": {"sort": "throughput"}
  }'
```
## :nitro and :floor Shortcuts
Append a suffix to the model slug as a shorthand for sorting:
| Suffix | Equivalent to |
|---|---|
| :nitro | provider.sort = "throughput" |
| :floor | provider.sort = "price" |

```jsonc
{"model": "openai/gpt-5.4:nitro"} // sort by throughput
{"model": "openai/gpt-5.4:floor"} // sort by price
```
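The suffixes are a plain string transform on the model slug. The helper below is our own illustration of that equivalence; expand_suffix is a hypothetical name, not part of any SDK:

```python
# Hypothetical helper: expand a :nitro/:floor suffix into the
# equivalent provider.sort value. Illustrative only.
SUFFIX_SORT = {"nitro": "throughput", "floor": "price"}

def expand_suffix(model: str) -> tuple[str, dict]:
    base, _, suffix = model.partition(":")
    if suffix in SUFFIX_SORT:
        return base, {"sort": SUFFIX_SORT[suffix]}
    return model, {}

print(expand_suffix("openai/gpt-5.4:nitro"))
# ('openai/gpt-5.4', {'sort': 'throughput'})
```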
## Advanced Sorting with Partition
When using candidate model lists (models[]), the sort field can be an object with a partition option to control how endpoints are sorted across models.
| Field | Type | Default | Description |
|---|---|---|---|
| sort.by | string | — | "price", "throughput", or "latency" |
| sort.partition | string | "model" | "model" (try primary model first) or "none" (sort globally) |
By default (partition: "model"), endpoints are grouped by model — the first model’s endpoints are always tried before the second model’s. Setting partition: "none" removes this grouping, allowing global sorting across all candidate models.
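The difference is easiest to see as two sort keys over the same endpoint list. This sketch uses made-up endpoint data and throughput numbers purely to illustrate the ordering:

```python
# Illustrative sketch: how partition affects the order in which
# endpoints are tried. Providers and throughputs are made up.
endpoints = [
    {"model": "anthropic/claude-sonnet-4.6", "provider": "p1", "tps": 60},
    {"model": "anthropic/claude-sonnet-4.6", "provider": "p2", "tps": 40},
    {"model": "openai/gpt-5.4", "provider": "p3", "tps": 90},
]
models = ["anthropic/claude-sonnet-4.6", "openai/gpt-5.4"]

# partition="model": group by model position first, then sort by throughput
by_model = sorted(endpoints, key=lambda e: (models.index(e["model"]), -e["tps"]))

# partition="none": sort globally by throughput across all candidate models
global_sort = sorted(endpoints, key=lambda e: -e["tps"])

print([e["provider"] for e in by_model])     # ['p1', 'p2', 'p3']
print([e["provider"] for e in global_sort])  # ['p3', 'p1', 'p2']
```

Note that under partition: "model", the faster p3 endpoint is still tried last because it serves the second model in the list.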
### Use Case 1: Route to Highest Throughput Across Multiple Models
When you have multiple acceptable models and want whichever is fastest right now:

```typescript
const response = await client.chat.completions.create({
  // @ts-ignore
  models: [
    "anthropic/claude-sonnet-4.6",
    "openai/gpt-5.4",
    "google/gemini-2.5-flash",
  ],
  messages: [{ role: "user", content: "Hello" }],
  provider: {
    sort: { by: "throughput", partition: "none" },
  },
});
```
```python
response = client.chat.completions.create(
    model="anthropic/claude-sonnet-4.6",
    messages=[{"role": "user", "content": "Hello"}],
    extra_body={
        "models": [
            "anthropic/claude-sonnet-4.6",
            "openai/gpt-5.4",
            "google/gemini-2.5-flash",
        ],
        "provider": {
            "sort": {"by": "throughput", "partition": "none"},
        },
    },
)
```

```shell
curl https://api.arouter.ai/v1/chat/completions \
  -H "Authorization: Bearer lr_live_xxxx" \
  -H "Content-Type: application/json" \
  -d '{
    "models": [
      "anthropic/claude-sonnet-4.6",
      "openai/gpt-5.4",
      "google/gemini-2.5-flash"
    ],
    "messages": [{"role": "user", "content": "Hello"}],
    "provider": {
      "sort": {"by": "throughput", "partition": "none"}
    }
  }'
```
### Use Case 2: Lowest Cost That Still Meets Your SLA
Combine partition: "none" with performance thresholds to find the lowest-cost option that still meets your SLA:

```typescript
const response = await client.chat.completions.create({
  // @ts-ignore
  models: [
    "anthropic/claude-sonnet-4.6",
    "openai/gpt-5.4",
    "google/gemini-2.5-flash",
  ],
  messages: [{ role: "user", content: "Hello" }],
  provider: {
    sort: { by: "price", partition: "none" },
    preferred_min_throughput: { p90: 50 },
  },
});
```
```python
response = client.chat.completions.create(
    model="anthropic/claude-sonnet-4.6",
    messages=[{"role": "user", "content": "Hello"}],
    extra_body={
        "models": [
            "anthropic/claude-sonnet-4.6",
            "openai/gpt-5.4",
            "google/gemini-2.5-flash",
        ],
        "provider": {
            "sort": {"by": "price", "partition": "none"},
            "preferred_min_throughput": {"p90": 50},
        },
    },
)
```

```shell
curl https://api.arouter.ai/v1/chat/completions \
  -H "Authorization: Bearer lr_live_xxxx" \
  -H "Content-Type: application/json" \
  -d '{
    "models": [
      "anthropic/claude-sonnet-4.6",
      "openai/gpt-5.4",
      "google/gemini-2.5-flash"
    ],
    "messages": [{"role": "user", "content": "Hello"}],
    "provider": {
      "sort": {"by": "price", "partition": "none"},
      "preferred_min_throughput": {"p90": 50}
    }
  }'
```
## Performance Thresholds
Set minimum throughput or maximum latency preferences to filter providers. Providers that don’t meet thresholds are deprioritized (moved to the end), not excluded entirely.
| Field | Description |
|---|---|
| preferred_min_throughput | Minimum tokens per second. Can be a number (applies to p50) or an object with percentile keys |
| preferred_max_latency | Maximum time-to-first-token in seconds. Can be a number or an object with percentile keys |
### How Percentiles Work
ARouter tracks provider performance over a rolling 5-minute window:
| Percentile | Meaning |
|---|---|
| p50 | 50% of requests perform better than this value (median) |
| p75 | 75% of requests perform better than this value |
| p90 | 90% of requests perform better than this value |
| p99 | 99% of requests perform better than this value |
Higher percentiles (p90/p99) give confidence about worst-case performance. All specified percentile cutoffs must be met for a provider to be in the preferred group.

```json
{
  "provider": {
    "preferred_min_throughput": {
      "p50": 100,
      "p90": 50
    },
    "preferred_max_latency": {
      "p99": 3.0
    }
  }
}
```
preferred_min_throughput and preferred_max_latency are soft preferences — they never prevent a request from being served. This is different from max_price, which is a hard limit.
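The deprioritize-rather-than-exclude behavior can be sketched as a reordering pass. The provider stats below are made up for illustration; this is not ARouter's internal code:

```python
# Illustrative: providers that miss any percentile cutoff are moved to
# the end of the candidate list, never dropped (soft preference).
cutoffs = {"p50": 100, "p90": 50}  # preferred_min_throughput

providers = [
    {"name": "fast", "throughput": {"p50": 120, "p90": 80}},
    {"name": "slow", "throughput": {"p50": 90, "p90": 60}},  # fails the p50 cutoff
]

def meets_cutoffs(stats: dict, cutoffs: dict) -> bool:
    # All specified percentile cutoffs must be met.
    return all(stats[p] >= v for p, v in cutoffs.items())

preferred = [p for p in providers if meets_cutoffs(p["throughput"], cutoffs)]
rest = [p for p in providers if not meets_cutoffs(p["throughput"], cutoffs)]
order = preferred + rest  # "slow" is deprioritized, not excluded
print([p["name"] for p in order])  # ['fast', 'slow']
```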
## Ordering Specific Providers
Use order to specify which providers to try and in what sequence. Load balancing is disabled when order is set.

```typescript
const response = await client.chat.completions.create({
  model: "openai/gpt-5.4",
  messages: [{ role: "user", content: "Hello" }],
  // @ts-ignore
  provider: {
    order: ["openai", "azure"],
  },
});
```

```python
response = client.chat.completions.create(
    model="openai/gpt-5.4",
    messages=[{"role": "user", "content": "Hello"}],
    extra_body={
        "provider": {
            "order": ["openai", "azure"],
        }
    },
)
```

```shell
curl https://api.arouter.ai/v1/chat/completions \
  -H "Authorization: Bearer lr_live_xxxx" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-5.4",
    "messages": [{"role": "user", "content": "Hello"}],
    "provider": {"order": ["openai", "azure"]}
  }'
```
## Allowing Only Specific Providers
Use only to restrict routing to a specific set of providers:

```json
{
  "model": "meta-llama/llama-4-maverick",
  "messages": [{ "role": "user", "content": "Hello" }],
  "provider": {
    "only": ["groq", "together"]
  }
}
```
## Ignoring Providers
Use ignore to skip specific providers for this request:

```json
{
  "model": "openai/gpt-5.4",
  "messages": [{ "role": "user", "content": "Hello" }],
  "provider": {
    "ignore": ["azure"]
  }
}
```
## Disabling Fallbacks
By default, ARouter falls back to alternative providers if the primary is unavailable. Set allow_fallbacks: false to require the exact provider:

```json
{
  "model": "openai/gpt-5.4",
  "messages": [{ "role": "user", "content": "Hello" }],
  "provider": {
    "order": ["openai"],
    "allow_fallbacks": false
  }
}
```
If the specified provider is unavailable, ARouter returns a 503 error rather than routing elsewhere.
## Requiring Parameter Support
Set require_parameters: true to only route to providers that support all parameters in your request. By default, ARouter may route to providers that ignore unsupported parameters.

```json
{
  "model": "openai/gpt-5.4",
  "messages": [{ "role": "user", "content": "Hello" }],
  "tools": [...],
  "provider": {
    "require_parameters": true
  }
}
```
## Quantization Filtering
Filter providers by the model quantization level they serve. Useful when you need specific precision/performance tradeoffs:

```json
{
  "model": "meta-llama/llama-4-maverick",
  "messages": [{ "role": "user", "content": "Hello" }],
  "provider": {
    "quantizations": ["fp16", "bf16"]
  }
}
```
Common quantization values: "fp32", "fp16", "bf16", "int8", "int4".
## Data Collection Policy
Control whether ARouter routes to providers that may store your request data:
| Value | Behavior |
|---|---|
| "allow" (default) | Route to any provider, including those that may store data |
| "deny" | Only route to providers that do not store request/response data |

```json
{
  "model": "anthropic/claude-sonnet-4.6",
  "messages": [{ "role": "user", "content": "Sensitive content" }],
  "provider": {
    "data_collection": "deny"
  }
}
```
## Zero Data Retention (ZDR)
For maximum privacy, restrict routing to providers with Zero Data Retention guarantees:

```json
{
  "model": "openai/gpt-5.4",
  "messages": [{ "role": "user", "content": "Hello" }],
  "provider": {
    "zdr": true
  }
}
```
ZDR providers do not log, store, or use request data for training. See Data Collection for more details.
## Maximum Price
Set a hard limit on how much you are willing to pay per token. Requests will fail rather than be routed to providers above this price:

```json
{
  "model": "openai/gpt-5.4",
  "messages": [{ "role": "user", "content": "Hello" }],
  "provider": {
    "max_price": {
      "prompt": "0.000010",
      "completion": "0.000030"
    }
  }
}
```
Unlike performance thresholds, max_price is a hard limit. If no provider meets the price requirement, the request returns an error.
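The hard-limit semantics can be sketched as a simple filter. Prices are per-token decimal strings as in the request above, compared with Decimal to avoid float rounding; the provider names and prices here are made up:

```python
from decimal import Decimal

# Illustrative: a hard max_price filter, not ARouter's internal code.
max_price = {"prompt": Decimal("0.000010"), "completion": Decimal("0.000030")}

providers = [
    {"name": "cheap", "prompt": Decimal("0.000008"), "completion": Decimal("0.000024")},
    {"name": "pricey", "prompt": Decimal("0.000015"), "completion": Decimal("0.000030")},
]

eligible = [
    p["name"] for p in providers
    if p["prompt"] <= max_price["prompt"] and p["completion"] <= max_price["completion"]
]
if not eligible:
    raise RuntimeError("no provider meets max_price")  # hard limit: the request fails
print(eligible)  # ['cheap']
```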
## Provider Health and Availability
ARouter continuously tracks provider health using a circuit-breaker mechanism:
| Status | Behavior |
|---|---|
| Healthy | Provider is accepting requests normally |
| Degraded | Recent failures detected; requests may be retried with a different key |
| Unavailable | All keys circuit-broken; ARouter returns 503 |
This is fully transparent — your application does not need to implement provider-level retry logic.
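For intuition, the three statuses in the table map onto a standard per-key circuit breaker. The sketch below is conceptual; the class, threshold, and field names are ours, not ARouter internals:

```python
# Conceptual circuit-breaker sketch matching the status table above.
class KeyCircuit:
    def __init__(self, failure_threshold: int = 3):
        self.failures = 0
        self.failure_threshold = failure_threshold

    @property
    def open(self) -> bool:
        # An open circuit means this key is temporarily unusable.
        return self.failures >= self.failure_threshold

    def record(self, ok: bool) -> None:
        self.failures = 0 if ok else self.failures + 1

def provider_status(keys: list[KeyCircuit]) -> str:
    if all(k.open for k in keys):
        return "unavailable"  # all keys circuit-broken -> 503
    if any(k.failures > 0 for k in keys):
        return "degraded"     # recent failures; retry with a different key
    return "healthy"

keys = [KeyCircuit(), KeyCircuit()]
keys[0].record(ok=False)
print(provider_status(keys))  # degraded
```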
## Specifying a Provider via Model Prefix
The primary way to control which provider handles your request is via the provider/model format:

```json
{
  "model": "openai/gpt-5.4",
  "messages": [{ "role": "user", "content": "Hello!" }]
}
```
See Model Routing for the full list of supported formats.
## Native Provider Proxy
For complete control, use the provider proxy endpoint /{provider}/{path} to bypass ARouter’s model-routing layer entirely:

```shell
# Direct to OpenAI
curl https://api.arouter.ai/openai/v1/chat/completions \
  -H "Authorization: Bearer lr_live_xxxx" \
  -d '{"model": "gpt-5.4", "messages": [...]}'

# Direct to Anthropic
curl https://api.arouter.ai/anthropic/v1/messages \
  -H "Authorization: Bearer lr_live_xxxx" \
  -d '{"model": "claude-sonnet-4.6", "messages": [...]}'
```
See Provider Proxy for the full reference.
## Supported Providers
| Provider | Prefix | Example Model |
|---|---|---|
| OpenAI | openai | openai/gpt-5.4 |
| Anthropic | anthropic | anthropic/claude-sonnet-4.6 |
| Google | google | google/gemini-2.5-flash |
| DeepSeek | deepseek | deepseek/deepseek-v3.2 |
| xAI | x-ai | x-ai/grok-4.20 |
| Mistral | mistralai | mistralai/mistral-large-2512 |
| Meta | meta-llama | meta-llama/llama-4-maverick |
| Qwen | qwen | qwen/qwen3-235b |
| MiniMax | minimax | minimax/minimax-m2.7 |
| Groq | groq | groq/llama-3.3-70b-versatile |
| Kimi | moonshotai | moonshotai/kimi-k2.5 |
| Dashscope | dashscope | dashscope/qwen-max |
See Providers for the full list with capabilities.