Model variants let you modify routing behavior by appending a suffix to any model ID. Rather than requiring a separate provider configuration object, a variant is a concise shorthand embedded directly in the model string.
```json
{"model": "openai/gpt-5.4:nitro"}
```
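Conceptually, the server splits the model string into a base model ID and zero or more variant suffixes. A minimal sketch of that split (a hypothetical illustration, not ARouter's actual parser):

```python
def split_model_id(model: str) -> tuple[str, list[str]]:
    """Split a model string like 'openai/gpt-5.4:nitro' into
    (base_id, variant_suffixes). Hypothetical sketch of the
    server-side parsing described above."""
    base, *variants = model.split(":")
    return base, variants

# "openai/gpt-5.4:nitro" -> base "openai/gpt-5.4", variants ["nitro"]
base, variants = split_model_id("openai/gpt-5.4:nitro")
```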
## :nitro — Maximum Throughput

Append :nitro to route to the highest-throughput instance of a model. Equivalent to setting `provider.sort = "throughput"`.
Best for: Real-time applications, interactive chat, streaming UI
```json
{"model": "openai/gpt-5.4:nitro"}
```
```typescript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.arouter.ai/v1",
  apiKey: "lr_live_xxxx",
});

const response = await client.chat.completions.create({
  model: "openai/gpt-5.4:nitro",
  messages: [{ role: "user", content: "Hello!" }],
});
```
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.arouter.ai/v1",
    api_key="lr_live_xxxx",
)

response = client.chat.completions.create(
    model="openai/gpt-5.4:nitro",
    messages=[{"role": "user", "content": "Hello!"}],
)
```
```bash
curl https://api.arouter.ai/v1/chat/completions \
  -H "Authorization: Bearer lr_live_xxxx" \
  -H "Content-Type: application/json" \
  -d '{"model": "openai/gpt-5.4:nitro", "messages": [{"role": "user", "content": "Hello!"}]}'
```
## :floor — Minimum Cost

Append :floor to route to the lowest-cost instance of a model. Equivalent to setting `provider.sort = "price"`.
Best for: Batch processing, offline workloads, cost-sensitive pipelines
```json
{"model": "openai/gpt-5.4:floor"}
```
```typescript
const response = await client.chat.completions.create({
  model: "openai/gpt-5.4:floor",
  messages: [{ role: "user", content: "Summarize this document." }],
});
```
```python
response = client.chat.completions.create(
    model="openai/gpt-5.4:floor",
    messages=[{"role": "user", "content": "Summarize this document."}],
)
```
```bash
curl https://api.arouter.ai/v1/chat/completions \
  -H "Authorization: Bearer lr_live_xxxx" \
  -H "Content-Type: application/json" \
  -d '{"model": "openai/gpt-5.4:floor", "messages": [{"role": "user", "content": "Summarize this document."}]}'
```
## :free — Free Tier
Append :free to route to the free-tier instance of a model. Free-tier instances are available for many popular models with rate limits applied.
Best for: Prototyping, development, low-volume testing
```json
{"model": "meta-llama/llama-4-maverick:free"}
```
```typescript
const response = await client.chat.completions.create({
  model: "meta-llama/llama-4-maverick:free",
  messages: [{ role: "user", content: "Hello!" }],
});
```
```python
response = client.chat.completions.create(
    model="meta-llama/llama-4-maverick:free",
    messages=[{"role": "user", "content": "Hello!"}],
)
```
```bash
curl https://api.arouter.ai/v1/chat/completions \
  -H "Authorization: Bearer lr_live_xxxx" \
  -H "Content-Type: application/json" \
  -d '{"model": "meta-llama/llama-4-maverick:free", "messages": [{"role": "user", "content": "Hello!"}]}'
```
Free-tier models apply stricter rate limits and may have reduced context windows. See Rate Limits for details.
## :thinking — Extended Reasoning
Append :thinking to enable extended chain-of-thought reasoning on models that support it (e.g. DeepSeek R1, Claude with extended thinking, Gemini Flash Thinking).
Best for: Complex reasoning, math, coding, multi-step problems
```json
{"model": "deepseek/deepseek-r1:thinking"}
```
```typescript
const response = await client.chat.completions.create({
  model: "deepseek/deepseek-r1:thinking",
  messages: [
    { role: "user", content: "Prove that √2 is irrational." }
  ],
});

// Reasoning tokens are returned in usage
console.log(response.usage);
```
```python
response = client.chat.completions.create(
    model="deepseek/deepseek-r1:thinking",
    messages=[{"role": "user", "content": "Prove that √2 is irrational."}],
)

# Reasoning tokens are returned in usage
print(response.usage)
```
```bash
curl https://api.arouter.ai/v1/chat/completions \
  -H "Authorization: Bearer lr_live_xxxx" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek/deepseek-r1:thinking",
    "messages": [{"role": "user", "content": "Prove that √2 is irrational."}]
  }'
```
When :thinking is enabled, reasoning tokens appear in the response under `usage.completion_tokens_details`:
```json
{
  "usage": {
    "prompt_tokens": 20,
    "completion_tokens": 850,
    "total_tokens": 870,
    "completion_tokens_details": {
      "reasoning_tokens": 720
    }
  }
}
```
See Reasoning Tokens for billing and usage details.
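The example usage above implies that reasoning tokens are counted inside `completion_tokens` (850 completion, 720 reasoning), so the visible output is the difference. A small sketch of that arithmetic (an assumption based on the example, not a statement of ARouter's billing rules):

```python
usage = {
    "prompt_tokens": 20,
    "completion_tokens": 850,
    "total_tokens": 870,
    "completion_tokens_details": {"reasoning_tokens": 720},
}

# Assumption: reasoning tokens are a subset of completion_tokens,
# so visible (non-reasoning) output tokens are the difference.
reasoning = usage["completion_tokens_details"]["reasoning_tokens"]
visible = usage["completion_tokens"] - reasoning  # 850 - 720 = 130
```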
## :extended — Extended Context
Append :extended to access versions of a model with a larger context window than the default.
Best for: Long document processing, large codebases, extended conversations
```json
{"model": "google/gemini-2.5-flash:extended"}
```
```typescript
const response = await client.chat.completions.create({
  model: "google/gemini-2.5-flash:extended",
  messages: [{ role: "user", content: "Analyze this 500-page report..." }],
});
```
```python
response = client.chat.completions.create(
    model="google/gemini-2.5-flash:extended",
    messages=[{"role": "user", "content": "Analyze this 500-page report..."}],
)
```
## :exacto — Tool-Calling Quality

Append :exacto to explicitly activate quality-ranked routing for tool-calling requests. ARouter selects the provider endpoint with the highest tool-calling quality score rather than the cheapest option. Equivalent to setting `provider.sort = "quality"`.
Best for: Production tool-calling pipelines where schema adherence and argument accuracy matter more than cost
```json
{"model": "openai/gpt-5.4:exacto"}
```
```typescript
const response = await client.chat.completions.create({
  model: "openai/gpt-5.4:exacto",
  messages: [{ role: "user", content: "Get the weather in Shanghai and Tokyo" }],
  tools: [weatherTool],
});
```
```python
response = client.chat.completions.create(
    model="openai/gpt-5.4:exacto",
    messages=[{"role": "user", "content": "Get the weather in Shanghai and Tokyo"}],
    tools=[weather_tool],
)
```
Difference from Auto Exacto: Auto Exacto activates automatically whenever tools is present. :exacto forces quality-ranked routing even for requests without tools — useful when you want consistent provider selection behavior regardless of whether the model invokes tools.
## Combining Variants
Some variants can be combined:
```json
{"model": "meta-llama/llama-4-maverick:free:nitro"}
```
Not all combinations are valid. ARouter returns an error if the requested combination is unavailable for a model.
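Because ARouter rejects unsupported combinations with an error, one client-side strategy is to retry with progressively fewer variants until the base model is reached. A hypothetical sketch of building that fallback order (the retry strategy itself is an assumption, not ARouter behavior):

```python
def fallback_chain(model: str) -> list[str]:
    """Given a combined-variant model string, return models to try in
    order: the full combination first, then progressively fewer
    variants, ending at the base model ID."""
    parts = model.split(":")
    return [":".join(parts[:i]) for i in range(len(parts), 0, -1)]

# Try the full combination, then ":free" alone, then the base model.
chain = fallback_chain("meta-llama/llama-4-maverick:free:nitro")
```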
## Variant Reference
| Suffix | Effect | Equivalent provider setting | Best for |
|---|---|---|---|
| `:nitro` | Highest throughput | `provider.sort = "throughput"` | Real-time / interactive |
| `:floor` | Lowest cost | `provider.sort = "price"` | Batch / offline |
| `:free` | Free tier (rate limited) | — | Dev / prototyping |
| `:thinking` | Extended reasoning mode | — | Complex reasoning |
| `:extended` | Larger context window | — | Long documents |
| `:online` | Web search (deprecated) | `plugins: [{id: "web"}]` | Use Server Tools instead |
| `:exacto` | Tool-calling quality routing | `provider.sort = "quality"` | Production tool-calling |
## How Variants Affect Routing
Variants are parsed on the server and influence which endpoint ARouter selects:
- The base model ID (e.g. `openai/gpt-5.4`) identifies the model family
- The suffix modifies the endpoint selection criteria
- ARouter returns the actual model used in `response.model`
Always check `response.model` to see exactly which model variant was served:

```json
{
  "model": "openai/gpt-5.4:nitro",
  "choices": [...],
  "usage": {...}
}
```
See Provider Routing for full provider object options when you need more granular control than variants provide.