Model variants let you modify routing behavior by appending a suffix to any model ID. Rather than requiring a separate provider configuration object, a variant is a concise shorthand embedded directly in the model string.
```json
{"model": "openai/gpt-5.4:nitro"}
```
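Conceptually, the server splits the model string into a base model ID and zero or more variant suffixes. A minimal sketch of that split (a hypothetical illustration, not ARouter's actual parser):

```python
def split_model_id(model: str) -> tuple[str, list[str]]:
    """Split a model string like 'openai/gpt-5.4:nitro' into
    (base_id, variant_suffixes). Hypothetical sketch of the
    server-side parsing described above."""
    base, *variants = model.split(":")
    return base, variants

# "openai/gpt-5.4:nitro" -> base "openai/gpt-5.4", variants ["nitro"]
base, variants = split_model_id("openai/gpt-5.4:nitro")
```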
## :nitro — Maximum Throughput

Append :nitro to route to the highest-throughput instance of a model. Equivalent to setting `provider.sort = "throughput"`.
Best for: Real-time applications, interactive chat, streaming UI
```json
{"model": "openai/gpt-5.4:nitro"}
```
```typescript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.arouter.ai/v1",
  apiKey: "lr_live_xxxx",
});

const response = await client.chat.completions.create({
  model: "openai/gpt-5.4:nitro",
  messages: [{ role: "user", content: "Hello!" }],
});
```
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.arouter.ai/v1",
    api_key="lr_live_xxxx",
)

response = client.chat.completions.create(
    model="openai/gpt-5.4:nitro",
    messages=[{"role": "user", "content": "Hello!"}],
)
```
```bash
curl https://api.arouter.ai/v1/chat/completions \
  -H "Authorization: Bearer lr_live_xxxx" \
  -H "Content-Type: application/json" \
  -d '{"model": "openai/gpt-5.4:nitro", "messages": [{"role": "user", "content": "Hello!"}]}'
```
## :floor — Minimum Cost

Append :floor to route to the lowest-cost instance of a model. Equivalent to setting `provider.sort = "price"`.
Best for: Batch processing, offline workloads, cost-sensitive pipelines
```json
{"model": "openai/gpt-5.4:floor"}
```
```typescript
const response = await client.chat.completions.create({
  model: "openai/gpt-5.4:floor",
  messages: [{ role: "user", content: "Summarize this document." }],
});
```
```python
response = client.chat.completions.create(
    model="openai/gpt-5.4:floor",
    messages=[{"role": "user", "content": "Summarize this document."}],
)
```
```bash
curl https://api.arouter.ai/v1/chat/completions \
  -H "Authorization: Bearer lr_live_xxxx" \
  -H "Content-Type: application/json" \
  -d '{"model": "openai/gpt-5.4:floor", "messages": [{"role": "user", "content": "Summarize this document."}]}'
```
## :free — Free Tier
Append :free to route to the free-tier instance of a model. Free-tier instances are available for many popular models with rate limits applied.
Best for: Prototyping, development, low-volume testing
```json
{"model": "meta-llama/llama-4-maverick:free"}
```
```typescript
const response = await client.chat.completions.create({
  model: "meta-llama/llama-4-maverick:free",
  messages: [{ role: "user", content: "Hello!" }],
});
```
```python
response = client.chat.completions.create(
    model="meta-llama/llama-4-maverick:free",
    messages=[{"role": "user", "content": "Hello!"}],
)
```
```bash
curl https://api.arouter.ai/v1/chat/completions \
  -H "Authorization: Bearer lr_live_xxxx" \
  -H "Content-Type: application/json" \
  -d '{"model": "meta-llama/llama-4-maverick:free", "messages": [{"role": "user", "content": "Hello!"}]}'
```
Free-tier models apply stricter rate limits and may have reduced context windows. See Rate Limits for details.
## :thinking — Extended Reasoning
Append :thinking to enable extended chain-of-thought reasoning on models that support it (e.g. DeepSeek R1, Claude with extended thinking, Gemini Flash Thinking).
Best for: Complex reasoning, math, coding, multi-step problems
```json
{"model": "deepseek/deepseek-r1:thinking"}
```
```typescript
const response = await client.chat.completions.create({
  model: "deepseek/deepseek-r1:thinking",
  messages: [
    { role: "user", content: "Prove that √2 is irrational." }
  ],
});

// Reasoning tokens are returned in usage
console.log(response.usage);
```
```python
response = client.chat.completions.create(
    model="deepseek/deepseek-r1:thinking",
    messages=[{"role": "user", "content": "Prove that √2 is irrational."}],
)

# Reasoning tokens are returned in usage
print(response.usage)
```
```bash
curl https://api.arouter.ai/v1/chat/completions \
  -H "Authorization: Bearer lr_live_xxxx" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek/deepseek-r1:thinking",
    "messages": [{"role": "user", "content": "Prove that √2 is irrational."}]
  }'
```
When :thinking is enabled, reasoning tokens appear in the response under `usage.completion_tokens_details`:
```json
{
  "usage": {
    "prompt_tokens": 20,
    "completion_tokens": 850,
    "total_tokens": 870,
    "completion_tokens_details": {
      "reasoning_tokens": 720
    }
  }
}
```
See Reasoning Tokens for billing and usage details.
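The example usage above implies that reasoning tokens are counted inside `completion_tokens` (850 completion, 720 reasoning), so the visible output is the difference. A small sketch of that arithmetic (an assumption based on the example, not a statement of ARouter's billing rules):

```python
usage = {
    "prompt_tokens": 20,
    "completion_tokens": 850,
    "total_tokens": 870,
    "completion_tokens_details": {"reasoning_tokens": 720},
}

# Assumption: reasoning tokens are a subset of completion_tokens,
# so visible (non-reasoning) output tokens are the difference.
reasoning = usage["completion_tokens_details"]["reasoning_tokens"]
visible = usage["completion_tokens"] - reasoning  # 850 - 720 = 130
```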
## :extended — Extended Context
Append :extended to access versions of a model with a larger context window than the default.
Best for: Long document processing, large codebases, extended conversations
```json
{"model": "google/gemini-2.5-flash:extended"}
```
```typescript
const response = await client.chat.completions.create({
  model: "google/gemini-2.5-flash:extended",
  messages: [{ role: "user", content: "Analyze this 500-page report..." }],
});
```
```python
response = client.chat.completions.create(
    model="google/gemini-2.5-flash:extended",
    messages=[{"role": "user", "content": "Analyze this 500-page report..."}],
)
```
## :exacto — Tool-Calling Quality

Append :exacto to explicitly activate quality-ranked routing for tool-calling requests. ARouter selects the provider endpoint with the highest tool-calling quality score rather than the cheapest option. Equivalent to setting `provider.sort = "quality"`.
Best for: Production tool-calling pipelines where schema adherence and argument accuracy matter more than cost
```json
{"model": "openai/gpt-5.4:exacto"}
```
```typescript
const response = await client.chat.completions.create({
  model: "openai/gpt-5.4:exacto",
  messages: [{ role: "user", content: "Get the weather in Shanghai and Tokyo" }],
  tools: [weatherTool],
});
```
```python
response = client.chat.completions.create(
    model="openai/gpt-5.4:exacto",
    messages=[{"role": "user", "content": "Get the weather in Shanghai and Tokyo"}],
    tools=[weather_tool],
)
```
Difference from Auto Exacto: Auto Exacto activates automatically whenever tools is present. :exacto forces quality-ranked routing even for requests without tools — useful when you want consistent provider selection behavior regardless of whether the model invokes tools.
## Combining Variants
Some variants can be combined:
```json
{"model": "meta-llama/llama-4-maverick:free:nitro"}
```
Not all combinations are valid. ARouter returns an error if the requested combination is unavailable for a model.
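Because ARouter rejects unsupported combinations with an error, one client-side strategy is to retry with progressively fewer variants until the base model is reached. A hypothetical sketch of building that fallback order (the retry strategy itself is an assumption, not ARouter behavior):

```python
def fallback_chain(model: str) -> list[str]:
    """Given a combined-variant model string, return models to try in
    order: the full combination first, then progressively fewer
    variants, ending at the base model ID."""
    parts = model.split(":")
    return [":".join(parts[:i]) for i in range(len(parts), 0, -1)]

# Try the full combination, then ":free" alone, then the base model.
chain = fallback_chain("meta-llama/llama-4-maverick:free:nitro")
```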
## Variant Reference
| Suffix | Effect | Equivalent provider setting | Best for |
|---|---|---|---|
| `:nitro` | Highest throughput | `provider.sort = "throughput"` | Real-time / interactive |
| `:floor` | Lowest cost | `provider.sort = "price"` | Batch / offline |
| `:free` | Free tier (rate limited) | — | Dev / prototyping |
| `:thinking` | Extended reasoning mode | — | Complex reasoning |
| `:extended` | Larger context window | — | Long documents |
| `:online` | Web search (deprecated) | `plugins: [{id: "web"}]` | Use Server Tools instead |
| `:exacto` | Tool-calling quality routing | `provider.sort = "quality"` | Production tool-calling |
## How Variants Affect Routing
Variants are parsed on the server and influence which endpoint ARouter selects:
- The base model ID (e.g. `openai/gpt-5.4`) identifies the model family
- The suffix modifies the endpoint selection criteria
- ARouter returns the actual model used in `response.model`
Always check `response.model` to see exactly which model variant was served:

```json
{
  "model": "openai/gpt-5.4:nitro",
  "choices": [...],
  "usage": {...}
}
```
See Provider Routing for full provider object options when you need more granular control than variants provide.