Model variants let you modify routing behavior by appending a suffix to any model ID. A variant is a concise shorthand embedded directly in the model string, an alternative to configuring a separate provider object.
{"model": "openai/gpt-5.4:nitro"}

:nitro — Maximum Throughput

Append :nitro to route to the highest-throughput instance of a model. Equivalent to setting provider.sort = "throughput". Best for: Real-time applications, interactive chat, streaming UI
{"model": "openai/gpt-5.4:nitro"}
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.arouter.ai/v1",
  apiKey: "lr_live_xxxx",
});

const response = await client.chat.completions.create({
  model: "openai/gpt-5.4:nitro",
  messages: [{ role: "user", content: "Hello!" }],
});
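Because :nitro is defined as shorthand for provider.sort = "throughput", the same request can be written with an explicit provider object instead of the suffix:

```json
{
  "model": "openai/gpt-5.4",
  "provider": { "sort": "throughput" },
  "messages": [{ "role": "user", "content": "Hello!" }]
}
```

The suffix form is shorter; the provider-object form is useful when you also need other provider options.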

:floor — Minimum Cost

Append :floor to route to the lowest-cost instance of a model. Equivalent to setting provider.sort = "price". Best for: Batch processing, offline workloads, cost-sensitive pipelines
{"model": "openai/gpt-5.4:floor"}
const response = await client.chat.completions.create({
  model: "openai/gpt-5.4:floor",
  messages: [{ role: "user", content: "Summarize this document." }],
});

:free — Free Tier

Append :free to route to the free-tier instance of a model. Free-tier instances are available for many popular models with rate limits applied. Best for: Prototyping, development, low-volume testing
{"model": "meta-llama/llama-4-maverick:free"}
const response = await client.chat.completions.create({
  model: "meta-llama/llama-4-maverick:free",
  messages: [{ role: "user", content: "Hello!" }],
});
Free-tier models apply stricter rate limits and may have reduced context windows. See Rate Limits for details.
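When a free-tier rate limit is hit, the API responds with HTTP 429. A minimal retry-with-backoff wrapper might look like the sketch below. It assumes the thrown error exposes a numeric status field, as the OpenAI SDK's APIError does; adjust the check for your client library.

```typescript
// Retries fn when it throws an error carrying HTTP status 429,
// backing off exponentially between attempts (1s, 2s, 4s, ...).
async function withRetry<T>(
  fn: () => Promise<T>,
  maxRetries = 3,
  baseDelayMs = 1000,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err: any) {
      // Re-throw anything that is not a rate limit, or when retries run out.
      if (err?.status !== 429 || attempt >= maxRetries) throw err;
      await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** attempt));
    }
  }
}
```

You would then wrap the call itself, e.g. `withRetry(() => client.chat.completions.create({...}))`.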

:thinking — Extended Reasoning

Append :thinking to enable extended chain-of-thought reasoning on models that support it (e.g. DeepSeek R1, Claude with extended thinking, Gemini Flash Thinking). Best for: Complex reasoning, math, coding, multi-step problems
{"model": "deepseek/deepseek-r1:thinking"}
const response = await client.chat.completions.create({
  model: "deepseek/deepseek-r1:thinking",
  messages: [
    { role: "user", content: "Prove that √2 is irrational." }
  ],
});

// Reasoning tokens are returned in usage
console.log(response.usage);
When :thinking is enabled, reasoning tokens appear in the response usage.completion_tokens_details:
{
  "usage": {
    "prompt_tokens": 20,
    "completion_tokens": 850,
    "total_tokens": 870,
    "completion_tokens_details": {
      "reasoning_tokens": 720
    }
  }
}
See Reasoning Tokens for billing and usage details.
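Since not every model populates the details object, a defensive read of the reasoning-token count from the usage shape shown above might look like this (a sketch using optional chaining; the interface mirrors the fields in the example response):

```typescript
// Shape of the usage object returned with :thinking responses.
interface Usage {
  prompt_tokens: number;
  completion_tokens: number;
  total_tokens: number;
  completion_tokens_details?: { reasoning_tokens?: number };
}

// Returns the reasoning-token count, or 0 when the details are absent.
function reasoningTokens(usage: Usage): number {
  return usage.completion_tokens_details?.reasoning_tokens ?? 0;
}
```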

:extended — Extended Context

Append :extended to access versions of a model with a larger context window than the default. Best for: Long document processing, large codebases, extended conversations
{"model": "google/gemini-2.5-flash:extended"}
const response = await client.chat.completions.create({
  model: "google/gemini-2.5-flash:extended",
  messages: [{ role: "user", content: "Analyze this 500-page report..." }],
});

:exacto — Tool-Calling Quality

Append :exacto to explicitly activate quality-ranked routing for tool-calling requests. ARouter selects the provider endpoint with the highest tool-calling quality score, rather than the cheapest option. Best for: Production tool-calling pipelines where schema adherence and argument accuracy matter more than cost
{"model": "openai/gpt-5.4:exacto"}
const response = await client.chat.completions.create({
  model: "openai/gpt-5.4:exacto",
  messages: [{ role: "user", content: "Get the weather in Shanghai and Tokyo" }],
  tools: [weatherTool],
});
Difference from Auto Exacto: Auto Exacto activates automatically whenever tools is present. :exacto forces quality-ranked routing even for requests without tools — useful when you want consistent provider selection behavior regardless of whether the model invokes tools.

Combining Variants

Some variants can be combined:
{"model": "meta-llama/llama-4-maverick:free:nitro"}
Not all combinations are valid. ARouter returns an error if the requested combination is unavailable for a model.
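Because variants are plain colon-separated suffixes, a client-side helper can split a model ID into its base and variant list before sending a request. This is a hypothetical helper for illustration, not part of any ARouter SDK; the server performs its own parsing.

```typescript
// Splits "vendor/model:free:nitro" into a base ID and variant suffixes.
// The base ID may contain "/", but never ":".
function parseModel(id: string): { base: string; variants: string[] } {
  const [base, ...variants] = id.split(":");
  return { base, variants };
}
```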

Variant Reference

| Suffix | Effect | Equivalent provider setting | Best for |
| --- | --- | --- | --- |
| :nitro | Highest throughput | provider.sort = "throughput" | Real-time / interactive |
| :floor | Lowest cost | provider.sort = "price" | Batch / offline |
| :free | Free tier (rate limited) | — | Dev / prototyping |
| :thinking | Extended reasoning mode | — | Complex reasoning |
| :extended | Larger context window | — | Long documents |
| :online | Web search (deprecated) | plugins: [{id: "web"}] | Use Server Tools instead |
| :exacto | Tool-calling quality routing | provider.sort = "quality" | Production tool-calling |

How Variants Affect Routing

Variants are parsed on the server and influence which endpoint ARouter selects:
  1. The base model ID (e.g. openai/gpt-5.4) identifies the model family
  2. The suffix modifies the endpoint selection criteria
  3. ARouter returns the actual model used in response.model
Always check response.model to see exactly which model variant was served:
{
  "model": "openai/gpt-5.4:nitro",
  "choices": [...],
  "usage": {...}
}
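A small guard can confirm that the served model matches the request, ignoring any variant suffix the router resolves. This is a hypothetical helper for illustration only:

```typescript
// True when the served model's base ID matches the requested one,
// ignoring variant suffixes on either side.
function servedExpectedModel(requested: string, served: string): boolean {
  const base = (id: string) => id.split(":")[0];
  return base(served) === base(requested);
}
```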
See Provider Routing for full provider object options when you need more granular control than variants provide.