
The provider/model Format

When using the OpenAI-compatible endpoints (/v1/chat/completions, /v1/embeddings), specify the model using the provider/model format:
{
  "model": "anthropic/claude-sonnet-4.6",
  "messages": [{"role": "user", "content": "Hello!"}]
}
ARouter parses the provider prefix, routes the request to the correct upstream, and rewrites the model field to the provider’s native format before forwarding.

Examples

You send                        Provider            Upstream model
openai/gpt-5.4                  OpenAI              gpt-5.4
anthropic/claude-sonnet-4.6     Anthropic           claude-sonnet-4.6
google/gemini-2.5-flash         Google              gemini-2.5-flash
deepseek/deepseek-v3.2          DeepSeek            deepseek-v3.2
x-ai/grok-4.20                  xAI                 grok-4.20
mistralai/mistral-large-2512    Mistral             mistral-large-2512
meta-llama/llama-4-maverick     Meta                llama-4-maverick
gpt-5.4                         OpenAI (default)    gpt-5.4
If you omit the provider prefix, ARouter defaults to OpenAI. So "model": "gpt-5.4" is equivalent to "model": "openai/gpt-5.4".
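The parsing rule above can be sketched in a few lines. This is an illustrative sketch of the documented behavior (split on the first `/`, default to `openai`), not ARouter's actual implementation:

```python
def parse_model(model: str, default_provider: str = "openai") -> tuple[str, str]:
    """Return (provider, upstream_model) for a provider/model string."""
    provider, sep, name = model.partition("/")
    if not sep:  # no prefix -> fall back to the default provider
        return default_provider, model
    return provider, name

print(parse_model("anthropic/claude-sonnet-4.6"))  # ('anthropic', 'claude-sonnet-4.6')
print(parse_model("gpt-5.4"))                      # ('openai', 'gpt-5.4')
```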

Native SDK Endpoints

For providers with their own SDK format, use the native endpoints directly. The provider is determined by the endpoint path, not the model field:
SDK          Endpoint                                        Model format
Anthropic    POST /v1/messages                               claude-sonnet-4.6
Gemini       POST /v1beta/models/{model}:generateContent     gemini-2.5-flash
MiniMax      POST /v1/text/chatcompletion_v2                 minimax-m2.7
Native endpoints do not use the provider/model prefix — they use the provider’s original model names since the provider is already implied by the endpoint path.
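To make the contrast concrete, here is a toy sketch of how a native request is shaped. The endpoint mapping mirrors the table above; the helper and the rule that Gemini puts the model in the path rather than the body are illustrative assumptions, not ARouter internals:

```python
# Hypothetical mapping, mirroring the table above.
NATIVE_ENDPOINTS = {
    "anthropic": "/v1/messages",
    "gemini": "/v1beta/models/{model}:generateContent",
    "minimax": "/v1/text/chatcompletion_v2",
}

def native_request(provider: str, model: str) -> tuple[str, dict]:
    """Return (path, body). The model name stays unprefixed: the path
    already identifies the provider."""
    template = NATIVE_ENDPOINTS[provider]
    path = template.format(model=model)
    # Gemini carries the model in the URL; the others carry it in the body.
    body = {} if "{model}" in template else {"model": model}
    return path, body

print(native_request("anthropic", "claude-sonnet-4.6"))
# ('/v1/messages', {'model': 'claude-sonnet-4.6'})
```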

Generic Provider Proxy

For any provider, you can also use the catch-all proxy format:
POST /{provider}/v1/chat/completions
For example:
  • POST /openai/v1/chat/completions → proxied to OpenAI
  • POST /deepseek/v1/chat/completions → proxied to DeepSeek
  • POST /anthropic/v1/messages → proxied to Anthropic
This is useful when you want to bypass model-field parsing and explicitly control which provider receives the request.
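The path construction is simply the provider segment prepended to the upstream path, as a quick sketch shows (illustrative only):

```python
def proxy_path(provider: str, upstream_path: str) -> str:
    """Build a catch-all proxy path: /{provider}{upstream_path}."""
    return f"/{provider}{upstream_path}"

print(proxy_path("deepseek", "/v1/chat/completions"))
# /deepseek/v1/chat/completions
```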

Auto Routing

Set model to "auto" and ARouter will automatically select the best available model for your prompt. No model configuration needed.
{
  "model": "auto",
  "messages": [{ "role": "user", "content": "Explain quantum entanglement in simple terms" }]
}

How It Works

  1. ARouter’s routing service analyzes your request (prompt complexity, task type, required modalities, etc.)
  2. The optimal model is selected from available healthy providers based on cost efficiency and quality
  3. Your request is forwarded to the selected model
  4. The response includes the model field showing exactly which model was used
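Steps 1–3 above can be caricatured as "pick the cheapest healthy model that is good enough for the prompt." The candidates, scores, and threshold below are made up for illustration; the real routing service is considerably more involved:

```python
# Made-up candidate pool with health, quality, and relative cost figures.
CANDIDATES = [
    {"model": "openai/gpt-5.4", "healthy": True, "quality": 0.95, "cost": 10.0},
    {"model": "google/gemini-2.5-flash", "healthy": True, "quality": 0.80, "cost": 1.0},
    {"model": "anthropic/claude-sonnet-4.6", "healthy": False, "quality": 0.93, "cost": 6.0},
]

def select_model(candidates, min_quality=0.75):
    """Cheapest healthy candidate that clears the quality bar."""
    eligible = [c for c in candidates if c["healthy"] and c["quality"] >= min_quality]
    if not eligible:
        raise RuntimeError("no healthy candidates")
    return min(eligible, key=lambda c: c["cost"])["model"]

print(select_model(CANDIDATES))  # google/gemini-2.5-flash
```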

Restricting Allowed Models

Use the auto-router plugin to restrict which models auto can select from, using wildcard patterns:
const response = await client.chat.completions.create({
  model: "auto",
  messages: [{ role: "user", content: "Explain quantum entanglement" }],
  // @ts-ignore
  plugins: [
    {
      id: "auto-router",
      allowed_models: ["anthropic/*", "openai/gpt-5.4"],
    },
  ],
});
Pattern syntax:
Pattern           Matches
anthropic/*       All Anthropic models
openai/gpt-5*     All GPT-5 variants
google/*          All Google models
openai/gpt-5.4    Exact match only
*/claude-*        Claude-family models from any provider
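These patterns behave like shell globs, so Python's `fnmatch` is a reasonable approximation of the matching rule (the plugin's exact semantics may differ; this is a sketch):

```python
from fnmatch import fnmatch

def is_allowed(model: str, allowed_patterns: list[str]) -> bool:
    """True if the model matches any allowed glob pattern."""
    return any(fnmatch(model, pattern) for pattern in allowed_patterns)

allowed = ["anthropic/*", "openai/gpt-5.4"]
print(is_allowed("anthropic/claude-sonnet-4.6", allowed))  # True
print(is_allowed("openai/gpt-5.4", allowed))               # True
print(is_allowed("openai/gpt-5.4-mini", allowed))          # False
```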
The response shows which model was actually selected:
{
  "id": "chatcmpl-xxx",
  "model": "anthropic/claude-sonnet-4.6",
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 15,
    "completion_tokens": 120,
    "total_tokens": 135
  }
}
Always check response.model to see which model was actually used.

Code Example

from openai import OpenAI

client = OpenAI(
    base_url="https://api.arouter.ai/v1",
    api_key="lr_live_xxxx",
)

response = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "Explain quantum entanglement in simple terms"}],
)

print(response.choices[0].message.content)
print("Model used:", response.model)

Use Cases

  • General-purpose apps — When you don’t know what types of prompts users will send
  • Cost optimization — Let ARouter route simpler tasks to efficient models automatically
  • Zero-config prototyping — Get started without choosing a specific model
  • Adaptive routing — Let ARouter choose first, and switch to ordered candidate lists only when you need explicit control

Notes and Limitations

  • Auto routing uses the standard messages request format
  • Auto routing selects from models available to your account
  • Streaming is fully supported with "model": "auto"
  • You pay the normal rate for the model ARouter selects; there is no additional routing fee
  • The selected model is always reflected in the response model field

Candidate Model Lists

Use the models array together with route when you want ARouter to work through an ordered candidate list.
{
  "models": [
    "anthropic/claude-opus-4.5",
    "openai/gpt-5.4",
    "google/gemini-2.5-flash"
  ],
  "route": "fallback",
  "messages": [{ "role": "user", "content": "Hello!" }]
}

How It Works

  1. ARouter tries the first model in the list
  2. If it cannot serve the request (provider error, rate limit, key unavailable), it moves to the next
  3. If all models fail, ARouter returns an error with the last failure reason

Routing Behavior

Trigger                 Behavior
Provider unavailable    Move to next model
Rate limited (429)      Move to next model
Model not found         Move to next model
Bad request (400)       Stop and return the request error
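The fallback loop described above boils down to: try each candidate in order, advance on retryable failures, and stop immediately on a 400. A minimal sketch (the `send` callable and error shape are assumptions for illustration):

```python
def try_candidates(models, send):
    """send(model) returns a response or raises Exception(status_code)."""
    last_error = None
    for model in models:
        try:
            return send(model)
        except Exception as exc:
            code = exc.args[0] if exc.args else None
            if code == 400:
                raise              # bad request: the caller's fault, stop here
            last_error = code      # retryable failure: try the next candidate
    raise RuntimeError(f"all candidates failed; last error: {last_error}")

def fake_send(model):  # stand-in upstream: the first model is rate limited
    if model == "anthropic/claude-opus-4.5":
        raise Exception(429)
    return {"model": model}

print(try_candidates(["anthropic/claude-opus-4.5", "openai/gpt-5.4"], fake_send))
# {'model': 'openai/gpt-5.4'}
```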

Controlling Partition Behavior

By default, when using a candidate list, endpoints are grouped by model — the first model’s endpoints are always tried before the second model’s. You can change this with provider.sort.partition:
{
  "models": [
    "anthropic/claude-sonnet-4.6",
    "openai/gpt-5.4",
    "google/gemini-2.5-flash"
  ],
  "route": "fallback",
  "messages": [{ "role": "user", "content": "Hello!" }],
  "provider": {
    "sort": {
      "by": "throughput",
      "partition": "none"
    }
  }
}
Setting partition: "none" sorts endpoints globally across all candidate models — useful when you want whichever model is currently fastest, regardless of which is listed first. See Provider Routing for the full reference.
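A toy model of the two partition modes makes the difference visible. Endpoints here carry made-up throughput numbers; the ordering logic is an illustrative sketch, not ARouter's implementation:

```python
ENDPOINTS = [
    {"model": "anthropic/claude-sonnet-4.6", "throughput": 40},
    {"model": "openai/gpt-5.4", "throughput": 90},
    {"model": "google/gemini-2.5-flash", "throughput": 120},
]

def order_endpoints(endpoints, candidates, partition):
    if partition == "none":
        # Global sort: fastest endpoint first, regardless of list order.
        return sorted(endpoints, key=lambda e: -e["throughput"])
    # Partition by model: candidate-list order wins; throughput sorts within.
    rank = {m: i for i, m in enumerate(candidates)}
    return sorted(endpoints, key=lambda e: (rank[e["model"]], -e["throughput"]))

cands = ["anthropic/claude-sonnet-4.6", "openai/gpt-5.4", "google/gemini-2.5-flash"]
print(order_endpoints(ENDPOINTS, cands, "none")[0]["model"])   # google/gemini-2.5-flash
print(order_endpoints(ENDPOINTS, cands, "model")[0]["model"])  # anthropic/claude-sonnet-4.6
```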

Using Candidate Lists with the OpenAI SDK

The OpenAI SDK doesn’t have a models parameter natively. Use extra_body to pass it:
from openai import OpenAI

client = OpenAI(
    base_url="https://api.arouter.ai/v1",
    api_key="lr_live_xxxx",
)

response = client.chat.completions.create(
    model="anthropic/claude-opus-4.5",  # First candidate
    messages=[{"role": "user", "content": "Hello!"}],
    extra_body={
        "models": [
            "anthropic/claude-opus-4.5",
            "openai/gpt-5.4",
            "google/gemini-2.5-flash",
        ],
        "route": "fallback",
    },
)
print(response.choices[0].message.content)

Assistant Prefill

ARouter supports asking models to complete a partial response. Include a message with role: "assistant" at the end of your messages array to continue from where you left off:
{
  "model": "anthropic/claude-sonnet-4.6",
  "messages": [
    { "role": "user", "content": "Name 3 popular programming languages." },
    { "role": "assistant", "content": "1." }
  ]
}
The model will continue from the prefilled assistant message. This technique is useful for:
  • Forcing a specific output format
  • Resuming multi-turn completions
  • Guiding the model into a specific response structure
Not all models support assistant prefill. Anthropic Claude and most open-source models support it. OpenAI models have limited support.

How Routing Works Under the Hood

1. Parse model field → extract provider ID
2. Check API key permissions → is this provider allowed?
3. Call router-service → select region, pick healthy key from pool
4. Rewrite model field → strip provider prefix
5. Reverse proxy → forward to upstream with provider's API key
6. Stream response back → record usage asynchronously
ARouter handles provider API key injection, health checking, and failover completely transparently. Your application never sees the upstream provider’s credentials.
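The six steps above can be condensed into a pure-function sketch. Everything here is stubbed for illustration — the key pool, permission set, and demo key are hypothetical, and health checking, proxying, and usage recording are omitted:

```python
PROVIDER_KEYS = {"anthropic": ["sk-ant-demo"]}  # hypothetical key pool

def route(request, allowed_providers=frozenset({"anthropic", "openai"})):
    provider, _, model = request["model"].partition("/")   # 1. parse prefix
    if provider not in allowed_providers:                  # 2. permission check
        raise PermissionError(provider)
    key = PROVIDER_KEYS[provider][0]                       # 3. pick a key from the pool
    upstream = {**request, "model": model}                 # 4. strip the prefix
    return {"upstream_body": upstream, "api_key": key}     # 5. ready to proxy

out = route({"model": "anthropic/claude-sonnet-4.6", "messages": []})
print(out["upstream_body"]["model"])  # claude-sonnet-4.6
```

Note that the upstream body never contains the prefixed name, and the provider key never leaves the router's side of the proxy.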