
The provider/model Format

When using the OpenAI-compatible endpoints (/v1/chat/completions, /v1/embeddings), specify the model using the provider/model format:
{
  "model": "anthropic/claude-sonnet-4.6",
  "messages": [{"role": "user", "content": "Hello!"}]
}
ARouter parses the provider prefix, routes the request to the correct upstream, and rewrites the model field to the provider’s native format before forwarding.

Examples

You send                        Provider            Upstream model
openai/gpt-5.4                  OpenAI              gpt-5.4
anthropic/claude-sonnet-4.6     Anthropic           claude-sonnet-4.6
google/gemini-2.5-flash         Google              gemini-2.5-flash
deepseek/deepseek-v3.2          DeepSeek            deepseek-v3.2
x-ai/grok-4.20                  xAI                 grok-4.20
mistralai/mistral-large-2512    Mistral             mistral-large-2512
meta-llama/llama-4-maverick     Meta                llama-4-maverick
gpt-5.4                         OpenAI (default)    gpt-5.4
If you omit the provider prefix, ARouter defaults to OpenAI. So "model": "gpt-5.4" is equivalent to "model": "openai/gpt-5.4".
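The parsing rule above can be sketched in a few lines. This is an illustrative sketch of the documented behavior (split on the first `/`, default to `openai`), not ARouter's actual implementation:

```python
def parse_model(model: str, default_provider: str = "openai") -> tuple[str, str]:
    """Return (provider, upstream_model) for a provider/model string."""
    provider, sep, name = model.partition("/")
    if not sep:  # no prefix -> fall back to the default provider
        return default_provider, model
    return provider, name

print(parse_model("anthropic/claude-sonnet-4.6"))  # ('anthropic', 'claude-sonnet-4.6')
print(parse_model("gpt-5.4"))                      # ('openai', 'gpt-5.4')
```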

Native SDK Endpoints

For providers with their own SDK format, use the native endpoints directly. The provider is determined by the endpoint path, not the model field:
SDK          Endpoint                                        Model format
Anthropic    POST /v1/messages                               claude-sonnet-4.6
Gemini       POST /v1beta/models/{model}:generateContent     gemini-2.5-flash
MiniMax      POST /v1/text/chatcompletion_v2                 minimax-m2.7
Native endpoints do not use the provider/model prefix — they use the provider’s original model names since the provider is already implied by the endpoint path.
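To make the contrast concrete, here is a toy sketch of how a native request is shaped. The endpoint mapping mirrors the table above; the helper and the rule that Gemini puts the model in the path rather than the body are illustrative assumptions, not ARouter internals:

```python
# Hypothetical mapping, mirroring the table above.
NATIVE_ENDPOINTS = {
    "anthropic": "/v1/messages",
    "gemini": "/v1beta/models/{model}:generateContent",
    "minimax": "/v1/text/chatcompletion_v2",
}

def native_request(provider: str, model: str) -> tuple[str, dict]:
    """Return (path, body). The model name stays unprefixed: the path
    already identifies the provider."""
    template = NATIVE_ENDPOINTS[provider]
    path = template.format(model=model)
    # Gemini carries the model in the URL; the others carry it in the body.
    body = {} if "{model}" in template else {"model": model}
    return path, body

print(native_request("anthropic", "claude-sonnet-4.6"))
# ('/v1/messages', {'model': 'claude-sonnet-4.6'})
```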

Generic Provider Proxy

For any provider, you can also use the catch-all proxy format:
POST /{provider}/v1/chat/completions
For example:
  • POST /openai/v1/chat/completions → proxied to OpenAI
  • POST /deepseek/v1/chat/completions → proxied to DeepSeek
  • POST /anthropic/v1/messages → proxied to Anthropic
This is useful when you want to bypass model-field parsing and explicitly control which provider receives the request.
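The path construction is simply the provider segment prepended to the upstream path, as a quick sketch shows (illustrative only):

```python
def proxy_path(provider: str, upstream_path: str) -> str:
    """Build a catch-all proxy path: /{provider}{upstream_path}."""
    return f"/{provider}{upstream_path}"

print(proxy_path("deepseek", "/v1/chat/completions"))
# /deepseek/v1/chat/completions
```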

Auto Routing

Set model to "auto" and ARouter will automatically select the best available model for your prompt. No model configuration needed.
{
  "model": "auto",
  "messages": [{ "role": "user", "content": "Explain quantum entanglement in simple terms" }]
}

How It Works

  1. ARouter’s routing service analyzes your request (prompt complexity, task type, required modalities, etc.)
  2. The optimal model is selected from available healthy providers based on cost efficiency and quality
  3. Your request is forwarded to the selected model
  4. The response includes the model field showing exactly which model was used
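Steps 1–3 above can be caricatured as "pick the cheapest healthy model that is good enough for the prompt." The candidates, scores, and threshold below are made up for illustration; the real routing service is considerably more involved:

```python
# Made-up candidate pool with health, quality, and relative cost figures.
CANDIDATES = [
    {"model": "openai/gpt-5.4", "healthy": True, "quality": 0.95, "cost": 10.0},
    {"model": "google/gemini-2.5-flash", "healthy": True, "quality": 0.80, "cost": 1.0},
    {"model": "anthropic/claude-sonnet-4.6", "healthy": False, "quality": 0.93, "cost": 6.0},
]

def select_model(candidates, min_quality=0.75):
    """Cheapest healthy candidate that clears the quality bar."""
    eligible = [c for c in candidates if c["healthy"] and c["quality"] >= min_quality]
    if not eligible:
        raise RuntimeError("no healthy candidates")
    return min(eligible, key=lambda c: c["cost"])["model"]

print(select_model(CANDIDATES))  # google/gemini-2.5-flash
```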

Restricting Allowed Models

Use the auto-router plugin to restrict which models auto can select from, using wildcard patterns:
const response = await client.chat.completions.create({
  model: "auto",
  messages: [{ role: "user", content: "Explain quantum entanglement" }],
  // @ts-ignore
  plugins: [
    {
      id: "auto-router",
      allowed_models: ["anthropic/*", "openai/gpt-5.4"],
    },
  ],
});
Pattern syntax:
Pattern           Matches
anthropic/*       All Anthropic models
openai/gpt-5*     All GPT-5 variants
google/*          All Google models
openai/gpt-5.4    Exact match only
*/claude-*        Claude-family models from any provider
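These patterns behave like shell globs, so Python's `fnmatch` is a reasonable approximation of the matching rule (the plugin's exact semantics may differ; this is a sketch):

```python
from fnmatch import fnmatch

def is_allowed(model: str, allowed_patterns: list[str]) -> bool:
    """True if the model matches any allowed glob pattern."""
    return any(fnmatch(model, pattern) for pattern in allowed_patterns)

allowed = ["anthropic/*", "openai/gpt-5.4"]
print(is_allowed("anthropic/claude-sonnet-4.6", allowed))  # True
print(is_allowed("openai/gpt-5.4", allowed))               # True
print(is_allowed("openai/gpt-5.4-mini", allowed))          # False
```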
The response shows which model was actually selected:
{
  "id": "chatcmpl-xxx",
  "model": "anthropic/claude-sonnet-4.6",
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 15,
    "completion_tokens": 120,
    "total_tokens": 135
  }
}
Always check response.model to see which model was actually used.

Code Example

from openai import OpenAI

client = OpenAI(
    base_url="https://api.arouter.ai/v1",
    api_key="lr_live_xxxx",
)

response = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "Explain quantum entanglement in simple terms"}],
)

print(response.choices[0].message.content)
print("Model used:", response.model)

Use Cases

  • General-purpose apps — When you don’t know what types of prompts users will send
  • Cost optimization — Let ARouter route simpler tasks to efficient models automatically
  • Zero-config prototyping — Get started without choosing a specific model
  • Adaptive routing — Let ARouter choose first, and switch to ordered candidate lists only when you need explicit control

Notes and Limitations

  • Auto routing uses the standard messages request format
  • Auto routing selects from models available to your account
  • Streaming is fully supported with "model": "auto"
  • You pay the normal rate for the model ARouter selects; there is no additional routing fee
  • The selected model is always reflected in the response model field

Candidate Model Lists

Use the models array together with route when you want ARouter to work through an ordered candidate list.
{
  "models": [
    "anthropic/claude-opus-4.5",
    "openai/gpt-5.4",
    "google/gemini-2.5-flash"
  ],
  "route": "fallback",
  "messages": [{ "role": "user", "content": "Hello!" }]
}

How It Works

  1. ARouter tries the first model in the list
  2. If it cannot serve the request (provider error, rate limit, key unavailable), it moves to the next
  3. If all models fail, ARouter returns an error with the last failure reason

Routing Behavior

Trigger                 Behavior
Provider unavailable    Move to next model
Rate limited (429)      Move to next model
Model not found         Move to next model
Bad request (400)       Stop and return the request error
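The fallback loop described above boils down to: try each candidate in order, advance on retryable failures, and stop immediately on a 400. A minimal sketch (the `send` callable and error shape are assumptions for illustration):

```python
def try_candidates(models, send):
    """send(model) returns a response or raises Exception(status_code)."""
    last_error = None
    for model in models:
        try:
            return send(model)
        except Exception as exc:
            code = exc.args[0] if exc.args else None
            if code == 400:
                raise              # bad request: the caller's fault, stop here
            last_error = code      # retryable failure: try the next candidate
    raise RuntimeError(f"all candidates failed; last error: {last_error}")

def fake_send(model):  # stand-in upstream: the first model is rate limited
    if model == "anthropic/claude-opus-4.5":
        raise Exception(429)
    return {"model": model}

print(try_candidates(["anthropic/claude-opus-4.5", "openai/gpt-5.4"], fake_send))
# {'model': 'openai/gpt-5.4'}
```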

Controlling Partition Behavior

By default, when using a candidate list, endpoints are grouped by model — the first model’s endpoints are always tried before the second model’s. You can change this with provider.sort.partition:
{
  "models": [
    "anthropic/claude-sonnet-4.6",
    "openai/gpt-5.4",
    "google/gemini-2.5-flash"
  ],
  "route": "fallback",
  "messages": [{ "role": "user", "content": "Hello!" }],
  "provider": {
    "sort": {
      "by": "throughput",
      "partition": "none"
    }
  }
}
Setting partition: "none" sorts endpoints globally across all candidate models — useful when you want whichever model is currently fastest, regardless of which is listed first. See Provider Routing for the full reference.
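A toy model of the two partition modes makes the difference visible. Endpoints here carry made-up throughput numbers; the ordering logic is an illustrative sketch, not ARouter's implementation:

```python
ENDPOINTS = [
    {"model": "anthropic/claude-sonnet-4.6", "throughput": 40},
    {"model": "openai/gpt-5.4", "throughput": 90},
    {"model": "google/gemini-2.5-flash", "throughput": 120},
]

def order_endpoints(endpoints, candidates, partition):
    if partition == "none":
        # Global sort: fastest endpoint first, regardless of list order.
        return sorted(endpoints, key=lambda e: -e["throughput"])
    # Partition by model: candidate-list order wins; throughput sorts within.
    rank = {m: i for i, m in enumerate(candidates)}
    return sorted(endpoints, key=lambda e: (rank[e["model"]], -e["throughput"]))

cands = ["anthropic/claude-sonnet-4.6", "openai/gpt-5.4", "google/gemini-2.5-flash"]
print(order_endpoints(ENDPOINTS, cands, "none")[0]["model"])   # google/gemini-2.5-flash
print(order_endpoints(ENDPOINTS, cands, "model")[0]["model"])  # anthropic/claude-sonnet-4.6
```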

Using Candidate Lists with the OpenAI SDK

The OpenAI SDK doesn’t have a models parameter natively. Use extra_body to pass it:
from openai import OpenAI

client = OpenAI(
    base_url="https://api.arouter.ai/v1",
    api_key="lr_live_xxxx",
)

response = client.chat.completions.create(
    model="anthropic/claude-opus-4.5",  # First candidate
    messages=[{"role": "user", "content": "Hello!"}],
    extra_body={
        "models": [
            "anthropic/claude-opus-4.5",
            "openai/gpt-5.4",
            "google/gemini-2.5-flash",
        ],
        "route": "fallback",
    },
)
print(response.choices[0].message.content)

Assistant Prefill

ARouter supports asking models to complete a partial response. Include a message with role: "assistant" at the end of your messages array to continue from where you left off:
{
  "model": "anthropic/claude-sonnet-4.6",
  "messages": [
    { "role": "user", "content": "Name 3 popular programming languages." },
    { "role": "assistant", "content": "1." }
  ]
}
The model will continue from the prefilled assistant message. This technique is useful for:
  • Forcing a specific output format
  • Resuming multi-turn completions
  • Guiding the model into a specific response structure
Not all models support assistant prefill. Anthropic Claude and most open-source models support it. OpenAI models have limited support.

How Routing Works Under the Hood

1. Parse model field → extract provider ID
2. Check API key permissions → is this provider allowed?
3. Call router-service → select region, pick healthy key from pool
4. Rewrite model field → strip provider prefix
5. Reverse proxy → forward to upstream with provider's API key
6. Stream response back → record usage asynchronously
ARouter handles provider API key injection, health checking, and failover completely transparently. Your application never sees the upstream provider’s credentials.
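The six steps above can be condensed into a pure-function sketch. Everything here is stubbed for illustration — the key pool, permission set, and demo key are hypothetical, and health checking, proxying, and usage recording are omitted:

```python
PROVIDER_KEYS = {"anthropic": ["sk-ant-demo"]}  # hypothetical key pool

def route(request, allowed_providers=frozenset({"anthropic", "openai"})):
    provider, _, model = request["model"].partition("/")   # 1. parse prefix
    if provider not in allowed_providers:                  # 2. permission check
        raise PermissionError(provider)
    key = PROVIDER_KEYS[provider][0]                       # 3. pick a key from the pool
    upstream = {**request, "model": model}                 # 4. strip the prefix
    return {"upstream_body": upstream, "api_key": key}     # 5. ready to proxy

out = route({"model": "anthropic/claude-sonnet-4.6", "messages": []})
print(out["upstream_body"]["model"])  # claude-sonnet-4.6
```

Note that the upstream body never contains the prefixed name, and the provider key never leaves the router's side of the proxy.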