Some models perform internal chain-of-thought reasoning before generating a final response. These reasoning steps consume tokens — called reasoning tokens — which affect cost and latency but are not visible to the user by default.

Supported Models

Model                         Reasoning Support
openai/o4-mini                Always-on reasoning
openai/o3                     Always-on reasoning
anthropic/claude-sonnet-4-6   Optional extended thinking
anthropic/claude-opus-4-6     Optional extended thinking
deepseek/deepseek-r1          Always-on reasoning
google/gemini-2.5-pro         Optional thinking mode
google/gemini-2.5-flash       Optional thinking mode

How Reasoning Tokens Appear in Usage

Reasoning tokens are reported in the usage object as part of completion_tokens_details:
{
  "usage": {
    "prompt_tokens": 150,
    "completion_tokens": 520,
    "total_tokens": 670,
    "completion_tokens_details": {
      "reasoning_tokens": 400
    }
  }
}
In this example, 400 of the 520 completion tokens were used for internal reasoning. Only the remaining 120 tokens appear in the visible response.
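The arithmetic above can be done directly on the usage object. A minimal sketch, using the example payload; the `.get()` fallbacks are a defensive assumption for models that omit `completion_tokens_details`:

```python
# Usage payload from the example above
usage = {
    "prompt_tokens": 150,
    "completion_tokens": 520,
    "total_tokens": 670,
    "completion_tokens_details": {"reasoning_tokens": 400},
}

# Fall back to 0 if the provider does not report a reasoning breakdown
reasoning = usage.get("completion_tokens_details", {}).get("reasoning_tokens", 0)

# Tokens that actually appear in the visible response
visible = usage["completion_tokens"] - reasoning
print(visible)  # 120
```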

Billing for Reasoning Tokens

Reasoning tokens are billed at the completion token rate for that model. They are included in completion_tokens for billing purposes — the breakdown is informational. ARouter passes through the upstream provider’s reasoning token counts without modification.
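Because reasoning tokens are already counted inside completion_tokens, cost is computed from the top-level totals only. A sketch with illustrative per-token rates (the rates below are assumptions, not real pricing; check each model's pricing page):

```python
# Illustrative rates only (assumed for this example)
PROMPT_RATE = 1.10 / 1_000_000      # $ per prompt token
COMPLETION_RATE = 4.40 / 1_000_000  # $ per completion token

usage = {"prompt_tokens": 150, "completion_tokens": 520}

# Reasoning tokens are inside completion_tokens, so they are
# billed once at the completion rate; do not add them again.
cost = (usage["prompt_tokens"] * PROMPT_RATE
        + usage["completion_tokens"] * COMPLETION_RATE)
print(f"${cost:.6f}")
```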

Controlling Reasoning Behavior

OpenAI o-series (o4-mini, o3)

Reasoning is always on for o-series models. Use reasoning_effort to control how much reasoning the model does:
{
  "model": "openai/o4-mini",
  "reasoning_effort": "high",
  "messages": [...]
}
Valid values: "low", "medium", "high". Higher effort allows the model to spend more reasoning tokens, which generally improves answer quality but increases both cost and latency.

Anthropic Extended Thinking

Enable extended thinking by passing thinking in your request:
import anthropic

client = anthropic.Anthropic(
    api_key="your-arouter-key",
    base_url="https://api.arouter.ai/anthropic",
)

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=16000,
    thinking={
        "type": "enabled",
        "budget_tokens": 10000,
    },
    messages=[{"role": "user", "content": "Solve this step by step: ..."}],
)
budget_tokens caps how many tokens can be used for thinking. The thinking content is returned as a separate block in the response.
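Separating the thinking block from the final answer is straightforward. A sketch using plain dicts shaped like Messages API content blocks (the block contents here are invented for illustration):

```python
# Simulated response.content, shaped like the Messages API
# returns it when extended thinking is enabled (illustrative data)
content = [
    {"type": "thinking", "thinking": "First, break the problem into cases..."},
    {"type": "text", "text": "The answer is 42."},
]

# Collect each block type separately
thinking_parts = [b["thinking"] for b in content if b["type"] == "thinking"]
answer_parts = [b["text"] for b in content if b["type"] == "text"]

print("".join(answer_parts))
```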

DeepSeek R1

Reasoning is always on for DeepSeek R1. The model returns a reasoning_content field alongside the regular content:
from openai import OpenAI

client = OpenAI(
    api_key="your-arouter-key",
    base_url="https://api.arouter.ai/v1",
)

response = client.chat.completions.create(
    model="deepseek/deepseek-r1",
    messages=[{"role": "user", "content": "Prove that √2 is irrational."}],
)

# Reasoning content (if exposed by provider)
print(response.choices[0].message.reasoning_content)
# Final answer
print(response.choices[0].message.content)

Google Gemini Thinking

Enable thinking for Gemini 2.5 models by passing a thinking object through extra_body (the OpenAI SDK forwards extra_body fields verbatim in the request):
response = client.chat.completions.create(
    model="google/gemini-2.5-flash",
    messages=[...],
    extra_body={
        "thinking": {
            "type": "enabled",
            "budget_tokens": 5000
        }
    }
)

Activity Export and Reasoning Tokens

The Activity Export includes a breakdown of reasoning tokens, so you can accurately track their contribution to total costs. Reasoning tokens are included in completion tokens in the export summary.

Best Practices

  • Start with "low" or "medium" effort for o-series models unless you need maximum reasoning quality. This reduces cost and latency significantly.
  • Set a budget_tokens cap for Anthropic and Gemini thinking models to avoid unexpectedly large bills on complex queries.
  • Monitor reasoning token ratios in your activity feed. A high ratio of reasoning to output tokens is normal for complex tasks but may indicate the model is overthinking simple queries.
  • Don’t disable reasoning to save costs on tasks that genuinely require multi-step reasoning — output quality degrades significantly.
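The ratio check from the third bullet can be sketched as a small helper over the usage object (the helper name is ours, not part of any API):

```python
def reasoning_ratio(usage: dict) -> float:
    """Fraction of completion tokens spent on internal reasoning."""
    details = usage.get("completion_tokens_details") or {}
    reasoning = details.get("reasoning_tokens", 0)
    completion = usage.get("completion_tokens", 0)
    return reasoning / completion if completion else 0.0

# From the earlier example: 400 reasoning tokens out of 520 completion tokens
ratio = reasoning_ratio({
    "completion_tokens": 520,
    "completion_tokens_details": {"reasoning_tokens": 400},
})
print(f"{ratio:.0%}")  # 77%
```

A persistently high ratio on simple prompts is a signal to lower reasoning_effort or tighten budget_tokens.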