## Supported Models
| Model | Reasoning Support |
|---|---|
| openai/o4-mini | Always-on reasoning |
| openai/o3 | Always-on reasoning |
| anthropic/claude-sonnet-4-6 | Optional extended thinking |
| anthropic/claude-opus-4-6 | Optional extended thinking |
| deepseek/deepseek-r1 | Always-on reasoning |
| google/gemini-2.5-pro | Optional thinking mode |
| google/gemini-2.5-flash | Optional thinking mode |
## How Reasoning Tokens Appear in Usage
Reasoning tokens are reported in the `usage` object as part of `completion_tokens_details`:
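A minimal sketch of that shape as a Python dict (field names follow the OpenAI-style usage schema; the token counts are illustrative, not real output):

```python
# Illustrative usage payload; reasoning_tokens sits inside
# completion_tokens_details and is already counted in completion_tokens.
usage = {
    "prompt_tokens": 52,
    "completion_tokens": 940,
    "total_tokens": 992,
    "completion_tokens_details": {
        "reasoning_tokens": 640,  # tokens spent on hidden reasoning
    },
}

# Visible output tokens = completion tokens minus reasoning tokens.
visible_tokens = (
    usage["completion_tokens"]
    - usage["completion_tokens_details"]["reasoning_tokens"]
)
print(visible_tokens)
```

Because `reasoning_tokens` is a subset of `completion_tokens`, the visible output here is 300 tokens even though 940 completion tokens were billed.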
## Billing for Reasoning Tokens
Reasoning tokens are billed at the completion token rate for that model. They are included in `completion_tokens` for billing purposes — the breakdown is informational.
ARouter passes through the upstream provider’s reasoning token counts without modification.
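Since reasoning tokens are already folded into `completion_tokens`, cost is a single multiplication. A sketch, assuming a hypothetical rate of $8 per 1M completion tokens (check the model's actual pricing):

```python
# Hypothetical per-token completion rate -- not a real price.
COMPLETION_RATE_PER_TOKEN = 8.00 / 1_000_000  # $8 per 1M completion tokens

def completion_cost(completion_tokens: int) -> float:
    """Reasoning tokens are already counted inside completion_tokens,
    so no separate reasoning line item is needed."""
    return completion_tokens * COMPLETION_RATE_PER_TOKEN

print(f"${completion_cost(940):.5f}")
```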
## Controlling Reasoning Behavior
### OpenAI o-series (o4-mini, o3)
Reasoning is always on for o-series models. Use `reasoning_effort` to control how much reasoning the model does: `"low"`, `"medium"`, or `"high"`. Higher effort means more reasoning tokens, which raises both quality and cost.
### Anthropic Extended Thinking
Enable extended thinking by passing `thinking` in your request:
`budget_tokens` caps how many tokens can be used for thinking. The thinking content is returned as a separate block in the response.
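A sketch of the request body (the `thinking` object follows Anthropic's extended-thinking shape; the budget value is illustrative):

```python
# Request body enabling extended thinking with a token budget.
payload = {
    "model": "anthropic/claude-sonnet-4-6",
    "messages": [
        {"role": "user", "content": "Prove that the algorithm terminates."},
    ],
    "thinking": {
        "type": "enabled",
        "budget_tokens": 8000,  # hard cap on thinking tokens
    },
}
print(payload["thinking"]["budget_tokens"])
```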
### DeepSeek R1
Reasoning is always on for DeepSeek R1. The model returns a `reasoning_content` field alongside the regular content:
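A sketch of reading both fields from the assistant message (the message values are illustrative, not real model output):

```python
# Illustrative assistant message from a DeepSeek R1 response.
message = {
    "role": "assistant",
    "content": "The answer is 42.",
    "reasoning_content": "First, restate the problem. Then...",
}

# The final answer and the reasoning trace arrive as separate fields.
answer = message["content"]
trace = message.get("reasoning_content", "")  # absent on non-reasoning models
print(answer)
```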
### Google Gemini Thinking
Enable thinking for Gemini 2.5 models via the `thinking` parameter:
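A sketch of the request body. The exact shape of the `thinking` object here is an assumption modeled on the Anthropic example above; confirm the field names against the provider's reference before use:

```python
# Request body enabling thinking for a Gemini 2.5 model.
# NOTE: the inner fields are an assumed shape, not a confirmed schema.
payload = {
    "model": "google/gemini-2.5-pro",
    "messages": [
        {"role": "user", "content": "Compare these two sorting strategies."},
    ],
    "thinking": {
        "budget_tokens": 4096,  # assumed cap field, mirroring Anthropic's
    },
}
print(payload["model"])
```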
## Activity Export and Reasoning Tokens
The Activity Export includes a breakdown of reasoning tokens, so you can accurately track their contribution to total costs. Reasoning tokens are included in completion tokens in the export summary.

## Best Practices
- Start with `"low"` or `"medium"` effort for o-series models unless you need maximum reasoning quality. This reduces cost and latency significantly.
- Set a `budget_tokens` cap for Anthropic and Gemini thinking models to avoid unexpectedly large bills on complex queries.
- Monitor reasoning token ratios in your activity feed. A high ratio of reasoning to output tokens is normal for complex tasks but may indicate the model is overthinking simple queries.
- Don’t disable reasoning to save costs on tasks that genuinely require multi-step reasoning — output quality degrades significantly.
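The ratio check from the best practices above can be sketched as a small helper over the usage fields described earlier (thresholds are illustrative, not recommendations):

```python
def reasoning_ratio(reasoning_tokens: int, completion_tokens: int) -> float:
    """Fraction of completion tokens spent on reasoning."""
    if completion_tokens == 0:
        return 0.0
    return reasoning_tokens / completion_tokens

# Example: 640 reasoning tokens out of 940 completion tokens.
ratio = reasoning_ratio(640, 940)
print(f"{ratio:.2f}")
```

A high ratio on a simple query is the "overthinking" signal mentioned above; what counts as high depends on your task mix.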