Context Compression

When a prompt exceeds a model’s context length, rather than failing the request, ARouter can automatically compress it using the context-compression plugin.

Enable context compression per-request by passing the plugin in the request body:
{
  "model": "anthropic/claude-sonnet-4.6",
  "messages": [...],
  "plugins": [{"id": "context-compression"}]
}
Or with the OpenAI SDK in TypeScript:

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.arouter.ai/v1",
  apiKey: "lr_live_xxxx",
});

const response = await client.chat.completions.create({
  model: "anthropic/claude-sonnet-4.6",
  messages: veryLongConversation,
  // plugins is an ARouter extension, not part of the OpenAI SDK's request types
  // @ts-ignore
  plugins: [{ id: "context-compression" }],
});

How It Works

The plugin removes or truncates messages from the middle of the conversation until the prompt fits within the model’s context window. This strategy is based on research showing that LLMs pay less attention to the middle of long sequences: preserving the beginning of a conversation (system instructions, initial context) and its end (the most recent messages) generally produces better results than truncating from either end.

Compression steps:
  1. Check if total tokens (prompt + estimated completion) exceed the model’s context length
  2. If over limit: remove or truncate messages from the middle of messages[]
  3. Repeat until the prompt fits
  4. Forward the compressed prompt to the model
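The loop above can be sketched as follows. This is a minimal illustration, not ARouter's internals: the function names and the rough 4-characters-per-token estimate are assumptions.

```typescript
type Message = { role: string; content: string };

// Rough token estimate, assuming ~4 characters per token.
const estimateTokens = (msgs: Message[]): number =>
  msgs.reduce((sum, m) => sum + Math.ceil(m.content.length / 4), 0);

// Hypothetical sketch: remove messages from the middle until
// prompt + estimated completion fits the context window.
function compressMiddle(
  messages: Message[],
  contextLength: number,
  estimatedCompletion: number,
): Message[] {
  const msgs = [...messages];
  while (
    msgs.length > 2 &&
    estimateTokens(msgs) + estimatedCompletion > contextLength
  ) {
    // Drop the middle message; the start and end of the
    // conversation are always preserved.
    msgs.splice(Math.floor(msgs.length / 2), 1);
  }
  return msgs;
}
```

Note that because removal works inward from the middle, the system prompt and the most recent turns survive even under aggressive compression.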

Message Count Limits

Some models enforce a maximum number of messages per request, regardless of token count; Anthropic Claude models, for example, cap the message count. When this limit is exceeded with context compression enabled, the plugin keeps half of the allowed messages from the start of the conversation and half from the end.
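That half-and-half strategy can be sketched as below; the function name and `maxMessages` parameter are illustrative, not ARouter API.

```typescript
type Msg = { role: string; content: string };

// Hypothetical sketch: when a model's message-count cap is exceeded,
// keep half the allowed messages from the start and half from the end.
function capMessageCount(messages: Msg[], maxMessages: number): Msg[] {
  if (messages.length <= maxMessages) return messages;
  const head = Math.ceil(maxMessages / 2); // slots taken from the start
  const tail = maxMessages - head;         // remaining slots from the end
  return [
    ...messages.slice(0, head),
    ...messages.slice(messages.length - tail),
  ];
}
```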

Default Behavior for Small Context Models

Context compression is enabled by default for all models with a context length of 8,192 tokens or fewer. To explicitly disable compression for these models:
{
  "model": "some-small-context-model",
  "messages": [...],
  "plugins": [{"id": "context-compression", "enabled": false}]
}
With compression disabled, if your total tokens exceed the model’s context length, the request fails with an error suggesting that you reduce the input length or enable compression.

Model Selection with Compression

When context compression is active, ARouter first tries to find models whose context length is at least half of your total required tokens (input + estimated completion). For example, if your prompt requires 10,000 tokens total:
  • Models with at least 5,000 context length are considered
  • If no models meet this threshold, ARouter uses the model with the highest available context length
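A minimal sketch of this selection rule, assuming a simplified model shape (the `Model` type and function name are hypothetical):

```typescript
type Model = { id: string; contextLength: number };

// Hypothetical sketch of the selection rule: prefer models whose
// context length is at least half the total token requirement,
// otherwise fall back to the single largest-context model.
function selectCandidates(models: Model[], totalTokens: number): Model[] {
  const eligible = models.filter((m) => m.contextLength >= totalTokens / 2);
  if (eligible.length > 0) return eligible;
  return [
    models.reduce((a, b) => (b.contextLength > a.contextLength ? b : a)),
  ];
}
```

Under this rule, a 10,000-token request considers any model with at least a 5,000-token context, trusting compression to close the remaining gap.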

When to Use

Context compression is useful when:
  • You have long multi-turn conversations that grow over time
  • You’re processing documents that may occasionally exceed the context window
  • You want resilient behavior without manually managing context length
Context compression is not ideal when:
  • Perfect recall of all conversation history is required (e.g. document Q&A where any message may contain the answer)
  • You need deterministic behavior (compression is non-deterministic in which messages are removed)
For use cases requiring full context retention, consider models with larger context windows (see Model Variants :extended).

Combining with Other Plugins

Context compression can be combined with other plugins:
{
  "model": "openai/gpt-5.4:online",
  "messages": [...],
  "plugins": [
    {"id": "context-compression"},
    {"id": "web"}
  ]
}
See Plugins Overview for the complete list of available plugins.