Overview
ARouter supports streaming responses for all major providers.
When streaming is enabled, tokens are delivered in real time as they’re generated,
giving your users a much more responsive experience.
Streaming works identically to the upstream provider — ARouter transparently proxies
the SSE stream while asynchronously counting tokens for usage tracking.
OpenAI-Compatible Streaming
Set `stream: true` in your request body:

Python:

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.arouter.com/v1",
    api_key="lr_live_xxxx",
)

stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Tell me a story."}],
    stream=True,
)

for chunk in stream:
    content = chunk.choices[0].delta.content
    if content:
        print(content, end="", flush=True)
```
TypeScript:

```typescript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.arouter.com/v1",
  apiKey: "lr_live_xxxx",
});

const stream = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "Tell me a story." }],
  stream: true,
});

for await (const chunk of stream) {
  const content = chunk.choices[0]?.delta?.content;
  if (content) process.stdout.write(content);
}
```
Go:

```go
stream, err := client.ChatCompletionStream(ctx, arouter.ChatCompletionRequest{
	Model: "gpt-4o",
	Messages: []arouter.Message{
		{Role: "user", Content: "Tell me a story."},
	},
})
if err != nil {
	log.Fatal(err)
}
defer stream.Close()

for {
	chunk, err := stream.Recv()
	if err == arouter.ErrStreamDone {
		break
	}
	if err != nil {
		log.Fatal(err)
	}
	fmt.Print(chunk.Choices[0].Delta.Content)
}
```
cURL:

```shell
curl -N https://api.arouter.com/v1/chat/completions \
  -H "Authorization: Bearer lr_live_xxxx" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Tell me a story."}],
    "stream": true
  }'
```
Anthropic Streaming
The Anthropic SDK uses its own streaming format:
```python
import anthropic

client = anthropic.Anthropic(
    base_url="https://api.arouter.com",
    api_key="lr_live_xxxx",
)

with client.messages.stream(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Tell me a story."}],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
```
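If you consume Anthropic's raw stream rather than the SDK's `text_stream` helper, you receive typed events (`message_start`, `content_block_delta`, `message_stop`, and so on) instead of OpenAI-style chunks. A minimal sketch of pulling text out of already-decoded events; the sample event sequence below is illustrative, not captured output:

```python
def collect_text(events):
    """Concatenate text from Anthropic-style streaming events.

    Only content_block_delta events carrying a text_delta contain text;
    every other event type is skipped.
    """
    parts = []
    for event in events:
        if event.get("type") == "content_block_delta":
            delta = event.get("delta", {})
            if delta.get("type") == "text_delta":
                parts.append(delta.get("text", ""))
    return "".join(parts)


# Illustrative event sequence, abridged from Anthropic's streaming format.
events = [
    {"type": "message_start"},
    {"type": "content_block_start", "index": 0},
    {"type": "content_block_delta", "index": 0,
     "delta": {"type": "text_delta", "text": "Once upon"}},
    {"type": "content_block_delta", "index": 0,
     "delta": {"type": "text_delta", "text": " a time"}},
    {"type": "content_block_stop", "index": 0},
    {"type": "message_stop"},
]
print(collect_text(events))  # Once upon a time
```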
Gemini Streaming
Under the hood, Gemini calls `streamGenerateContent` instead of `generateContent`; with the Python SDK you simply pass `stream=True`:

```python
import google.generativeai as genai

genai.configure(
    api_key="lr_live_xxxx",
    transport="rest",
    client_options={"api_endpoint": "https://api.arouter.com"},
)

model = genai.GenerativeModel("gemini-2.0-flash")
response = model.generate_content("Tell me a story.", stream=True)
for chunk in response:
    print(chunk.text, end="", flush=True)
```
Under the hood, streaming uses Server-Sent Events (SSE). Each event looks like:

```
data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}

data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":" world"},"finish_reason":null}]}

data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]
```
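If you consume this wire format with a bare HTTP client instead of an SDK, decoding it by hand is straightforward. A minimal sketch; the parser below is illustrative, not part of ARouter:

```python
import json


def parse_sse_chunks(raw: str):
    """Yield decoded chunk objects from a raw SSE body.

    Each event is a line of the form `data: <json>`; the literal
    `data: [DONE]` sentinel terminates the stream.
    """
    for line in raw.splitlines():
        line = line.strip()
        if not line.startswith("data: "):
            continue  # skip blank separator lines and SSE comments
        payload = line[len("data: "):]
        if payload == "[DONE]":
            return
        yield json.loads(payload)


raw = (
    'data: {"choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}\n'
    "\n"
    'data: {"choices":[{"index":0,"delta":{"content":" world"},"finish_reason":null}]}\n'
    "\n"
    "data: [DONE]\n"
)
for chunk in parse_sse_chunks(raw):
    print(chunk["choices"][0]["delta"].get("content", ""), end="")
# prints: Hello world
```

A production parser would also buffer partial lines across network reads; this sketch assumes the whole body is already in memory.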
ARouter transparently parses the SSE stream to extract token counts for usage tracking,
then forwards the stream unmodified to your client.
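Because each chunk carries only a delta, a client that needs the complete response must accumulate the fragments itself. A minimal sketch, using the OpenAI-style chunk shape shown above:

```python
def accumulate(chunks):
    """Rebuild the full message text and finish reason from decoded chunks."""
    parts = []
    finish_reason = None
    for chunk in chunks:
        choice = chunk["choices"][0]
        content = choice["delta"].get("content")
        if content is not None:
            parts.append(content)
        if choice["finish_reason"] is not None:
            finish_reason = choice["finish_reason"]
    return "".join(parts), finish_reason


chunks = [
    {"choices": [{"index": 0, "delta": {"content": "Hello"}, "finish_reason": None}]},
    {"choices": [{"index": 0, "delta": {"content": " world"}, "finish_reason": None}]},
    {"choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}]},
]
text, reason = accumulate(chunks)
print(text, reason)  # Hello world stop
```

The SDK examples earlier do this implicitly when you print each delta as it arrives; explicit accumulation matters when you need the final text, for example to cache or log it.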