Overview
ARouter supports streaming responses for all major providers.
When streaming is enabled, tokens are delivered in real time as they’re generated,
giving your users a much more responsive experience.
Streaming works identically to the upstream provider — ARouter transparently proxies
the SSE stream while asynchronously counting tokens for usage tracking.
OpenAI-Compatible Streaming
Set `stream: true` in your request body:

Python:

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.arouter.com/v1",
    api_key="lr_live_xxxx",
)

stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Tell me a story."}],
    stream=True,
)

for chunk in stream:
    content = chunk.choices[0].delta.content
    if content:
        print(content, end="", flush=True)
```
TypeScript:

```typescript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.arouter.com/v1",
  apiKey: "lr_live_xxxx",
});

const stream = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "Tell me a story." }],
  stream: true,
});

for await (const chunk of stream) {
  const content = chunk.choices[0]?.delta?.content;
  if (content) process.stdout.write(content);
}
```
Go:

```go
stream, err := client.ChatCompletionStream(ctx, arouter.ChatCompletionRequest{
	Model: "gpt-4o",
	Messages: []arouter.Message{
		{Role: "user", Content: "Tell me a story."},
	},
})
if err != nil {
	log.Fatal(err)
}
defer stream.Close()

for {
	chunk, err := stream.Recv()
	if err == arouter.ErrStreamDone {
		break
	}
	if err != nil {
		log.Fatal(err)
	}
	fmt.Print(chunk.Choices[0].Delta.Content)
}
```
cURL:

```shell
curl -N https://api.arouter.com/v1/chat/completions \
  -H "Authorization: Bearer lr_live_xxxx" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Tell me a story."}],
    "stream": true
  }'
```
Anthropic Streaming
The Anthropic SDK uses its own streaming format:
```python
import anthropic

client = anthropic.Anthropic(
    base_url="https://api.arouter.com",
    api_key="lr_live_xxxx",
)

with client.messages.stream(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Tell me a story."}],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
```
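If you consume Anthropic's raw stream rather than the SDK's `text_stream` helper, you receive typed events (`message_start`, `content_block_delta`, `message_stop`, and so on) instead of OpenAI-style chunks. A minimal sketch of pulling text out of already-decoded events; the sample event sequence below is illustrative, not captured output:

```python
def collect_text(events):
    """Concatenate text from Anthropic-style streaming events.

    Only content_block_delta events carrying a text_delta contain text;
    every other event type is skipped.
    """
    parts = []
    for event in events:
        if event.get("type") == "content_block_delta":
            delta = event.get("delta", {})
            if delta.get("type") == "text_delta":
                parts.append(delta.get("text", ""))
    return "".join(parts)


# Illustrative event sequence, abridged from Anthropic's streaming format.
events = [
    {"type": "message_start"},
    {"type": "content_block_start", "index": 0},
    {"type": "content_block_delta", "index": 0,
     "delta": {"type": "text_delta", "text": "Once upon"}},
    {"type": "content_block_delta", "index": 0,
     "delta": {"type": "text_delta", "text": " a time"}},
    {"type": "content_block_stop", "index": 0},
    {"type": "message_stop"},
]
print(collect_text(events))  # Once upon a time
```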
Gemini Streaming
Under the hood, Gemini calls `streamGenerateContent` instead of `generateContent`; with the Python SDK you simply pass `stream=True`:

```python
import google.generativeai as genai

genai.configure(
    api_key="lr_live_xxxx",
    transport="rest",
    client_options={"api_endpoint": "https://api.arouter.com"},
)

model = genai.GenerativeModel("gemini-2.0-flash")
response = model.generate_content("Tell me a story.", stream=True)
for chunk in response:
    print(chunk.text, end="", flush=True)
```
Under the hood, streaming uses Server-Sent Events (SSE). Each event looks like:

```
data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}

data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":" world"},"finish_reason":null}]}

data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]
```
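If you consume this wire format with a bare HTTP client instead of an SDK, decoding it by hand is straightforward. A minimal sketch; the parser below is illustrative, not part of ARouter:

```python
import json


def parse_sse_chunks(raw: str):
    """Yield decoded chunk objects from a raw SSE body.

    Each event is a line of the form `data: <json>`; the literal
    `data: [DONE]` sentinel terminates the stream.
    """
    for line in raw.splitlines():
        line = line.strip()
        if not line.startswith("data: "):
            continue  # skip blank separator lines and SSE comments
        payload = line[len("data: "):]
        if payload == "[DONE]":
            return
        yield json.loads(payload)


raw = (
    'data: {"choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}\n'
    "\n"
    'data: {"choices":[{"index":0,"delta":{"content":" world"},"finish_reason":null}]}\n'
    "\n"
    "data: [DONE]\n"
)
for chunk in parse_sse_chunks(raw):
    print(chunk["choices"][0]["delta"].get("content", ""), end="")
# prints: Hello world
```

A production parser would also buffer partial lines across network reads; this sketch assumes the whole body is already in memory.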
ARouter transparently parses the SSE stream to extract token counts for usage tracking,
then forwards the stream unmodified to your client.
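Because each chunk carries only a delta, a client that needs the complete response must accumulate the fragments itself. A minimal sketch, using the OpenAI-style chunk shape shown above:

```python
def accumulate(chunks):
    """Rebuild the full message text and finish reason from decoded chunks."""
    parts = []
    finish_reason = None
    for chunk in chunks:
        choice = chunk["choices"][0]
        content = choice["delta"].get("content")
        if content is not None:
            parts.append(content)
        if choice["finish_reason"] is not None:
            finish_reason = choice["finish_reason"]
    return "".join(parts), finish_reason


chunks = [
    {"choices": [{"index": 0, "delta": {"content": "Hello"}, "finish_reason": None}]},
    {"choices": [{"index": 0, "delta": {"content": " world"}, "finish_reason": None}]},
    {"choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}]},
]
text, reason = accumulate(chunks)
print(text, reason)  # Hello world stop
```

The SDK examples earlier do this implicitly when you print each delta as it arrives; explicit accumulation matters when you need the final text, for example to cache or log it.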