Overview

ARouter supports streaming responses for all major providers. When streaming is enabled, tokens are delivered as they are generated, so your users see output immediately instead of waiting for the full response. Streaming behaves the same as it does with the upstream provider: ARouter proxies the SSE stream transparently and counts tokens asynchronously for usage tracking.

OpenAI-Compatible Streaming

Set stream: true in your request body. With the OpenAI Python SDK, pass stream=True:
from openai import OpenAI

client = OpenAI(
    base_url="https://api.arouter.com/v1",
    api_key="lr_live_xxxx",
)

stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Tell me a story."}],
    stream=True,
)

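for chunk in stream:
    # Some chunks may carry no choices (for example, a trailing usage chunk); skip them.
    if not chunk.choices:
        continue
    content = chunk.choices[0].delta.content
    if content:
        print(content, end="", flush=True)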

Anthropic Streaming

The Anthropic SDK uses its own streaming format:
import anthropic

client = anthropic.Anthropic(
    base_url="https://api.arouter.com",
    api_key="lr_live_xxxx",
)

with client.messages.stream(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Tell me a story."}],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
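
On the wire, Anthropic's stream is also SSE, but each event is named with an `event:` line and the text arrives in `content_block_delta` events rather than in `choices[].delta`. The sketch below pulls the text out of such events, assuming the event shapes described in Anthropic's streaming documentation. `extract_text` and the sample payload are illustrative only; when you use the SDK, `text_stream` already does this for you.

```python
import json

# Illustrative sample of Anthropic-style SSE events (abbreviated payloads).
SAMPLE = """\
event: message_start
data: {"type": "message_start", "message": {"id": "msg_xxx", "role": "assistant"}}

event: content_block_delta
data: {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": "Hello"}}

event: content_block_delta
data: {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": " world"}}

event: message_stop
data: {"type": "message_stop"}
"""


def extract_text(raw: str) -> str:
    """Concatenate the text deltas found in a raw Anthropic-style SSE body."""
    parts = []
    for line in raw.splitlines():
        # Only data: lines carry JSON; event: lines and blanks are skipped.
        if not line.startswith("data:"):
            continue
        event = json.loads(line[len("data:"):].strip())
        if event.get("type") != "content_block_delta":
            continue
        delta = event.get("delta", {})
        if delta.get("type") == "text_delta":
            parts.append(delta.get("text", ""))
    return "".join(parts)


anthropic_text = extract_text(SAMPLE)
```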

Gemini Streaming

Gemini streams through the streamGenerateContent method instead of generateContent. With the Python SDK, pass stream=True to generate_content:
import google.generativeai as genai

genai.configure(
    api_key="lr_live_xxxx",
    transport="rest",
    client_options={"api_endpoint": "https://api.arouter.com"},
)

model = genai.GenerativeModel("gemini-2.0-flash")
response = model.generate_content("Tell me a story.", stream=True)

for chunk in response:
    print(chunk.text, end="", flush=True)

SSE Format

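Under the hood, streaming uses Server-Sent Events (SSE). On OpenAI-compatible endpoints, each event is a data: line carrying a JSON chunk, and the stream ends with data: [DONE]: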
data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}

data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":" world"},"finish_reason":null}]}

data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]
ARouter reads the SSE stream as it passes through to extract token counts for usage tracking, and forwards the events to your client unmodified.
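
If you consume the stream without an SDK, the events above can be decoded with a few lines of code. The following is a minimal sketch, assuming the OpenAI-compatible chunk shape shown in this section. `iter_sse_content` is a hypothetical helper, and the sample lines stand in for what an HTTP client would yield line by line.

```python
import json
from typing import Iterable, Iterator


def iter_sse_content(lines: Iterable[str]) -> Iterator[str]:
    """Yield text deltas from OpenAI-compatible SSE lines until [DONE]."""
    for line in lines:
        line = line.strip()
        # Skip blank separators, comments, and any non-data fields.
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel, not JSON
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                yield content


# Sample lines copied from the SSE example in this section.
RAW_LINES = [
    'data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}',
    "",
    'data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":" world"},"finish_reason":null}]}',
    "",
    'data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}',
    "",
    "data: [DONE]",
]
sse_text = "".join(iter_sse_content(RAW_LINES))
```

The sentinel is checked before `json.loads` because `[DONE]` is a plain marker rather than a JSON chunk.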