ARouter supports streaming responses for all models. With streaming enabled, tokens are delivered in real time as they are generated. To enable streaming, set stream: true in the request body:
from openai import OpenAI

client = OpenAI(
    base_url="https://api.arouter.ai/v1",
    api_key="lr_live_xxxx",
)

stream = client.chat.completions.create(
    model="openai/gpt-5.4",
    messages=[{"role": "user", "content": "How would you build the tallest building ever?"}],
    stream=True,
)

for chunk in stream:
    # The final chunk has an empty choices list, so guard before indexing
    if chunk.choices:
        content = chunk.choices[0].delta.content
        if content:
            print(content, end="", flush=True)

# The last chunk before [DONE] carries the usage stats;
# after the loop, access them via: chunk.usage

Anthropic streaming

The Anthropic SDK uses its own streaming format:
import anthropic

client = anthropic.Anthropic(
    base_url="https://api.arouter.ai",
    api_key="lr_live_xxxx",
)

with client.messages.stream(
    model="claude-sonnet-4.6",
    max_tokens=1024,
    messages=[{"role": "user", "content": "How would you build the tallest building ever?"}],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)

Gemini streaming

Gemini uses streamGenerateContent instead of generateContent:
import google.generativeai as genai

genai.configure(
    api_key="lr_live_xxxx",
    transport="rest",
    client_options={"api_endpoint": "https://api.arouter.ai"},
)

model = genai.GenerativeModel("gemini-2.5-flash")
response = model.generate_content("How would you build the tallest building ever?", stream=True)

for chunk in response:
    print(chunk.text, end="", flush=True)

SSE format

Under the hood, streaming uses Server-Sent Events. Each content event is formatted as follows:
data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","model":"openai/gpt-5.4","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}

data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","model":"openai/gpt-5.4","choices":[{"index":0,"delta":{"content":" world"},"finish_reason":null}]}

data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","model":"openai/gpt-5.4","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}
The last data chunk before [DONE] contains the usage data, with an empty choices array:
data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","model":"openai/gpt-5.4","choices":[],"usage":{"prompt_tokens":10,"completion_tokens":20,"total_tokens":30,"prompt_tokens_details":{"cached_tokens":0},"completion_tokens_details":{"reasoning_tokens":0}}}

data: [DONE]
ARouter may occasionally send SSE comments (lines beginning with :) to keep the connection from timing out. Per the SSE specification, these comments can be safely ignored.
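If you consume the stream without an SDK, the SSE framing above can be parsed by hand. A minimal sketch under the format shown above (the parse_sse_line helper is illustrative, not part of any ARouter SDK):

```python
import json

def parse_sse_line(line: str):
    """Parse one SSE line; return the decoded JSON payload, or None for
    comments, blank lines, and the [DONE] sentinel."""
    line = line.strip()
    if not line or line.startswith(":"):  # keep-alive comment or separator
        return None
    if line.startswith("data: "):
        payload = line[len("data: "):]
        if payload == "[DONE]":
            return None
        return json.loads(payload)
    return None

# Example: extract the text deltas from a recorded stream
lines = [
    ': keep-alive',
    'data: {"choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}',
    'data: {"choices":[{"index":0,"delta":{"content":" world"},"finish_reason":null}]}',
    'data: [DONE]',
]
text = "".join(
    chunk["choices"][0]["delta"].get("content", "")
    for chunk in map(parse_sse_line, lines)
    if chunk and chunk.get("choices")
)
print(text)  # Hello world
```

Note that events carrying usage data (empty choices array) and keep-alive comments both fall through harmlessly here.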

Recommended SSE client libraries

Some SSE client implementations may not parse the data correctly. We recommend:

Cancelling a streaming request

A streaming request can be cancelled by aborting the connection. For providers that support this, model processing stops immediately.
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.arouter.ai/v1",
  apiKey: "lr_live_xxxx",
});

const controller = new AbortController();

try {
  const stream = await client.chat.completions.create(
    {
      model: "openai/gpt-5.4",
      messages: [{ role: "user", content: "Write a long story" }],
      stream: true,
    },
    { signal: controller.signal },
  );

  for await (const chunk of stream) {
    const content = chunk.choices[0]?.delta?.content;
    if (content) process.stdout.write(content);
  }
} catch (error) {
  if (error instanceof OpenAI.APIUserAbortError) {
    console.log("Stream cancelled");
  } else {
    throw error;
  }
}

// To cancel mid-stream, call abort() from elsewhere
// (e.g. a timeout or a UI handler) while the loop above is running:
controller.abort();

Error handling during streaming

ARouter handles streaming errors differently depending on when they occur.

Errors before any tokens are sent

If an error occurs before streaming begins, ARouter returns a standard JSON error response with the appropriate HTTP status code:
{
  "error": {
    "code": 400,
    "message": "Invalid model specified"
  }
}
Common HTTP status codes:

Code  Meaning
400   Bad Request: invalid parameters
401   Unauthorized: invalid API key
402   Payment Required: insufficient credits
429   Too Many Requests: rate limited
502   Bad Gateway: provider error
503   Service Unavailable: no providers available
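In practice these codes split into transient failures worth retrying and client errors that will not clear up on their own. A sketch of a retry predicate based on the table above (the helper names and backoff policy are our own convention, not part of ARouter):

```python
# Transient codes from the table above: rate limits and provider-side
# failures may succeed on retry; 4xx client errors will not.
RETRYABLE = {429, 502, 503}

def should_retry(status_code: int, attempt: int, max_attempts: int = 3) -> bool:
    """Retry only transient errors, up to max_attempts tries."""
    return status_code in RETRYABLE and attempt < max_attempts

def backoff_seconds(attempt: int, base: float = 1.0) -> float:
    """Simple exponential backoff: 1s, 2s, 4s, ..."""
    return base * (2 ** attempt)
```

A 402 (insufficient credits) is deliberately excluded: retrying cannot succeed until the account is topped up.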

Errors after some tokens have been sent (mid-stream)

If an error occurs after some tokens have already been streamed, ARouter can no longer change the HTTP status code (it is already 200 OK). The error is sent as an SSE event instead:
data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","model":"openai/gpt-5.4","error":{"code":"server_error","message":"Provider disconnected unexpectedly"},"choices":[{"index":0,"delta":{"content":""},"finish_reason":"error"}]}
Key characteristics:
  • The error appears at the top level, alongside the standard response fields
  • The choices array contains finish_reason: "error" to terminate the stream
  • The HTTP status remains 200 OK because the response headers have already been sent

Error-handling code example

from openai import OpenAI, APIStatusError

client = OpenAI(
    base_url="https://api.arouter.ai/v1",
    api_key="lr_live_xxxx",
)

try:
    stream = client.chat.completions.create(
        model="openai/gpt-5.4",
        messages=[{"role": "user", "content": "Write a story"}],
        stream=True,
    )
    for chunk in stream:
        if not chunk.choices:
            continue  # the final usage chunk has an empty choices list
        content = chunk.choices[0].delta.content
        if content:
            print(content, end="", flush=True)
except APIStatusError as e:
    print(f"\nError {e.status_code}: {e.message}")
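The APIStatusError handler above only covers errors returned before streaming starts; a mid-stream error arrives inside a chunk, as described earlier, so it has to be detected while iterating. A minimal sketch over raw chunk dicts (the classify_chunk helper is illustrative, not an SDK API):

```python
def classify_chunk(chunk: dict) -> str:
    """Classify a parsed SSE chunk as 'error', 'usage', 'stop', or 'content'."""
    if "error" in chunk:          # top-level error field, finish_reason "error"
        return "error"
    if not chunk.get("choices"):  # empty choices: the final usage chunk
        return "usage"
    if chunk["choices"][0].get("finish_reason") == "stop":
        return "stop"
    return "content"

# Example chunks mirroring the SSE formats shown above
ok = {"choices": [{"index": 0, "delta": {"content": "Hi"}, "finish_reason": None}]}
err = {"error": {"code": "server_error", "message": "Provider disconnected"},
       "choices": [{"index": 0, "delta": {"content": ""}, "finish_reason": "error"}]}
usage = {"choices": [], "usage": {"total_tokens": 30}}

print(classify_chunk(ok))     # content
print(classify_chunk(err))    # error
print(classify_chunk(usage))  # usage
```

With SDK chunk objects rather than raw dicts, the same checks apply, though how the extra top-level error field is surfaced depends on the SDK version.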