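Multimodal

Supported modalities

Modality	Direction	Notes
Text	Input + output	All models
Image (URL / base64)	Input	Vision models; JPEG, PNG, GIF, WebP
PDF (base64)	Input	Anthropic Claude, Google Gemini
Audio (base64)	Input	Multimodal audio models
Image generation	Output	DALL-E 3, Flux, Stable Diffusion
Audio output (TTS / speech)	Output	TTS models, audio chat models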

# Models that accept image input
GET /v1/models?supported_parameters=vision
# Models that output images
GET /v1/models?output_modalities=image
# Models that output audio
GET /v1/models?output_modalities=audio
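
The three discovery queries above differ only in their query string, so a small helper can build them. This is a minimal sketch: `BASE_URL` and the `models_url` helper are hypothetical names introduced for illustration, and only the `/v1/models` path and the two filter parameters come from the requests shown above.

```python
from urllib.parse import urlencode

# Hypothetical host; substitute the real API base URL.
BASE_URL = "https://api.example.com"


def models_url(**filters: str) -> str:
    """Build a /v1/models URL with optional query-string filters."""
    query = urlencode(filters)
    return f"{BASE_URL}/v1/models" + (f"?{query}" if query else "")


# The three queries from the listing above.
vision_url = models_url(supported_parameters="vision")
image_out_url = models_url(output_modalities="image")
audio_out_url = models_url(output_modalities="audio")
```

Each URL can then be passed to any HTTP client as a GET request, together with whatever authentication the API requires.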

Images

Using an image URL

{
  "model": "openai/gpt-5.4",
  "messages": [{
    "role": "user",
    "content": [
      {"type": "text", "text": "What's in this image?"},
      {"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}}
    ]
  }]
}
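
The modality table lists base64 as an alternative to a URL, but the request above only shows the URL form. The sketch below assumes the base64 form is sent as a `data:` URL in the same `image_url.url` field, which is the common OpenAI-compatible convention; this page does not confirm that wire format. The helper names `image_part_from_bytes` and `build_request` are hypothetical.

```python
import base64
import json


def image_part_from_bytes(data: bytes, mime_type: str = "image/png") -> dict:
    """Wrap raw image bytes as an image_url content part.

    Assumption: base64 images are passed as a data: URL in image_url.url.
    """
    encoded = base64.b64encode(data).decode("ascii")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
    }


def build_request(model: str, prompt: str, image_part: dict) -> dict:
    """Build a request body with one text part and one image part."""
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [{"type": "text", "text": prompt}, image_part],
        }],
    }


# Only the 8-byte PNG signature is used here as stand-in image data.
png_signature = b"\x89PNG\r\n\x1a\n"
payload = build_request(
    "openai/gpt-5.4",
    "What's in this image?",
    image_part_from_bytes(png_signature),
)
body = json.dumps(payload)  # JSON string ready to send as the request body
```

In practice the bytes would come from reading a real image file in binary mode, with `mime_type` set to match its format.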

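Image detail level

Value	Description
`auto` (default)	The provider decides based on image size
`low`	Faster and cheaper: 85 tokens, image resized to 512×512
`high`	Full resolution; the image is tiled and consumes more tokens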
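
As a sketch of how a detail level might be attached to an image, the helper below places `detail` next to `url` inside the `image_url` object. That placement is an assumption based on the OpenAI-compatible request shape; this page names the values but does not show where the field goes. `image_part` is a hypothetical helper.

```python
# The three values listed in the table above.
VALID_DETAIL = {"auto", "low", "high"}


def image_part(url: str, detail: str = "auto") -> dict:
    """Build an image_url content part with an explicit detail level.

    Assumption: "detail" sits alongside "url" inside the image_url object.
    """
    if detail not in VALID_DETAIL:
        raise ValueError(f"detail must be one of {sorted(VALID_DETAIL)}")
    return {"type": "image_url", "image_url": {"url": url, "detail": detail}}


# Request the cheaper low-detail processing for a thumbnail-style check.
thumbnail_part = image_part("https://example.com/image.jpg", detail="low")
```

Validating the value client-side avoids spending a round trip on a request the provider would reject.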

Model compatibility

Model	Image URL	Image base64	PDF	Audio input
`openai/gpt-5.4`	✓	✓	-	-
`anthropic/claude-sonnet-4.6`	✓	✓	✓	-
`google/gemini-2.5-flash`	✓	✓	✓	✓
`google/gemini-2.5-pro`	✓	✓	✓	✓
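
A client can encode the compatibility table as a lookup and check an input type before sending a request. This is a minimal sketch: the `SUPPORT` mapping simply transcribes the four rows above, and the modality keys and the `supports` helper are hypothetical names. For a current list, the `/v1/models` queries are the authoritative source.

```python
# Transcription of the compatibility table; keys are illustrative labels.
SUPPORT = {
    "openai/gpt-5.4": {"image_url", "image_base64"},
    "anthropic/claude-sonnet-4.6": {"image_url", "image_base64", "pdf"},
    "google/gemini-2.5-flash": {"image_url", "image_base64", "pdf", "audio"},
    "google/gemini-2.5-pro": {"image_url", "image_base64", "pdf", "audio"},
}


def supports(model: str, modality: str) -> bool:
    """Return True if the table lists the modality as an input for the model."""
    return modality in SUPPORT.get(model, set())


def models_supporting(modality: str) -> list[str]:
    """List the models from the table that accept the given input modality."""
    return sorted(m for m, caps in SUPPORT.items() if modality in caps)


pdf_models = models_supporting("pdf")
```

Unknown models return `False` rather than raising, so the check can be used as a simple guard.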


Image tokens count toward the prompt token limit. A large, high-resolution image sent with detail: "high" can consume far more tokens than text.
