ARouter supports multimodal inputs — you can send images and PDFs alongside text messages. The model processes visual content and responds in text.
Supported Modalities
| Modality | Supported by | Notes |
|---|---|---|
| Text | All models | Default |
| Images (URL) | Vision models | JPEG, PNG, GIF, WebP |
| Images (base64) | Vision models | Same formats |
| PDFs | Select models | Anthropic Claude, Gemini |
Images
Using an Image URL
Pass a publicly accessible image URL in the image_url content part:
{
  "model": "openai/gpt-5.4",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "What's in this image?"
        },
        {
          "type": "image_url",
          "image_url": {
            "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
          }
        }
      ]
    }
  ]
}
Using Base64-Encoded Images
For private images or when you don’t have a public URL, encode the image as base64:
{
  "model": "openai/gpt-5.4",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "Describe this image."
        },
        {
          "type": "image_url",
          "image_url": {
            "url": "data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAA..."
          }
        }
      ]
    }
  ]
}
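The data URL in the request above can be built from a local file using only the standard library. The following is a minimal sketch: the helper name `image_to_data_url` and the fallback to `image/jpeg` are illustrative assumptions, not part of ARouter's API.

```python
import base64
import mimetypes
from pathlib import Path


def image_to_data_url(path: str, default_mime: str = "image/jpeg") -> str:
    """Return a data URL (data:<mime>;base64,<payload>) for a local image file.

    Hypothetical helper for illustration only.
    """
    # Guess the MIME type from the file extension; fall back to the default
    # when the extension is unknown or does not map to an image type.
    mime, _ = mimetypes.guess_type(path)
    if mime is None or not mime.startswith("image/"):
        mime = default_mime
    payload = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime};base64,{payload}"
```

The returned string can be used directly as the `url` value of an `image_url` content part.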
Image Detail Level
Use the detail parameter to control resolution. Higher detail costs more tokens:
| Value | Description |
|---|---|
| auto (default) | Provider decides based on image size |
| low | Faster and cheaper: 85 tokens, image resized to 512×512 |
| high | Full resolution: tiles the image and uses more tokens |
{
  "type": "image_url",
  "image_url": {
    "url": "https://example.com/image.jpg",
    "detail": "high"
  }
}
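When content parts are assembled in code, a small builder can reject a misspelled detail value before the request is sent. This is a sketch under the assumption that only the three values listed above are accepted; `image_part` and `ALLOWED_DETAIL` are hypothetical names.

```python
# Values taken from the detail table above.
ALLOWED_DETAIL = {"auto", "low", "high"}


def image_part(url: str, detail: str = "auto") -> dict:
    """Build an image_url content part, validating the detail level first.

    Hypothetical helper for illustration only.
    """
    if detail not in ALLOWED_DETAIL:
        raise ValueError(
            f"detail must be one of {sorted(ALLOWED_DETAIL)}, got {detail!r}"
        )
    return {"type": "image_url", "image_url": {"url": url, "detail": detail}}
```

For example, `image_part("https://example.com/image.jpg", "high")` produces the same structure as the JSON fragment above.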
Full Example — Vision
The examples below appear in this order: Python (OpenAI SDK), Node.js (OpenAI SDK), Python (Anthropic SDK), cURL.
import base64
from openai import OpenAI

client = OpenAI(
    base_url="https://api.arouter.ai/v1",
    api_key="lr_live_xxxx",
)

# Option 1: Image URL
response = client.chat.completions.create(
    model="openai/gpt-5.4",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg",
                        "detail": "auto",
                    },
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)

# Option 2: Base64 image
with open("image.jpg", "rb") as f:
    image_data = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="openai/gpt-5.4",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image."},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/jpeg;base64,{image_data}",
                    },
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)
import OpenAI from "openai";
import * as fs from "fs";

const client = new OpenAI({
  baseURL: "https://api.arouter.ai/v1",
  apiKey: "lr_live_xxxx",
});

// Option 1: Image URL
const response = await client.chat.completions.create({
  model: "openai/gpt-5.4",
  messages: [
    {
      role: "user",
      content: [
        { type: "text", text: "What's in this image?" },
        {
          type: "image_url",
          image_url: {
            url: "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg",
            detail: "auto",
          },
        },
      ],
    },
  ],
});
console.log(response.choices[0].message.content);

// Option 2: Base64 image
const imageData = fs.readFileSync("image.jpg").toString("base64");

const response2 = await client.chat.completions.create({
  model: "openai/gpt-5.4",
  messages: [
    {
      role: "user",
      content: [
        { type: "text", text: "Describe this image." },
        {
          type: "image_url",
          image_url: { url: `data:image/jpeg;base64,${imageData}` },
        },
      ],
    },
  ],
});
console.log(response2.choices[0].message.content);
import base64
import anthropic

client = anthropic.Anthropic(
    base_url="https://api.arouter.ai",
    api_key="lr_live_xxxx",
)

# Option 1: Image URL
response = client.messages.create(
    model="claude-sonnet-4.6",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "url",
                        "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg",
                    },
                },
                {"type": "text", "text": "What's in this image?"},
            ],
        }
    ],
)
print(response.content[0].text)

# Option 2: Base64 image
with open("image.jpg", "rb") as f:
    image_data = base64.standard_b64encode(f.read()).decode("utf-8")

response = client.messages.create(
    model="claude-sonnet-4.6",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/jpeg",
                        "data": image_data,
                    },
                },
                {"type": "text", "text": "Describe this image."},
            ],
        }
    ],
)
print(response.content[0].text)
# Image URL
curl https://api.arouter.ai/v1/chat/completions \
  -H "Authorization: Bearer lr_live_xxxx" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-5.4",
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "text", "text": "What is in this image?"},
          {
            "type": "image_url",
            "image_url": {
              "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
            }
          }
        ]
      }
    ]
  }'
PDFs
Some models can process PDF documents directly. PDFs are passed as base64-encoded content.
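Before encoding, it can be worth confirming that the file actually is a PDF, since a mislabeled file would only fail once the provider tries to parse it. The sketch below checks for the `%PDF-` signature that PDF files start with and then base64-encodes the bytes. The helper name `load_pdf_base64` is hypothetical, and the check assumes the signature sits at the very start of the file.

```python
import base64
from pathlib import Path

# PDF files begin with this signature (for example "%PDF-1.7").
PDF_MAGIC = b"%PDF-"


def load_pdf_base64(path: str) -> str:
    """Read a PDF from disk and return its contents as a base64 string.

    Hypothetical helper for illustration only. Raises ValueError when the
    file does not start with the PDF signature.
    """
    data = Path(path).read_bytes()
    if not data.startswith(PDF_MAGIC):
        raise ValueError(f"{path} does not look like a PDF file")
    return base64.standard_b64encode(data).decode("ascii")
```

The returned string is the kind of value the provider examples in this section pass as the base64 `data` field.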
Anthropic Claude — PDF Support
import base64
import anthropic

client = anthropic.Anthropic(
    base_url="https://api.arouter.ai",
    api_key="lr_live_xxxx",
)

with open("document.pdf", "rb") as f:
    pdf_data = base64.standard_b64encode(f.read()).decode("utf-8")

response = client.messages.create(
    model="claude-sonnet-4.6",
    max_tokens=2048,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "document",
                    "source": {
                        "type": "base64",
                        "media_type": "application/pdf",
                        "data": pdf_data,
                    },
                },
                {"type": "text", "text": "Summarize the key points of this document."},
            ],
        }
    ],
)
print(response.content[0].text)
Google Gemini — PDF Support
import base64
import google.generativeai as genai

genai.configure(
    api_key="lr_live_xxxx",
    transport="rest",
    client_options={"api_endpoint": "https://api.arouter.ai"},
)

with open("document.pdf", "rb") as f:
    pdf_data = base64.b64encode(f.read()).decode("utf-8")

model = genai.GenerativeModel("gemini-2.5-flash")
response = model.generate_content([
    {
        "inline_data": {
            "mime_type": "application/pdf",
            "data": pdf_data,
        }
    },
    "Summarize the key points of this document.",
])
print(response.text)
Model Compatibility
| Model | Image URL | Image Base64 | PDF |
|---|---|---|---|
| openai/gpt-5.4 | ✓ | ✓ | — |
| openai/gpt-5.4-pro | ✓ | ✓ | — |
| anthropic/claude-sonnet-4.6 | ✓ | ✓ | ✓ |
| anthropic/claude-opus-4.5 | ✓ | ✓ | ✓ |
| google/gemini-2.5-flash | ✓ | ✓ | ✓ |
| google/gemini-2.5-pro | ✓ | ✓ | ✓ |
Use GET /v1/models to query the latest capability information.
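For a quick client-side check before a request is sent, the compatibility table can be mirrored as a lookup. The sketch below is a static snapshot of that table, not a live query. The response schema of `GET /v1/models` is not shown here, so the code does not attempt to parse it. The names `CAPABILITIES` and `supports` are hypothetical.

```python
# Static snapshot of the compatibility table above; it may go stale.
IMAGE_ONLY = {"image_url", "image_base64"}
IMAGE_AND_PDF = IMAGE_ONLY | {"pdf"}

CAPABILITIES = {
    "openai/gpt-5.4": IMAGE_ONLY,
    "openai/gpt-5.4-pro": IMAGE_ONLY,
    "anthropic/claude-sonnet-4.6": IMAGE_AND_PDF,
    "anthropic/claude-opus-4.5": IMAGE_AND_PDF,
    "google/gemini-2.5-flash": IMAGE_AND_PDF,
    "google/gemini-2.5-pro": IMAGE_AND_PDF,
}


def supports(model: str, capability: str) -> bool:
    """Return True if the snapshot lists the capability for the model.

    Unknown models return False rather than raising.
    """
    return capability in CAPABILITIES.get(model, set())
```

A caller might use `supports(model, "pdf")` to decide whether to attach a document or to extract its text first.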
| Format | When to Use |
|---|---|
| Image URL | Public images accessible on the internet |
| Image base64 | Private images, local files, or when URL is not available |
| PDF base64 | Document analysis (Claude and Gemini only) |
Image tokens count toward the prompt token limit. Large, high-resolution images with detail: "high" can consume significantly more tokens than text. Always check usage.prompt_tokens to monitor consumption.
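One lightweight way to act on this is to compare the reported prompt tokens with a budget after each call. The sketch below is illustrative only: the function name `check_prompt_tokens` and the budget value are assumptions, and the token count would come from `usage.prompt_tokens` on the response.

```python
def check_prompt_tokens(prompt_tokens: int, budget: int) -> str:
    """Return a short status line comparing prompt usage with a budget.

    Hypothetical helper for illustration only.
    """
    if prompt_tokens > budget:
        return f"over budget: {prompt_tokens} prompt tokens (budget {budget})"
    return f"ok: {prompt_tokens} prompt tokens (budget {budget})"


# Example with a made-up token count and budget.
print(check_prompt_tokens(1105, budget=1000))
```

With the OpenAI SDK examples above, the first argument would be `response.usage.prompt_tokens`.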