Multimodal

ARouter supports multimodal inputs: you can send images and PDFs alongside text messages. The model processes the attached content and responds in text.

Supported Modalities

Modality	Supported by	Notes
Text	All models	Default
Images (URL)	Vision models	JPEG, PNG, GIF, WebP
Images (base64)	Vision models	JPEG, PNG, GIF, WebP
PDFs	Select models	Anthropic Claude, Gemini

Images

Using an Image URL

Pass a publicly accessible image URL in the image_url content part:

{
  "model": "openai/gpt-5.4",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "What's in this image?"
        },
        {
          "type": "image_url",
          "image_url": {
            "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
          }
        }
      ]
    }
  ]
}

Using Base64-Encoded Images

For private images, or when you don't have a public URL, encode the image as base64 and pass it as a data URL in the same image_url content part:

{
  "model": "openai/gpt-5.4",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "Describe this image."
        },
        {
          "type": "image_url",
          "image_url": {
            "url": "data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAA..."
          }
        }
      ]
    }
  ]
}
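The data URL above can be produced from a local file with the Python standard library. The following is a minimal sketch, not part of the ARouter API: the helper names `to_data_url` and `image_part_from_file` are hypothetical, and the MIME type is guessed from the file extension, falling back to `image/jpeg` when it cannot be determined.

```python
import base64
import mimetypes
from pathlib import Path


def to_data_url(path: str, default_mime: str = "image/jpeg") -> str:
    """Return a data URL (data:<mime>;base64,<payload>) for a local file.

    Hypothetical helper for illustration only.
    """
    # Guess the MIME type from the file extension; use the default if unknown.
    mime, _ = mimetypes.guess_type(path)
    payload = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime or default_mime};base64,{payload}"


def image_part_from_file(path: str) -> dict:
    """Build an image_url content part that embeds a local image."""
    return {"type": "image_url", "image_url": {"url": to_data_url(path)}}
```

The returned dictionary can be placed in the `content` list next to a text part, in the same position as the `image_url` part in the request above.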

Image Detail Level

Use the detail parameter to control resolution. Higher detail costs more tokens:

Value	Description
`auto` (default)	Provider decides based on image size
`low`	Faster and cheaper: the image is resized to 512×512 and costs 85 tokens
`high`	Full resolution: the image is tiled, which uses more tokens

{
  "type": "image_url",
  "image_url": {
    "url": "https://example.com/image.jpg",
    "detail": "high"
  }
}
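When content parts are built in code, it can help to validate the detail value before sending the request. The sketch below assumes only the three values listed in the table; `image_part` and `ALLOWED_DETAIL` are hypothetical names, not part of any SDK.

```python
# Detail values taken from the table above.
ALLOWED_DETAIL = {"auto", "low", "high"}


def image_part(url: str, detail: str = "auto") -> dict:
    """Build an image_url content part with a validated detail level.

    Hypothetical helper for illustration only.
    """
    if detail not in ALLOWED_DETAIL:
        raise ValueError(
            f"detail must be one of {sorted(ALLOWED_DETAIL)}, got {detail!r}"
        )
    return {"type": "image_url", "image_url": {"url": url, "detail": detail}}
```

Calling `image_part("https://example.com/image.jpg", "high")` yields the same structure as the JSON fragment above.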

Full Example — Vision

The four examples below use, in order, the OpenAI Python SDK, the OpenAI Node.js SDK, the Anthropic Python SDK, and cURL.

import base64
from openai import OpenAI

client = OpenAI(
    base_url="https://api.arouter.ai/v1",
    api_key="lr_live_xxxx",
)

# Option 1: Image URL
response = client.chat.completions.create(
    model="openai/gpt-5.4",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg",
                        "detail": "auto",
                    },
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)

# Option 2: Base64 image
with open("image.jpg", "rb") as f:
    image_data = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="openai/gpt-5.4",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image."},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/jpeg;base64,{image_data}",
                    },
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)

import OpenAI from "openai";
import * as fs from "fs";

const client = new OpenAI({
  baseURL: "https://api.arouter.ai/v1",
  apiKey: "lr_live_xxxx",
});

// Option 1: Image URL
const response = await client.chat.completions.create({
  model: "openai/gpt-5.4",
  messages: [
    {
      role: "user",
      content: [
        { type: "text", text: "What's in this image?" },
        {
          type: "image_url",
          image_url: {
            url: "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg",
            detail: "auto",
          },
        },
      ],
    },
  ],
});
console.log(response.choices[0].message.content);

// Option 2: Base64 image
const imageData = fs.readFileSync("image.jpg").toString("base64");

const response2 = await client.chat.completions.create({
  model: "openai/gpt-5.4",
  messages: [
    {
      role: "user",
      content: [
        { type: "text", text: "Describe this image." },
        {
          type: "image_url",
          image_url: { url: `data:image/jpeg;base64,${imageData}` },
        },
      ],
    },
  ],
});
console.log(response2.choices[0].message.content);

import base64
import anthropic

client = anthropic.Anthropic(
    base_url="https://api.arouter.ai",
    api_key="lr_live_xxxx",
)

# Option 1: Image URL
response = client.messages.create(
    model="claude-sonnet-4.6",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "url",
                        "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg",
                    },
                },
                {"type": "text", "text": "What's in this image?"},
            ],
        }
    ],
)
print(response.content[0].text)

# Option 2: Base64 image
with open("image.jpg", "rb") as f:
    image_data = base64.standard_b64encode(f.read()).decode("utf-8")

response = client.messages.create(
    model="claude-sonnet-4.6",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/jpeg",
                        "data": image_data,
                    },
                },
                {"type": "text", "text": "Describe this image."},
            ],
        }
    ],
)
print(response.content[0].text)

# Image URL
curl https://api.arouter.ai/v1/chat/completions \
  -H "Authorization: Bearer lr_live_xxxx" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-5.4",
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "text", "text": "What is in this image?"},
          {
            "type": "image_url",
            "image_url": {
              "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
            }
          }
        ]
      }
    ]
  }'

PDFs

Some models can process PDF documents directly. PDFs are passed as base64-encoded content.

Anthropic Claude — PDF Support

import base64
import anthropic

client = anthropic.Anthropic(
    base_url="https://api.arouter.ai",
    api_key="lr_live_xxxx",
)

with open("document.pdf", "rb") as f:
    pdf_data = base64.standard_b64encode(f.read()).decode("utf-8")

response = client.messages.create(
    model="claude-sonnet-4.6",
    max_tokens=2048,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "document",
                    "source": {
                        "type": "base64",
                        "media_type": "application/pdf",
                        "data": pdf_data,
                    },
                },
                {"type": "text", "text": "Summarize the key points of this document."},
            ],
        }
    ],
)
print(response.content[0].text)

Google Gemini — PDF Support

import base64
import google.generativeai as genai

genai.configure(
    api_key="lr_live_xxxx",
    transport="rest",
    client_options={"api_endpoint": "https://api.arouter.ai"},
)

with open("document.pdf", "rb") as f:
    pdf_data = base64.b64encode(f.read()).decode("utf-8")

model = genai.GenerativeModel("gemini-2.5-flash")
response = model.generate_content([
    {
        "inline_data": {
            "mime_type": "application/pdf",
            "data": pdf_data,
        }
    },
    "Summarize the key points of this document.",
])
print(response.text)

Model Compatibility

Model	Image URL	Image Base64	PDF
`openai/gpt-5.4`	✓	✓	—
`openai/gpt-5.4-pro`	✓	✓	—
`anthropic/claude-sonnet-4.6`	✓	✓	✓
`anthropic/claude-opus-4.5`	✓	✓	✓
`google/gemini-2.5-flash`	✓	✓	✓
`google/gemini-2.5-pro`	✓	✓	✓

Use GET /v1/models to query the latest capability information.
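A minimal sketch of that query using only the Python standard library follows. It assumes the endpoint lives under the same base URL as the chat completions examples and accepts the same `Authorization: Bearer` header; the response schema is not documented here, so the parsed JSON is returned as-is. The function names are hypothetical.

```python
import json
import urllib.request

# Assumption: same base URL as the OpenAI-compatible examples on this page.
BASE_URL = "https://api.arouter.ai/v1"


def build_models_request(api_key: str) -> urllib.request.Request:
    """Prepare the GET /v1/models request without sending it."""
    return urllib.request.Request(
        f"{BASE_URL}/models",
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def fetch_models(api_key: str):
    """Send the request and return the parsed JSON body unchanged."""
    with urllib.request.urlopen(build_models_request(api_key), timeout=30) as resp:
        return json.load(resp)
```

For example, `fetch_models("lr_live_xxxx")` returns whatever JSON the endpoint provides; inspect it to see which fields describe image and PDF support.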

Input Format Support

Format	When to Use
Image URL	Public images accessible on the internet
Image base64	Private images, local files, or when no URL is available
PDF base64	Document analysis (Claude and Gemini only)

Image tokens count toward the prompt token limit. Large, high-resolution images with detail: "high" can consume significantly more tokens than text. Always check usage.prompt_tokens to monitor consumption.
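One way to monitor this is to read `usage.prompt_tokens` from each response and compare it with a budget of your own. The sketch below works with any response object shaped like the OpenAI SDK's chat completion result; the helper names and the budget value are hypothetical.

```python
from typing import Optional


def prompt_tokens_used(response) -> Optional[int]:
    """Return usage.prompt_tokens, or None if the response carries no usage data."""
    usage = getattr(response, "usage", None)
    if usage is None:
        return None
    return getattr(usage, "prompt_tokens", None)


def within_prompt_budget(response, budget: int) -> bool:
    """Return True if the request stayed within the caller's own token budget."""
    used = prompt_tokens_used(response)
    if used is None:
        # No usage data available, so there is nothing to compare against.
        return True
    if used > budget:
        print(f"prompt used {used} tokens, over the budget of {budget}")
    return used <= budget
```

With the OpenAI client from the vision example, `within_prompt_budget(response, 2000)` would flag any request whose image and text together exceed 2000 prompt tokens.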
