ARouter supports multimodal inputs — you can send images and PDFs alongside text messages. The model processes visual content and responds in text.

Supported Modalities

| Modality | Supported | Notes |
| --- | --- | --- |
| Text | All models | Default |
| Images (URL) | Vision models | JPEG, PNG, GIF, WebP |
| Images (base64) | Vision models | Same formats |
| PDFs | Select models | Anthropic Claude, Gemini |

Images

Using an Image URL

Pass a publicly accessible image URL in the image_url content part:
{
  "model": "openai/gpt-5.4",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "What's in this image?"
        },
        {
          "type": "image_url",
          "image_url": {
            "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
          }
        }
      ]
    }
  ]
}

Using Base64-Encoded Images

For private images or when you don’t have a public URL, encode the image as base64:
{
  "model": "openai/gpt-5.4",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "Describe this image."
        },
        {
          "type": "image_url",
          "image_url": {
            "url": "data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAA..."
          }
        }
      ]
    }
  ]
}
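
As a rough sketch of how the data URL in the request above could be built from a local file, the helper below maps the file extension to one of the image formats listed earlier and base64-encodes the file contents. The function name `to_data_url` and the extension table are illustrative assumptions, not part of the ARouter API.

```python
import base64
from pathlib import Path

# Extension-to-MIME mapping for the formats listed above (JPEG, PNG, GIF, WebP).
# The mapping itself is an assumption made for this sketch.
IMAGE_MIME_TYPES = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".gif": "image/gif",
    ".webp": "image/webp",
}


def to_data_url(path: str) -> str:
    """Return a data URL (data:<mime>;base64,<payload>) for a local image file."""
    suffix = Path(path).suffix.lower()
    if suffix not in IMAGE_MIME_TYPES:
        raise ValueError(f"unsupported image extension: {suffix!r}")
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{IMAGE_MIME_TYPES[suffix]};base64,{encoded}"
```

The returned string can be used directly as the `url` value of an `image_url` content part.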

Image Detail Level

Use the detail parameter to control resolution. Higher detail costs more tokens:

| Value | Description |
| --- | --- |
| auto (default) | Provider decides based on image size |
| low | Faster, cheaper: 85 tokens, resize to 512×512 |
| high | Full resolution: tiles the image, more tokens |

{
  "type": "image_url",
  "image_url": {
    "url": "https://example.com/image.jpg",
    "detail": "high"
  }
}
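
One way to avoid sending a misspelled detail value is to build the content part through a small helper that checks it first. This is a minimal sketch: the helper name `image_part` is hypothetical, and the allowed values are taken from the table above.

```python
# Detail values taken from the table above.
ALLOWED_DETAIL = ("auto", "low", "high")


def image_part(url: str, detail: str = "auto") -> dict:
    """Build an image_url content part, rejecting unknown detail values."""
    if detail not in ALLOWED_DETAIL:
        raise ValueError(f"detail must be one of {ALLOWED_DETAIL}, got {detail!r}")
    return {"type": "image_url", "image_url": {"url": url, "detail": detail}}
```

The resulting dictionary goes into the `content` list of a user message next to the text part.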

Full Example — Vision

import base64
from openai import OpenAI

client = OpenAI(
    base_url="https://api.arouter.ai/v1",
    api_key="lr_live_xxxx",
)

# Option 1: Image URL
response = client.chat.completions.create(
    model="openai/gpt-5.4",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg",
                        "detail": "auto",
                    },
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)

# Option 2: Base64 image
with open("image.jpg", "rb") as f:
    image_data = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="openai/gpt-5.4",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image."},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/jpeg;base64,{image_data}",
                    },
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)

PDFs

Some models can process PDF documents directly. PDFs are passed as base64-encoded content.
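
Because base64 turns every 3 input bytes into 4 output characters, an encoded PDF is roughly a third larger than the file on disk. The sketch below estimates the encoded size before you build the request, which can help when a provider limits request size. The helper name `base64_size` is illustrative, and no specific size limit is assumed here.

```python
import base64


def base64_size(raw_bytes: int) -> int:
    """Length in characters of standard, padded base64 output for raw_bytes input bytes."""
    # Every group of up to 3 bytes becomes 4 characters (the last group is padded).
    return 4 * ((raw_bytes + 2) // 3)


# Check the estimate against the real encoder on a small payload.
payload = b"%PDF-1.7 example bytes"
assert len(base64.b64encode(payload)) == base64_size(len(payload))
```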

Anthropic Claude — PDF Support

import base64
import anthropic

client = anthropic.Anthropic(
    base_url="https://api.arouter.ai",
    api_key="lr_live_xxxx",
)

with open("document.pdf", "rb") as f:
    pdf_data = base64.standard_b64encode(f.read()).decode("utf-8")

response = client.messages.create(
    model="claude-sonnet-4.6",
    max_tokens=2048,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "document",
                    "source": {
                        "type": "base64",
                        "media_type": "application/pdf",
                        "data": pdf_data,
                    },
                },
                {"type": "text", "text": "Summarize the key points of this document."},
            ],
        }
    ],
)
print(response.content[0].text)

Google Gemini — PDF Support

import base64
import google.generativeai as genai

genai.configure(
    api_key="lr_live_xxxx",
    transport="rest",
    client_options={"api_endpoint": "https://api.arouter.ai"},
)

with open("document.pdf", "rb") as f:
    pdf_data = base64.b64encode(f.read()).decode("utf-8")

model = genai.GenerativeModel("gemini-2.5-flash")
response = model.generate_content([
    {
        "inline_data": {
            "mime_type": "application/pdf",
            "data": pdf_data,
        }
    },
    "Summarize the key points of this document.",
])
print(response.text)

Model Compatibility

| Model | Image URL | Image Base64 | PDF |
| --- | --- | --- | --- |
| openai/gpt-5.4 | | | |
| openai/gpt-5.4-pro | | | |
| anthropic/claude-sonnet-4.6 | | | |
| anthropic/claude-opus-4.5 | | | |
| google/gemini-2.5-flash | | | |
| google/gemini-2.5-pro | | | |

Use GET /v1/models to query the latest capability information.
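
A minimal sketch of that query using only the standard library follows. It assumes the endpoint accepts the same bearer-token authentication as the chat examples and returns an OpenAI-style list of the form `{"data": [{"id": ...}, ...]}`. The response shape and the helper names are assumptions, so inspect the raw JSON for the actual capability fields.

```python
import json
import urllib.request


def build_models_request(api_key: str, base_url: str = "https://api.arouter.ai/v1"):
    """Prepare (but do not send) a GET request for the models endpoint."""
    return urllib.request.Request(
        f"{base_url}/models",
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def model_ids(payload: dict) -> list[str]:
    """Extract model IDs, assuming an OpenAI-style {"data": [{"id": ...}]} body."""
    return [entry["id"] for entry in payload.get("data", [])]


def fetch_model_ids(api_key: str) -> list[str]:
    """Send the request and return the model IDs found in the response."""
    with urllib.request.urlopen(build_models_request(api_key), timeout=30) as resp:
        return model_ids(json.load(resp))
```

Calling `fetch_model_ids("lr_live_xxxx")` with a real key performs the network request; the other two functions are pure and can be checked offline.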

Input Format Support

| Format | When to Use |
| --- | --- |
| Image URL | Public images accessible on the internet |
| Image base64 | Private images, local files, or when URL is not available |
| PDF base64 | Document analysis (Claude and Gemini only) |

Image tokens count toward the prompt token limit. Large, high-resolution images with detail: "high" can consume significantly more tokens than text. Always check usage.prompt_tokens to monitor consumption.
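
To make that check routine, a small helper can compare `usage.prompt_tokens` against a budget you choose. This is only a sketch: `prompt_token_report` and the `budget` parameter are hypothetical, and the response is assumed to expose `usage.prompt_tokens` as the OpenAI Python client does in the vision example.

```python
def prompt_token_report(response, budget: int) -> str:
    """Summarize prompt token usage of a chat completion against a caller-chosen budget."""
    used = response.usage.prompt_tokens
    status = "over" if used > budget else "within"
    return f"prompt_tokens={used} ({status} budget of {budget})"
```

For example, `print(prompt_token_report(response, budget=2000))` after a vision request shows whether an image pushed the prompt past the threshold you set.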