Multimodal - ARouter

ARouter supports multimodal inputs and outputs: you can send images, PDFs, and audio alongside text messages, and receive images or spoken audio in response.

Supported Modalities

Modality	Direction	Notes
Text	Input + Output	All models
Images (URL / base64)	Input	Vision models — JPEG, PNG, GIF, WebP
PDFs (base64)	Input	Anthropic Claude, Google Gemini
Audio (base64)	Input	Multimodal audio models
Image generation	Output	DALL-E 3, Flux, Stable Diffusion
Audio output (TTS / spoken)	Output	TTS models, audio chat models

Use GET /v1/models with query parameters to discover models that support a specific modality:

# Models that accept image input
GET /v1/models?supported_parameters=vision

# Models that output images
GET /v1/models?output_modalities=image

# Models that output audio
GET /v1/models?output_modalities=audio
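
As a minimal sketch, the discovery queries above can also be built programmatically. The helper name `models_url` is hypothetical. The base URL is the one used by the OpenAI SDK examples on this page, and the query parameter names are the ones listed above. The sketch only builds the URL and does not send a request.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.arouter.ai/v1"  # base URL used by the SDK examples on this page


def models_url(**filters: str) -> str:
    """Build a GET /v1/models URL with optional query-string filters (hypothetical helper)."""
    query = urlencode(filters)
    return f"{BASE_URL}/models" + (f"?{query}" if query else "")


print(models_url(output_modalities="image"))
# https://api.arouter.ai/v1/models?output_modalities=image
```

Pass the resulting URL to any HTTP client, with the same Authorization header used in the cURL example.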

Images

Using an Image URL

Pass a publicly accessible image URL in the image_url content part:

{
  "model": "openai/gpt-5.4",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "What's in this image?"
        },
        {
          "type": "image_url",
          "image_url": {
            "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
          }
        }
      ]
    }
  ]
}

Using Base64-Encoded Images

For private images, or when you don’t have a public URL, encode the image as base64 and pass it as a data URL in the same image_url field:

{
  "model": "openai/gpt-5.4",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "Describe this image."
        },
        {
          "type": "image_url",
          "image_url": {
            "url": "data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAA..."
          }
        }
      ]
    }
  ]
}
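
The data URL in the request above can be assembled from raw bytes. The following is a sketch under stated assumptions: `image_data_url` is a hypothetical helper, and the MIME type is guessed from the file extension with the standard library rather than by inspecting the bytes.

```python
import base64
import mimetypes


def image_data_url(filename: str, data: bytes) -> str:
    """Return a data URL for raw image bytes, guessing the MIME type from the filename."""
    mime_type, _ = mimetypes.guess_type(filename)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"cannot determine an image MIME type for {filename!r}")
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


# JPEG files begin with the bytes FF D8 FF, which base64-encode to "/9j/".
print(image_data_url("photo.jpg", b"\xff\xd8\xff"))
# data:image/jpeg;base64,/9j/
```

In practice, read the bytes from disk (for example with `open(path, "rb").read()`) and place the returned string in the `url` field of the `image_url` part.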

Image Detail Level

Use the detail parameter to control resolution. Higher detail costs more tokens:

Value	Description
`auto` (default)	Provider decides based on image size
`low`	Faster and cheaper: the image is resized to 512×512 and costs 85 tokens
`high`	Full resolution: the image is split into tiles, which uses more tokens

{
  "type": "image_url",
  "image_url": {
    "url": "https://example.com/image.jpg",
    "detail": "high"
  }
}
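
One way to avoid sending an unsupported detail value is to validate it before building the content part. This is only a sketch: `image_part` and `DETAIL_LEVELS` are hypothetical names, and the accepted values are taken from the table above.

```python
from typing import Any

DETAIL_LEVELS = {"auto", "low", "high"}  # values listed in the table above


def image_part(url: str, detail: str = "auto") -> dict[str, Any]:
    """Build an image_url content part, rejecting unknown detail levels."""
    if detail not in DETAIL_LEVELS:
        raise ValueError(f"detail must be one of {sorted(DETAIL_LEVELS)}, got {detail!r}")
    return {"type": "image_url", "image_url": {"url": url, "detail": detail}}


part = image_part("https://example.com/image.jpg", detail="high")
print(part["image_url"]["detail"])
# high
```

The returned dictionary can be appended to the `content` list of a user message next to a text part.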

Full Example — Vision

The examples below appear in this order: Python (OpenAI SDK), Node.js (OpenAI SDK), Python (Anthropic SDK), and cURL.

import base64
from openai import OpenAI

client = OpenAI(
    base_url="https://api.arouter.ai/v1",
    api_key="lr_live_xxxx",
)

# Option 1: Image URL
response = client.chat.completions.create(
    model="openai/gpt-5.4",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg",
                        "detail": "auto",
                    },
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)

# Option 2: Base64 image
with open("image.jpg", "rb") as f:
    image_data = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="openai/gpt-5.4",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image."},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/jpeg;base64,{image_data}",
                    },
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)

import OpenAI from "openai";
import * as fs from "fs";

const client = new OpenAI({
  baseURL: "https://api.arouter.ai/v1",
  apiKey: "lr_live_xxxx",
});

// Option 1: Image URL
const response = await client.chat.completions.create({
  model: "openai/gpt-5.4",
  messages: [
    {
      role: "user",
      content: [
        { type: "text", text: "What's in this image?" },
        {
          type: "image_url",
          image_url: {
            url: "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg",
            detail: "auto",
          },
        },
      ],
    },
  ],
});
console.log(response.choices[0].message.content);

// Option 2: Base64 image
const imageData = fs.readFileSync("image.jpg").toString("base64");

const response2 = await client.chat.completions.create({
  model: "openai/gpt-5.4",
  messages: [
    {
      role: "user",
      content: [
        { type: "text", text: "Describe this image." },
        {
          type: "image_url",
          image_url: { url: `data:image/jpeg;base64,${imageData}` },
        },
      ],
    },
  ],
});
console.log(response2.choices[0].message.content);

import base64
import anthropic

client = anthropic.Anthropic(
    base_url="https://api.arouter.ai",
    api_key="lr_live_xxxx",
)

# Option 1: Image URL
response = client.messages.create(
    model="claude-sonnet-4.6",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "url",
                        "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg",
                    },
                },
                {"type": "text", "text": "What's in this image?"},
            ],
        }
    ],
)
print(response.content[0].text)

# Option 2: Base64 image
with open("image.jpg", "rb") as f:
    image_data = base64.standard_b64encode(f.read()).decode("utf-8")

response = client.messages.create(
    model="claude-sonnet-4.6",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/jpeg",
                        "data": image_data,
                    },
                },
                {"type": "text", "text": "Describe this image."},
            ],
        }
    ],
)
print(response.content[0].text)

# Image URL
curl https://api.arouter.ai/v1/chat/completions \
  -H "Authorization: Bearer lr_live_xxxx" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-5.4",
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "text", "text": "What is in this image?"},
          {
            "type": "image_url",
            "image_url": {
              "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
            }
          }
        ]
      }
    ]
  }'

PDFs

Some models can process PDF documents directly. PDFs are passed as base64-encoded content.

Anthropic Claude — PDF Support

import base64
import anthropic

client = anthropic.Anthropic(
    base_url="https://api.arouter.ai",
    api_key="lr_live_xxxx",
)

with open("document.pdf", "rb") as f:
    pdf_data = base64.standard_b64encode(f.read()).decode("utf-8")

response = client.messages.create(
    model="claude-sonnet-4.6",
    max_tokens=2048,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "document",
                    "source": {
                        "type": "base64",
                        "media_type": "application/pdf",
                        "data": pdf_data,
                    },
                },
                {"type": "text", "text": "Summarize the key points of this document."},
            ],
        }
    ],
)
print(response.content[0].text)
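
The document block used above can be factored into a small helper that also catches an obviously wrong file before it is uploaded. This is a sketch under stated assumptions: `pdf_document_block` is a hypothetical name, and the `%PDF-` header test is a cheap sanity check, not full PDF validation.

```python
import base64
from typing import Any


def pdf_document_block(pdf_bytes: bytes) -> dict[str, Any]:
    """Build an Anthropic-style document content block from raw PDF bytes."""
    # PDF files start with the marker "%PDF-"; reject anything else early.
    if not pdf_bytes.startswith(b"%PDF-"):
        raise ValueError("data does not look like a PDF (missing %PDF- header)")
    return {
        "type": "document",
        "source": {
            "type": "base64",
            "media_type": "application/pdf",
            "data": base64.standard_b64encode(pdf_bytes).decode("utf-8"),
        },
    }


block = pdf_document_block(b"%PDF-1.7\n")
print(block["source"]["media_type"])
# application/pdf
```

The block goes in the `content` list of the user message, followed by the text part holding the question.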

Google Gemini — PDF Support

import base64
import google.generativeai as genai

genai.configure(
    api_key="lr_live_xxxx",
    transport="rest",
    client_options={"api_endpoint": "https://api.arouter.ai"},
)

with open("document.pdf", "rb") as f:
    pdf_data = base64.b64encode(f.read()).decode("utf-8")

model = genai.GenerativeModel("gemini-2.5-flash")
response = model.generate_content([
    {
        "inline_data": {
            "mime_type": "application/pdf",
            "data": pdf_data,
        }
    },
    "Summarize the key points of this document.",
])
print(response.text)

Model Compatibility

Model	Image URL	Image Base64	PDF	Audio Input
`openai/gpt-5.4`	✓	✓	—	—
`openai/gpt-5.4-pro`	✓	✓	—	—
`openai/gpt-5.4-audio-preview`	✓	✓	—	✓
`anthropic/claude-sonnet-4.6`	✓	✓	✓	—
`anthropic/claude-opus-4.5`	✓	✓	✓	—
`google/gemini-2.5-flash`	✓	✓	✓	✓
`google/gemini-2.5-pro`	✓	✓	✓	✓

Use GET /v1/models to query the latest capability information.
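
For quick client-side checks, the table above can be mirrored as a lookup. This sketch is a static snapshot of that table, and GET /v1/models remains the source of truth. The capability labels (`image_url`, `image_base64`, `pdf`, `audio_input`) and the function name are hypothetical.

```python
IMAGES = {"image_url", "image_base64"}

# Static copy of the compatibility table above; it may go stale.
CAPABILITIES = {
    "openai/gpt-5.4": IMAGES,
    "openai/gpt-5.4-pro": IMAGES,
    "openai/gpt-5.4-audio-preview": IMAGES | {"audio_input"},
    "anthropic/claude-sonnet-4.6": IMAGES | {"pdf"},
    "anthropic/claude-opus-4.5": IMAGES | {"pdf"},
    "google/gemini-2.5-flash": IMAGES | {"pdf", "audio_input"},
    "google/gemini-2.5-pro": IMAGES | {"pdf", "audio_input"},
}


def models_supporting(*required: str) -> list[str]:
    """Return the models whose capabilities include every required label."""
    needed = set(required)
    return sorted(m for m, caps in CAPABILITIES.items() if needed <= caps)


print(models_supporting("pdf", "audio_input"))
# ['google/gemini-2.5-flash', 'google/gemini-2.5-pro']
```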

Input Format Support

Format	When to Use
Image URL	Public images accessible on the internet
Image base64	Private images, local files, or when URL is not available
PDF base64	Document analysis (Claude and Gemini only)
Audio base64	Voice input for audio chat models

Image tokens count toward the prompt token limit. Large, high-resolution images with detail: "high" can consume significantly more tokens than text. Always check usage.prompt_tokens to monitor consumption.
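
A simple way to monitor this is to compare usage.prompt_tokens against a budget after each call. The sketch below works on the response as a plain dictionary, for example the parsed JSON from the cURL call or `response.model_dump()` with the OpenAI Python SDK. The function names, the budget, and the sample numbers are illustrative assumptions.

```python
def prompt_tokens_used(response: dict) -> int:
    """Read usage.prompt_tokens from a response dict, defaulting to 0 if absent."""
    return int(response.get("usage", {}).get("prompt_tokens", 0))


def over_budget(response: dict, budget: int) -> bool:
    """Return True when the prompt (text plus images) exceeded the token budget."""
    return prompt_tokens_used(response) > budget


# Illustrative values only, not real measurements.
sample = {"usage": {"prompt_tokens": 1105, "completion_tokens": 42}}
print(over_budget(sample, budget=1000))
# True
```

If a request goes over budget, switching the image to detail: "low" or downscaling it before upload are the usual first steps.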

Other Modalities

For dedicated audio and image generation documentation:

Audio — Speech-to-text, text-to-speech, and audio chat models
Image Generation — Generate images from text prompts using DALL-E, Flux, and more
