ARouter is designed with performance as a top priority. The gateway is heavily optimized to add as little overhead as possible to your requests.

Minimal Overhead

ARouter adds minimal latency through:
  • Edge computing: Gateway nodes are deployed globally to stay as close as possible to your application
  • Efficient caching: User credentials and API key data are cached at the edge to avoid database roundtrips on every request
  • Optimized routing: Provider selection and key pool lookup are designed to complete in single-digit milliseconds
The gateway overhead on a typical request is well under 50ms.

Performance Considerations

Cache Warming

When edge caches are cold (typically during the first 1–2 minutes after a deployment or in a new region), you may experience slightly higher latency as caches warm up. This normalizes quickly.

Credit Balance Checks

To maintain accurate billing and prevent overages, ARouter performs additional database checks when:
  • A user’s credit balance is in single-digit dollars
  • An API key is approaching its configured credit limit
Under these conditions, caches are invalidated more aggressively, which increases latency until more credits are added. To avoid this:
  • Maintain a healthy credit balance (recommended minimum: $10–20)
  • Set up auto-topup or periodic billing alerts
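A client-side heuristic for deciding when to alert or top up might look like the sketch below. The thresholds are illustrative assumptions, not ARouter's internal values:

```python
def low_balance(balance_usd, key_limit_usd=None, key_spend_usd=0.0,
                min_balance=10.0, limit_headroom=0.10):
    """Return True when extra billing checks (and cache invalidation) are likely.

    All thresholds here are illustrative, not ARouter's actual values.
    """
    # A single-digit account balance triggers additional database checks.
    if balance_usd < min_balance:
        return True
    # So does an API key approaching its configured credit limit.
    if key_limit_usd is not None:
        remaining = key_limit_usd - key_spend_usd
        if remaining <= key_limit_usd * limit_headroom:
            return True
    return False
```

Wiring this into a periodic job or billing alert keeps requests out of the aggressive-invalidation zone.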

Multi-Model Routing Latency

When using ordered candidate model lists, if the first candidate is unavailable, ARouter routes to the next model. A failed first attempt adds latency to that request. ARouter tracks provider health continuously and routes around known-unavailable providers to minimize how often this occurs.
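ARouter performs this fallback server-side; conceptually it is equivalent to the following client-side sketch, where `send` is a hypothetical transport function (not part of the ARouter API) that raises on provider unavailability:

```python
def complete_with_fallback(send, candidate_models, prompt):
    """Try each candidate model in order until one succeeds.

    `send(model, prompt) -> str` is a stand-in for one gateway request;
    it raises RuntimeError when the provider is unavailable.
    """
    last_error = None
    for model in candidate_models:
        try:
            return model, send(model, prompt)
        except RuntimeError as err:
            # Each failed attempt adds latency to the request before
            # the next candidate is tried.
            last_error = err
    raise last_error
```

Because provider health is tracked continuously, the first candidate usually succeeds and the loop exits on the first iteration.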

Best Practices

1. Use Streaming

For user-facing applications, use streaming responses to reduce perceived latency. The first token arrives sooner than the full response, making the application feel faster even if total generation time is the same. See Streaming.
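Streamed responses arrive as server-sent events; a minimal parser, assuming an OpenAI-style `data:` payload shape (the actual wire format may differ), looks like this:

```python
import json

def iter_tokens(sse_lines):
    """Yield text deltas from an OpenAI-style SSE stream (assumed format)."""
    for line in sse_lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip comments, blank keep-alive lines, etc.
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            return  # end-of-stream sentinel
        event = json.loads(data)
        delta = event["choices"][0]["delta"].get("content", "")
        if delta:
            yield delta
```

The first yielded token can be rendered immediately, which is what reduces perceived latency.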

2. Use Prompt Caching

For requests with repetitive prefixes (system prompts, few-shot examples, large documents), enable prompt caching. Cached tokens are served at significantly lower latency and reduced cost. See Prompt Caching.
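As a sketch, a cacheable prefix can be marked with an Anthropic-style `cache_control` block; this structure is an assumption about the request format, so check the Prompt Caching page for the exact shape ARouter expects:

```python
# Hypothetical request body: the large, repeated system prompt is marked
# cacheable so later requests can reuse it at lower latency and cost.
payload = {
    "model": "anthropic/claude-sonnet-4-6",
    "messages": [
        {
            "role": "system",
            "content": [
                {
                    "type": "text",
                    "text": "You are a support assistant. (large shared instructions...)",
                    "cache_control": {"type": "ephemeral"},  # marks the prefix as cacheable
                }
            ],
        },
        {"role": "user", "content": "Where is my order?"},
    ],
}
```

Only the marked prefix is cached; the trailing user message still varies per request.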

3. Choose the Right Model

Smaller models are faster. If your use case doesn’t require the highest capability, a smaller model (e.g., google/gemini-2.5-flash, anthropic/claude-haiku-4-5) can reduce latency by 2–5x compared to the largest variants.

4. Use :nitro Provider Variants

For latency-critical workloads, append :nitro to a model ID to prefer high-throughput provider endpoints:
{ "model": "anthropic/claude-sonnet-4-6:nitro" }
:nitro routes to the provider configuration optimized for maximum throughput and lowest time-to-first-token. See Provider Routing for details.
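When the model ID comes from configuration, the suffix can be appended programmatically; a trivial helper (not part of any ARouter SDK) might be:

```python
def with_nitro(model_id):
    """Append the :nitro variant suffix unless it is already present."""
    return model_id if model_id.endswith(":nitro") else model_id + ":nitro"
```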

5. Maintain a Healthy Credit Balance

Keeping your credit balance above a reasonable threshold (≥$10) prevents aggressive cache invalidation during billing checks, which can add measurable latency.

Measuring Performance

Use the x-response-time response header (if present) or measure round-trip time in your client to benchmark. ARouter also surfaces per-request latency data in your Activity feed.
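A minimal client-side benchmark of round-trip time, where `call` stands in for one full request to the gateway:

```python
import time
import statistics

def benchmark(call, n=20):
    """Measure client-side round-trip time over n calls, in milliseconds."""
    samples = []
    for _ in range(n):
        start = time.perf_counter()
        call()  # one full request/response cycle
        samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()
    return {
        "p50_ms": statistics.median(samples),
        "p95_ms": samples[max(0, int(len(samples) * 0.95) - 1)],
    }
```

Comparing these client-side numbers against the `x-response-time` header (when present) separates network latency from gateway and provider time.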