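Provider Routing

ARouter automatically routes each request to the best upstream provider based on model availability, provider health, and cost efficiency. For most use cases this happens with no configuration. When you need finer control, pass a `provider` object in the request body to customize routing decisions.

The `provider` Object

Include a `provider` object in a `/v1/chat/completions` request to override the routing defaults: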

{
  "model": "openai/gpt-5.4",
  "messages": [{ "role": "user", "content": "Hello!" }],
  "provider": {
    "sort": "throughput",
    "allow_fallbacks": true
  }
}

Full Field Reference

Field	Type	Default	Description
`order`	`string[]`	-	List of provider slugs to try, in order. Example: `["openai", "azure"]`
`allow_fallbacks`	`boolean`	`true`	Whether backup providers may be used when the primary provider is unavailable
`require_parameters`	`boolean`	`false`	Use only providers that support every parameter in the request
`data_collection`	`"allow" | "deny"`	`"allow"`	Controls whether providers that may store request data can be used
`zdr`	`boolean`	-	Restrict routing to Zero Data Retention endpoints only
`only`	`string[]`	-	List of provider slugs allowed for this request
`ignore`	`string[]`	-	List of provider slugs to skip for this request
`quantizations`	`string[]`	-	Filter by quantization level. Example: `["int4", "int8"]`
`sort`	`string | object`	-	Sort providers by `"price"`, `"throughput"`, or `"latency"`
`preferred_min_throughput`	`number | object`	-	Preferred minimum throughput (tokens/second)
`preferred_max_latency`	`number | object`	-	Preferred maximum latency (seconds)
`max_price`	`object`	-	Maximum price to pay per token
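
For readers who build request bodies in Python, the field table can be mirrored as a typed dictionary. This is only a sketch: the names `SortOptions`, `ProviderPreferences`, and `Percentiles` are made up for illustration and are not part of any ARouter SDK, and the value type of `max_price` is assumed to be strings because that is how prices appear in this page's example.

```python
from typing import Literal, TypedDict, Union

# Hypothetical alias: percentile keys mapped to cutoffs, e.g. {"p50": 100, "p90": 50}.
Percentiles = dict[str, float]


class SortOptions(TypedDict, total=False):
    """Object form of `sort` (field names taken from the docs)."""
    by: Literal["price", "throughput", "latency"]
    partition: Literal["model", "none"]


class ProviderPreferences(TypedDict, total=False):
    """Illustrative type for the `provider` object; every field is optional."""
    order: list[str]
    allow_fallbacks: bool
    require_parameters: bool
    data_collection: Literal["allow", "deny"]
    zdr: bool
    only: list[str]
    ignore: list[str]
    quantizations: list[str]
    sort: Union[Literal["price", "throughput", "latency"], SortOptions]
    preferred_min_throughput: Union[float, Percentiles]
    preferred_max_latency: Union[float, Percentiles]
    max_price: dict[str, str]  # assumption: prices are passed as strings


prefs: ProviderPreferences = {"sort": "throughput", "allow_fallbacks": True}
body = {
    "model": "openai/gpt-5.4",
    "messages": [{"role": "user", "content": "Hello!"}],
    "provider": prefs,
}
```

A static type checker can then flag misspelled field names before the request is sent; at runtime `prefs` is an ordinary `dict`.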

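Default Strategy: Cost-Based Load Balancing

By default, ARouter load balances requests across healthy providers while prioritizing cost. The algorithm:

Exclude providers that had a significant outage in the last 30 seconds
Among the stable providers, pick one with weights proportional to the inverse square of price
Use the remaining providers as automatic fallbacks

Example: if Provider A costs $1/M tokens, Provider B costs $2/M, and Provider C costs $3/M:

Provider A is 9 times more likely to be selected than Provider C (inverse-square weighting)
If Provider A fails, Provider C is tried next
Provider B (recently degraded) is tried last

Setting `sort` or `order` disables load balancing, and providers are tried in strict order.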
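
The inverse-square weighting in the example can be checked with a few lines of Python. This is a minimal sketch of the arithmetic only, not ARouter's actual selection code; the function names `inverse_square_weights` and `pick_provider` are hypothetical.

```python
import random


def inverse_square_weights(prices: dict[str, float]) -> dict[str, float]:
    """Return selection probabilities proportional to 1 / price**2."""
    raw = {name: 1.0 / (price ** 2) for name, price in prices.items()}
    total = sum(raw.values())
    return {name: weight / total for name, weight in raw.items()}


def pick_provider(prices: dict[str, float], rng=random) -> str:
    """Draw one provider at random using the inverse-square weights."""
    weights = inverse_square_weights(prices)
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


# Providers A ($1/M) and C ($3/M) from the example; B is left out of the
# weighted pool because it was recently degraded.
weights = inverse_square_weights({"A": 1.0, "C": 3.0})
ratio = weights["A"] / weights["C"]  # (1/1**2) / (1/3**2) = 9
```

With prices of 1 and 3 the raw weights are 1 and 1/9, so A is selected nine times as often as C, matching the figure quoted above.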

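Provider Sorting

Use the `sort` field to explicitly prioritize a provider attribute. Load balancing is disabled and providers are tried in order. Available sort values:

`"price"`: lowest token cost first
`"throughput"`: highest tokens/second first
`"latency"`: lowest time-to-first-token first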

TypeScript
Python
cURL

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.arouter.ai/v1",
  apiKey: "lr_live_xxxx",
});

const response = await client.chat.completions.create({
  model: "openai/gpt-5.4",
  messages: [{ role: "user", content: "Hello" }],
  // @ts-ignore
  provider: { sort: "throughput" },
});

from openai import OpenAI

client = OpenAI(
    base_url="https://api.arouter.ai/v1",
    api_key="lr_live_xxxx",
)

response = client.chat.completions.create(
    model="openai/gpt-5.4",
    messages=[{"role": "user", "content": "Hello"}],
    extra_body={"provider": {"sort": "throughput"}},
)

curl https://api.arouter.ai/v1/chat/completions \
  -H "Authorization: Bearer lr_live_xxxx" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-5.4",
    "messages": [{"role": "user", "content": "Hello"}],
    "provider": {"sort": "throughput"}
  }'

`:nitro` and `:floor` Shortcuts

Append a suffix to the model slug as shorthand for sorting:

Suffix	Equivalent setting
`:nitro`	`provider.sort = "throughput"`
`:floor`	`provider.sort = "price"`

{"model": "openai/gpt-5.4:nitro"}  // sort by throughput
{"model": "openai/gpt-5.4:floor"}  // sort by price
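
To make the equivalence concrete, the sketch below expands a suffixed slug into the explicit `provider.sort` form from the table. The helper `expand_shortcut` is hypothetical and exists only for illustration; ARouter performs this mapping server-side, so clients do not need it.

```python
# Suffix-to-sort mapping taken from the table above.
SUFFIX_TO_SORT = {"nitro": "throughput", "floor": "price"}


def expand_shortcut(model: str) -> dict:
    """Rewrite 'slug:suffix' as the equivalent explicit request fields."""
    base, sep, suffix = model.rpartition(":")
    if sep and suffix in SUFFIX_TO_SORT:
        return {"model": base, "provider": {"sort": SUFFIX_TO_SORT[suffix]}}
    # No recognized suffix: leave the slug untouched.
    return {"model": model}


nitro = expand_shortcut("openai/gpt-5.4:nitro")
plain = expand_shortcut("openai/gpt-5.4")
```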

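Advanced Sorting with Partitions

When you use a list of candidate models (`models[]`), the `sort` field can be an object with a `partition` option that controls how endpoints are sorted across models.

Field	Type	Default	Description
`sort.by`	`string`	-	`"price"`, `"throughput"`, or `"latency"`
`sort.partition`	`string`	`"model"`	`"model"` (try the primary model first) or `"none"` (global sort)

With the default (`partition: "model"`), endpoints are grouped by model: the first model's endpoints are always tried before the second model's. Setting `partition: "none"` removes this grouping, so endpoints are sorted globally across all candidate models.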
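
The difference between the two partition modes can be illustrated with a small ordering function. This is a sketch under stated assumptions: the endpoint records, the model and provider names, and `order_endpoints` itself are invented for the example and do not reflect ARouter internals.

```python
def order_endpoints(endpoints, models, by, partition="model"):
    """Order endpoints by a metric, optionally grouped by model position."""

    def metric_key(endpoint):
        value = endpoint[by]
        # Higher throughput is better; lower price and latency are better.
        return -value if by == "throughput" else value

    if partition == "none":
        # Global sort: model order in `models` is ignored.
        return sorted(endpoints, key=metric_key)

    # partition == "model": all endpoints of earlier models come first.
    rank = {model: index for index, model in enumerate(models)}
    return sorted(endpoints, key=lambda e: (rank[e["model"]], metric_key(e)))


models = ["model-a", "model-b"]  # hypothetical slugs
endpoints = [
    {"model": "model-a", "provider": "p1", "throughput": 40},
    {"model": "model-a", "provider": "p2", "throughput": 80},
    {"model": "model-b", "provider": "p3", "throughput": 120},
]

grouped = [e["provider"] for e in order_endpoints(endpoints, models, "throughput")]
flat = [e["provider"] for e in order_endpoints(endpoints, models, "throughput", "none")]
```

In grouped mode the fastest endpoint overall (`p3`) is still tried last because it serves the second model; with `partition="none"` it moves to the front.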

Use Case 1: Route to the Highest Throughput Across Models

When several models are acceptable and you want whichever is currently fastest:

TypeScript
Python
cURL

const response = await client.chat.completions.create({
  // @ts-ignore
  models: [
    "anthropic/claude-sonnet-4.6",
    "openai/gpt-5.4",
    "google/gemini-2.5-flash",
  ],
  messages: [{ role: "user", content: "Hello" }],
  provider: {
    sort: { by: "throughput", partition: "none" },
  },
});

response = client.chat.completions.create(
    model="anthropic/claude-sonnet-4.6",
    messages=[{"role": "user", "content": "Hello"}],
    extra_body={
        "models": [
            "anthropic/claude-sonnet-4.6",
            "openai/gpt-5.4",
            "google/gemini-2.5-flash",
        ],
        "provider": {
            "sort": {"by": "throughput", "partition": "none"},
        },
    },
)

curl https://api.arouter.ai/v1/chat/completions \
  -H "Authorization: Bearer lr_live_xxxx" \
  -H "Content-Type: application/json" \
  -d '{
    "models": [
      "anthropic/claude-sonnet-4.6",
      "openai/gpt-5.4",
      "google/gemini-2.5-flash"
    ],
    "messages": [{"role": "user", "content": "Hello"}],
    "provider": {
      "sort": {"by": "throughput", "partition": "none"}
    }
  }'

Use Case 2: Cheapest Model That Meets Performance Requirements

Combine `partition: "none"` with a performance threshold to find the lowest-cost option that still meets your SLA:

TypeScript
Python
cURL

const response = await client.chat.completions.create({
  // @ts-ignore
  models: [
    "anthropic/claude-sonnet-4.6",
    "openai/gpt-5.4",
    "google/gemini-2.5-flash",
  ],
  messages: [{ role: "user", content: "Hello" }],
  provider: {
    sort: { by: "price", partition: "none" },
    preferred_min_throughput: { p90: 50 },
  },
});

response = client.chat.completions.create(
    model="anthropic/claude-sonnet-4.6",
    messages=[{"role": "user", "content": "Hello"}],
    extra_body={
        "models": [
            "anthropic/claude-sonnet-4.6",
            "openai/gpt-5.4",
            "google/gemini-2.5-flash",
        ],
        "provider": {
            "sort": {"by": "price", "partition": "none"},
            "preferred_min_throughput": {"p90": 50},
        },
    },
)

curl https://api.arouter.ai/v1/chat/completions \
  -H "Authorization: Bearer lr_live_xxxx" \
  -H "Content-Type: application/json" \
  -d '{
    "models": [
      "anthropic/claude-sonnet-4.6",
      "openai/gpt-5.4",
      "google/gemini-2.5-flash"
    ],
    "messages": [{"role": "user", "content": "Hello"}],
    "provider": {
      "sort": {"by": "price", "partition": "none"},
      "preferred_min_throughput": {"p90": 50}
    }
  }'

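Performance Thresholds

Set a preferred minimum throughput or maximum latency to filter providers. Providers that miss a threshold are not excluded outright; they are deprioritized (moved to the end).

Field	Description
`preferred_min_throughput`	Minimum tokens/second. A number (applied to p50) or an object with percentile keys
`preferred_max_latency`	Maximum time-to-first-token in seconds. A number or an object with percentile keys

How Percentiles Work

ARouter tracks provider performance over a rolling 5-minute window:

Percentile	Meaning
`p50`	50% of requests perform better than this value (median)
`p75`	75% of requests perform better than this value
`p90`	90% of requests perform better than this value
`p99`	99% of requests perform better than this value

Higher percentiles (p90/p99) give more confidence about worst-case performance. A provider must meet every specified percentile cutoff to be placed in the preferred group.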

{
  "provider": {
    "preferred_min_throughput": {
      "p50": 100,
      "p90": 50
    },
    "preferred_max_latency": {
      "p99": 3.0
    }
  }
}

`preferred_min_throughput` and `preferred_max_latency` are soft preferences: they never prevent a request from being served. This differs from `max_price`, which is a hard limit.
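
The soft-preference behavior (meet every cutoff to be preferred, otherwise move to the end) might be sketched as follows. The provider records and function names are hypothetical, and treating a bare number for `preferred_max_latency` as a p50 cutoff is an assumption; the docs state that rule only for throughput.

```python
def normalize(threshold):
    """Turn a bare number into {'p50': n}; pass percentile objects through."""
    if threshold is None:
        return {}
    if isinstance(threshold, (int, float)):
        return {"p50": float(threshold)}  # assumed to hold for latency as well
    return dict(threshold)


def is_preferred(stats, preferred_min_throughput=None, preferred_max_latency=None):
    """True only if every specified percentile cutoff is met."""
    throughput = normalize(preferred_min_throughput)
    latency = normalize(preferred_max_latency)
    fast_enough = all(stats["throughput"][p] >= cut for p, cut in throughput.items())
    quick_enough = all(stats["latency"][p] <= cut for p, cut in latency.items())
    return fast_enough and quick_enough


def apply_soft_preferences(providers, **prefs):
    """Keep every provider, but move those missing a cutoff to the end."""
    preferred = [p for p in providers if is_preferred(p, **prefs)]
    demoted = [p for p in providers if not is_preferred(p, **prefs)]
    return preferred + demoted


# Hypothetical rolling-window statistics for two providers.
providers = [
    {"name": "slow", "throughput": {"p50": 110, "p90": 30}, "latency": {"p99": 2.0}},
    {"name": "fast", "throughput": {"p50": 120, "p90": 60}, "latency": {"p99": 2.5}},
]
ordered = apply_soft_preferences(
    providers,
    preferred_min_throughput={"p50": 100, "p90": 50},
    preferred_max_latency={"p99": 3.0},
)
names = [p["name"] for p in ordered]
```

Here `slow` misses the p90 throughput cutoff, so it is demoted behind `fast` but remains available as a fallback.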

Ordering Specific Providers

Use `order` to specify which providers to try and in what order. Setting `order` disables load balancing.

TypeScript
Python
cURL

const response = await client.chat.completions.create({
  model: "openai/gpt-5.4",
  messages: [{ role: "user", content: "Hello" }],
  // @ts-ignore
  provider: {
    order: ["openai", "azure"],
  },
});

response = client.chat.completions.create(
    model="openai/gpt-5.4",
    messages=[{"role": "user", "content": "Hello"}],
    extra_body={
        "provider": {
            "order": ["openai", "azure"],
        }
    },
)

curl https://api.arouter.ai/v1/chat/completions \
  -H "Authorization: Bearer lr_live_xxxx" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-5.4",
    "messages": [{"role": "user", "content": "Hello"}],
    "provider": {"order": ["openai", "azure"]}
  }'

Allowing Only Specific Providers

Use `only` to restrict routing to a specific set of providers:

{
  "model": "meta-llama/llama-4-maverick",
  "messages": [{ "role": "user", "content": "Hello" }],
  "provider": {
    "only": ["groq", "together"]
  }
}

Ignoring Providers

Use `ignore` to skip specific providers for this request:

{
  "model": "openai/gpt-5.4",
  "messages": [{ "role": "user", "content": "Hello" }],
  "provider": {
    "ignore": ["azure"]
  }
}
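
Conceptually, `only` acts as an allowlist and `ignore` as a denylist over the candidate providers. The sketch below shows that filtering; `filter_providers` is a hypothetical helper, and applying both lists together when both are set is an assumption, since this page does not say how they combine.

```python
def filter_providers(candidates, only=None, ignore=None):
    """Apply the allowlist first, then drop ignored providers."""
    allowed = [p for p in candidates if only is None or p in only]
    return [p for p in allowed if ignore is None or p not in ignore]


candidates = ["openai", "azure", "groq", "together"]
allow_listed = filter_providers(candidates, only=["groq", "together"])
without_azure = filter_providers(candidates, ignore=["azure"])
```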

Disabling Fallbacks

By default, ARouter falls back to an alternative provider when the primary provider is unavailable. Set `allow_fallbacks: false` to require the exact provider:

{
  "model": "openai/gpt-5.4",
  "messages": [{ "role": "user", "content": "Hello" }],
  "provider": {
    "order": ["openai"],
    "allow_fallbacks": false
  }
}

If the specified provider is unavailable, ARouter returns a 503 error instead of routing elsewhere.

Requiring Parameter Support

Set `require_parameters: true` to route only to providers that support every parameter in the request. By default, ARouter may route to providers that ignore unsupported parameters.

{
  "model": "openai/gpt-5.4",
  "messages": [{ "role": "user", "content": "Hello" }],
  "tools": [...],
  "provider": {
    "require_parameters": true
  }
}

Quantization Filtering

Filter by the model quantization levels that providers offer. This is useful when you need a specific precision/performance trade-off:

{
  "model": "meta-llama/llama-4-maverick",
  "messages": [{ "role": "user", "content": "Hello" }],
  "provider": {
    "quantizations": ["fp16", "bf16"]
  }
}

Common quantization values: "fp32", "fp16", "bf16", "int8", "int4".

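Data Collection Policy

Control whether ARouter routes to providers that may store request data:

Value	Behavior
`"allow"` (default)	Route to any provider, including those that may store data
`"deny"`	Route only to providers that do not store request/response data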

{
  "model": "anthropic/claude-sonnet-4.6",
  "messages": [{ "role": "user", "content": "Sensitive content" }],
  "provider": {
    "data_collection": "deny"
  }
}

Zero Data Retention (ZDR)

For maximum privacy, restrict routing to providers with a Zero Data Retention guarantee:

{
  "model": "openai/gpt-5.4",
  "messages": [{ "role": "user", "content": "Hello" }],
  "provider": {
    "zdr": true
  }
}

ZDR providers do not log, store, or train on request data. See Data Collection for details.

Maximum Price

Set a hard limit on the amount you pay per token. If no provider meets this price requirement, the request fails rather than being routed to a more expensive provider:

{
  "model": "openai/gpt-5.4",
  "messages": [{ "role": "user", "content": "Hello" }],
  "provider": {
    "max_price": {
      "prompt": "0.000010",
      "completion": "0.000030"
    }
  }
}

Unlike performance thresholds, `max_price` is a hard limit. If no provider meets the price requirement, the request returns an error.
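
The contrast with soft thresholds can be seen in a short sketch: providers over the limit are removed rather than demoted, and an empty result is an error. The provider price records, the exception name, and the less-than-or-equal comparison are assumptions made for illustration.

```python
from decimal import Decimal


class NoProviderWithinBudget(Exception):
    """Raised when every candidate exceeds the max_price limit (hypothetical)."""


def enforce_max_price(providers, max_price):
    """Keep only providers whose per-token prices are within every given limit."""
    eligible = [
        p for p in providers
        if all(Decimal(p[kind]) <= Decimal(limit) for kind, limit in max_price.items())
    ]
    if not eligible:
        # Hard limit: fail instead of falling back to a pricier provider.
        raise NoProviderWithinBudget("no provider satisfies max_price")
    return eligible


providers = [
    {"name": "cheap", "prompt": "0.000005", "completion": "0.000020"},
    {"name": "pricey", "prompt": "0.000020", "completion": "0.000060"},
]
within_budget = enforce_max_price(
    providers, {"prompt": "0.000010", "completion": "0.000030"}
)
```

`Decimal` is used so that the string prices from the request compare exactly, without floating-point rounding.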

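Provider Health and Availability

ARouter continuously tracks provider health using a circuit breaker mechanism:

State	Behavior
Healthy	The provider is accepting requests normally
Degraded	Recent errors detected; requests may be retried with a different key
Unavailable	All keys are circuit-broken; ARouter returns `503`

This is fully transparent: your application does not need to implement provider-level retry logic.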
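
For readers unfamiliar with the pattern, here is a generic per-key circuit breaker that yields the three states in the table. It is not ARouter's implementation: the failure threshold, the `KeyBreaker` class, and the rule for deriving the provider state from its keys are all assumptions, and a production breaker would also reopen keys after a cooldown.

```python
from dataclasses import dataclass

FAILURE_THRESHOLD = 3  # hypothetical: consecutive failures before a key trips


@dataclass
class KeyBreaker:
    """Tracks consecutive failures for one API key."""
    consecutive_failures: int = 0

    def record(self, ok: bool) -> None:
        # A success resets the counter; a failure increments it.
        self.consecutive_failures = 0 if ok else self.consecutive_failures + 1

    @property
    def is_open(self) -> bool:
        return self.consecutive_failures >= FAILURE_THRESHOLD


def provider_state(breakers: dict[str, KeyBreaker]) -> str:
    """Derive a provider-level state from its per-key breakers."""
    if all(b.is_open for b in breakers.values()):
        return "unavailable"  # every key is circuit-broken
    if any(b.consecutive_failures > 0 for b in breakers.values()):
        return "degraded"  # recent errors, but another key can still be tried
    return "healthy"


keys = {"key-1": KeyBreaker(), "key-2": KeyBreaker()}
initial = provider_state(keys)
for _ in range(FAILURE_THRESHOLD):
    keys["key-1"].record(ok=False)
after_one_key_trips = provider_state(keys)
for _ in range(FAILURE_THRESHOLD):
    keys["key-2"].record(ok=False)
after_all_keys_trip = provider_state(keys)
```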

Specifying a Provider via Model Prefix

The primary way to control which provider handles a request is the `provider/model` format:

{
  "model": "openai/gpt-5.4",
  "messages": [{ "role": "user", "content": "Hello!" }]
}

See Model Routing for the full list of supported formats.

Native Provider Proxy

For complete control, use the provider proxy endpoint `/{provider}/{path}` to bypass ARouter's model routing layer entirely:

# Directly to OpenAI
curl https://api.arouter.ai/openai/v1/chat/completions \
  -H "Authorization: Bearer lr_live_xxxx" \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-5.4", "messages": [...]}'

# Directly to Anthropic
curl https://api.arouter.ai/anthropic/v1/messages \
  -H "Authorization: Bearer lr_live_xxxx" \
  -H "Content-Type: application/json" \
  -d '{"model": "claude-sonnet-4.6", "messages": [...]}'

See Provider Proxy for the full reference.

Supported Providers

Provider	Prefix	Example model
OpenAI	`openai`	`openai/gpt-5.4`
Anthropic	`anthropic`	`anthropic/claude-sonnet-4.6`
Google	`google`	`google/gemini-2.5-flash`
DeepSeek	`deepseek`	`deepseek/deepseek-v3.2`
xAI	`x-ai`	`x-ai/grok-4.20`
Mistral	`mistralai`	`mistralai/mistral-large-2512`
Meta	`meta-llama`	`meta-llama/llama-4-maverick`
Qwen	`qwen`	`qwen/qwen3-235b`
MiniMax	`minimax`	`minimax/minimax-m2.7`
Groq	`groq`	`groq/llama-3.3-70b-versatile`
Kimi	`moonshotai`	`moonshotai/kimi-k2.5`
Dashscope	`dashscope`	`dashscope/qwen-max`

See Providers for the full list of capabilities.
