# Providers & models

The five supported provider surfaces and how to register a new model.
REVAL supports five LLM provider surfaces, all hidden behind a single `LLMProvider` async ABC:
| Surface | Module | Auth |
|---|---|---|
| Bedrock | `reval.providers.bedrock` | AWS IAM (env vars or profile) |
| Anthropic | `reval.providers.anthropic_direct` | `ANTHROPIC_API_KEY` |
| OpenAI | `reval.providers.openai_compat` | `OPENAI_API_KEY` |
| MiniMax | `reval.providers.minimax` | `MINIMAX_API_KEY` |
| Ollama | `reval.providers.ollama` | none — local daemon on `:11434` |
The provider registry lives in `reval/src/reval/providers/factory.py`:

```python
_REGISTRY: dict[str, type[LLMProvider]] = {
    "bedrock": BedrockProvider,
    "anthropic": AnthropicProvider,
    "minimax": MinimaxProvider,
    "openai": OpenAIProvider,
    "ollama": OllamaProvider,
}
```

A sixth entry (`openai_compat`) reuses `OpenAIProvider` against third-party endpoints like Together, Groq, OpenRouter, and Fireworks — any service that speaks the OpenAI API shape. Point `base_url` at your endpoint via environment variables.
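The lazy wiring the factory does can be sketched roughly like this. The ABC shape and the `get_provider` name are assumptions for illustration, not the real REVAL code, and the stub provider stands in for a real implementation:

```python
from abc import ABC, abstractmethod


class LLMProvider(ABC):
    """Sketch of the async provider ABC (assumed shape)."""

    def __init__(self, model_id: str) -> None:
        self.model_id = model_id

    @abstractmethod
    async def generate(self, prompt: str) -> str:
        """Return the model's completion for `prompt`."""


class OllamaProvider(LLMProvider):
    """Stub standing in for the real provider; a real one would call :11434."""

    async def generate(self, prompt: str) -> str:
        return f"[{self.model_id}] {prompt}"


# Mirrors the _REGISTRY dict above, trimmed to one entry for the sketch.
_REGISTRY: dict[str, type[LLMProvider]] = {"ollama": OllamaProvider}


def get_provider(provider: str, model_id: str) -> LLMProvider:
    """Instantiate a provider lazily from its registry key (hypothetical name)."""
    try:
        cls = _REGISTRY[provider]
    except KeyError:
        raise ValueError(f"unknown provider: {provider!r}") from None
    return cls(model_id)
```

The point of the pattern: a catalog entry's `provider:` key selects the class and its `model_id:` is handed to the constructor, so nothing is instantiated until a handle is actually used.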
## Choosing a judge and embeddings
Judge and embeddings are just more entries in the same `evals/config.yaml` catalog. Any registered model can play any role — target, judge, or embeddings — depending on which CLI flag references it:
```shell
# Bedrock target, Bedrock judge, Bedrock embeddings (the defaults)
reval run --model claude-haiku-3-5

# Anthropic target, Bedrock judge (explicit), Bedrock embeddings
reval run --model claude-sonnet-4 --judge-model nova-pro

# OpenAI target, Anthropic judge, Ollama embeddings
reval run --model gpt-4o \
  --judge-model claude-opus-4 \
  --embeddings-model nomic-embed
```

The defaults are `nova-lite` (judge) and `titan-v2` (embeddings), both on Bedrock — so running without any judge/embeddings flags needs AWS credentials. Override both with non-Bedrock entries if you want to run fully off-AWS.
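For example, an off-AWS setup might register Ollama-backed judge and embeddings handles in the catalog. The handle names here are made up; the field shape follows the `models:` format used elsewhere in the config:

```yaml
# evals/config.yaml: hypothetical non-Bedrock judge/embeddings entries
models:
  local-judge:
    provider: ollama
    model_id: gemma4:e2b
  local-embed:
    provider: ollama
    model_id: nomic-embed-text
```

A run referencing only handles like these via `--model`, `--judge-model`, and `--embeddings-model` should need no AWS credentials at all.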
## Adding a new model
Edit `reval/evals/config.yaml` and add a handle under `models:`:

```yaml
models:
  # … existing entries …
  my-new-model:
    provider: openai      # must match a _REGISTRY key
    model_id: gpt-5-turbo # the string the provider's SDK accepts
```

The `provider:` field must be one of the keys in `_REGISTRY` above. The `model_id:` value is passed verbatim to the provider — for Bedrock, that's a Bedrock model ID (e.g. `amazon.nova-lite-v1:0`); for Anthropic, it's a model name (`claude-sonnet-4-20250514`); for Ollama, it's whatever tag you've pulled locally (`gemma4:e2b`).
Once the handle is in the catalog, it's usable immediately:

```shell
reval run --model my-new-model
```

No code changes required — the factory wires up `LLMProvider` instances lazily based on the catalog entry.
## Adding a new provider surface
If you need a surface that's not in the registry (a new cloud, a vendor-specific API), you'll need to write a provider implementation:

- Subclass `reval.providers.base.LLMProvider`.
- Implement the async `generate()` method.
- Register the class in `_REGISTRY` in `reval/src/reval/providers/factory.py`.
- Add a config entry in `evals/config.yaml` that uses your new `provider:` key.
- Add a test under `reval/tests/test_provider_<name>.py`.
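A minimal skeleton for the first two steps might look like this. The base-class constructor shape and the `AcmeProvider` name and endpoint are assumptions for illustration; the real base class also supplies the shared retry and error handling:

```python
import asyncio
import json
import urllib.request


class LLMProvider:
    """Stand-in for reval.providers.base.LLMProvider (assumed shape)."""

    def __init__(self, model_id: str) -> None:
        self.model_id = model_id

    async def generate(self, prompt: str) -> str:
        raise NotImplementedError


class AcmeProvider(LLMProvider):
    """Hypothetical new surface for a vendor-specific HTTP API."""

    API_URL = "https://api.acme.example/v1/complete"  # made-up endpoint

    async def generate(self, prompt: str) -> str:
        body = json.dumps({"model": self.model_id, "prompt": prompt}).encode()
        # Run the blocking HTTP call off the event loop; a real provider
        # would use an async client plus the base class's retry wrapper.
        return await asyncio.to_thread(self._post, body)

    def _post(self, body: bytes) -> str:
        req = urllib.request.Request(
            self.API_URL, data=body, headers={"Content-Type": "application/json"}
        )
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)["completion"]
```

With the class registered under a new `_REGISTRY` key and a matching `provider:` entry in the catalog, the factory picks it up with no further code changes.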
The five existing providers are good templates — they all share the same async error-handling and retry pattern via the base class. The shortest one to crib from is `providers/minimax.py`, which delegates to the Anthropic-compatible wire format and adds ~50 lines of provider-specific request shaping.
## Fully local (no cloud credentials)
If you want to run REVAL with zero cloud credentials, use Ollama for the target, judge, and embeddings:
```shell
ollama pull gemma4:e2b
ollama pull nomic-embed-text

reval run --model gemma4-e2b-local \
  --judge-model gemma4-e2b-local \
  --embeddings-model nomic-embed
```

Everything runs on localhost. The judge quality is markedly lower than `nova-lite` or `claude-opus-4`, so local scores should be compared against other local runs, not against cloud-judged runs.