CLI reference
Every reval subcommand with its flags and what it does.
The reval CLI is built on
Typer. Every command supports
--help for inline documentation.
Top-level commands
reval --help

- reval run — execute a benchmark run against a target model.
- reval list-evals — enumerate eval entries from the dataset.
- reval validate — validate .jsonl files against the JSON schema.
- reval leaderboard build — regenerate the static leaderboard site (including this docs tab).
reval run
The primary command. Runs every eval in the filtered dataset against
the target model, scores the responses, and writes the results to
results/<run>/.
reval run --model claude-haiku-3-5 \
    --country us \
    --category issue_framing \
    --judge-model nova-pro \
    --embeddings-model titan-v2

Flags:

- --model (required) — Catalog handle of the target model. See Providers & models for the list.
- --country — Filter by country (us, india). Omit to run both.
- --category — Filter by eval category (policy_attribution, figure_treatment, issue_framing, factual_accuracy, argumentation_parity). Omit to run all five.
- --judge-model — Override the scoring judge. Defaults to nova-lite from evals/config.yaml.
- --embeddings-model — Override the embeddings backend. Defaults to titan-v2.
- --limit N — Cap the number of evals run. Useful for smoke tests.
- --output-dir — Override the default results/<run>/ destination.
Every run writes three files per entry:
results.json, report.html, report.md. See
Viewing reports for what
each file contains.
reval list-evals
Enumerates the shipped dataset without running anything. Doesn't hit any LLM. Useful for sanity-checking filters.
reval list-evals
reval list-evals --country india
reval list-evals --category figure_treatment
reval list-evals --country india --category issue_framing

Output is a Rich-formatted table with id, category, country, and topic columns.
reval validate
Validates every .jsonl file under --dataset against the JSON schema at --schema.
Exit code 0 on success, non-zero on any validation failure. Used
by CI to catch schema drift:
reval validate --dataset evals/datasets/ --schema evals/schema.json
reval validate --dataset evals/datasets/ --verbose

--verbose prints every successfully-validated entry ID in addition to the failure summary.
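As a sketch of the CI usage (the wrapper name and messages are illustrative, not part of reval), the exit code alone can gate a pipeline step:

```shell
#!/bin/sh
# check_schema.sh -- hypothetical CI wrapper around reval validate.
# It relies only on the documented behavior: exit 0 on success,
# non-zero on any validation failure.
check_schema() {
  reval validate \
    --dataset "${1:-evals/datasets/}" \
    --schema "${2:-evals/schema.json}"
}

# A CI step is then just:
#   check_schema || { echo "schema drift detected" >&2; exit 1; }
```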
reval leaderboard build
Regenerates the static site under public/. Walks every
directory under --showcase looking for results.json files,
renders the leaderboard table, and, when the --docs path is
supplied and exists, renders the Docs tab.
reval leaderboard build
reval leaderboard build --showcase showcase --output public
reval leaderboard build --no-include-reports # skip per-run reports
reval leaderboard build --docs /tmp/nonexistent # skip docs tab

Flags:

- --showcase / -s — Directory of per-run subdirectories. Default: showcase/.
- --output / -o — Destination directory. Default: public/.
- --include-reports / --no-include-reports — Generate per-run report.html files into public/reports/. Default: on.
- --dataset / -d — Dataset directory used to regenerate per-run reports against the current prompts. Default: evals/datasets/. Pass a non-existent path to fall back to copying showcase/<slug>/report.html verbatim (useful when the dataset has drifted and you want the historical prompts preserved).
- --docs — Path to the docs/ directory containing markdown source for the Docs tab. Default: docs/ in the reval repo root. Pass a non-existent path to skip the docs build entirely. On wheel installs (no docs/ in the repo) the default path won't exist and the docs build is silently skipped.
Note: there is no boolean --no-docs toggle. Typer rejects two
options that share the long name --docs, so the docs flag is
path-only; you skip the docs build by pointing it at a non-existent path.
Exit codes
All commands follow standard UNIX exit code conventions:
- 0 — Success.
- 1 — Validation failure, missing file, or runtime error.
- 2 — Typer argument parse error (wrong flag, missing required arg).
Non-zero exits also print a Rich-formatted error message to stderr.
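For scripting around reval, these codes can be branched on directly. A minimal sketch (the helper function below is hypothetical, not part of reval):

```shell
# Hypothetical helper that maps a reval exit status to a short label,
# following the conventions listed above.
describe_reval_exit() {
  case "$1" in
    0) echo "success" ;;
    1) echo "validation failure, missing file, or runtime error" ;;
    2) echo "argument parse error" ;;
    *) echo "unexpected status: $1" ;;
  esac
}

# Typical use after any reval command:
#   reval validate --dataset evals/datasets/ --schema evals/schema.json
#   describe_reval_exit "$?"
describe_reval_exit 2   # prints: argument parse error
```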