CLI reference

Every reval subcommand with its flags and what it does.

The reval CLI is built on Typer. Every command supports --help for inline documentation.

Top-level commands

reval --help
  • reval run — execute a benchmark run against a target model.
  • reval list-evals — enumerate eval entries from the dataset.
  • reval validate — validate .jsonl files against the JSON schema.
  • reval leaderboard build — regenerate the static leaderboard site (including this docs tab).

reval run

The primary command. Runs every eval in the filtered dataset against the target model, scores the responses, and writes the results to results/<run>/.

reval run --model claude-haiku-3-5 \
          --country us \
          --category issue_framing \
          --judge-model nova-pro \
          --embeddings-model titan-v2

Flags:

  • --model (required) — Catalog handle of the target model. See Providers & models for the list.
  • --country — Filter by country (us, india). Omit to run both.
  • --category — Filter by eval category (policy_attribution, figure_treatment, issue_framing, factual_accuracy, argumentation_parity). Omit to run all five.
  • --judge-model — Override the scoring judge. Defaults to nova-lite from evals/config.yaml.
  • --embeddings-model — Override the embeddings backend. Defaults to titan-v2.
  • --limit N — Cap the number of evals run. Useful for smoke tests.
  • --output-dir — Override the default results/<run>/ destination.

Every run writes three files per entry: results.json, report.html, report.md. See Viewing reports for what each file contains.
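As a quick post-run sanity check, the presence of those artifacts can be verified with a few lines of Python (a sketch; the assumption here is that all three files sit directly inside the run directory):

```python
from pathlib import Path

EXPECTED = ("results.json", "report.html", "report.md")

def check_run_outputs(run_dir):
    """Report which of the expected run artifacts exist under results/<run>/.
    The flat layout assumed here is an illustration, not a guarantee."""
    base = Path(run_dir)
    return {name: (base / name).exists() for name in EXPECTED}
```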

reval list-evals

Enumerates the shipped dataset without running anything. Doesn't hit any LLM. Useful for sanity-checking filters.

reval list-evals
reval list-evals --country india
reval list-evals --category figure_treatment
reval list-evals --country india --category issue_framing

Output is a Rich-formatted table with id, category, country, and topic columns.
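The filter semantics (an omitted --country or --category matches everything) can be sketched in Python, assuming one JSON object per dataset line with country and category fields:

```python
import json

def filter_evals(jsonl_lines, country=None, category=None):
    # Sketch of the list-evals filter semantics, not the real implementation:
    # a filter left as None matches every entry.
    entries = [json.loads(line) for line in jsonl_lines if line.strip()]
    return [
        e for e in entries
        if (country is None or e["country"] == country)
        and (category is None or e["category"] == category)
    ]
```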

reval validate

Validates every .jsonl file under --dataset against --schema. Exit code 0 on success, non-zero on any validation failure. Used by CI to catch schema drift:

reval validate --dataset evals/datasets/ --schema evals/schema.json
reval validate --dataset evals/datasets/ --verbose

--verbose prints every successfully-validated entry ID in addition to the failure summary.
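The core validation loop is roughly the following (a simplified sketch: it only checks the schema's top-level "required" keys, whereas the real command runs full JSON Schema validation):

```python
import json

def validate_jsonl(jsonl_text, schema):
    """Check each JSONL line for parseability and the schema's required keys.
    Returns a UNIX-style exit code: 0 on success, 1 on any failure."""
    required = schema.get("required", [])
    failures = []
    for lineno, line in enumerate(jsonl_text.splitlines(), 1):
        if not line.strip():
            continue  # skip blank lines
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            failures.append((lineno, "invalid JSON"))
            continue
        missing = [k for k in required if k not in entry]
        if missing:
            failures.append((lineno, f"missing keys: {missing}"))
    return 0 if not failures else 1
```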

reval leaderboard build

Regenerates the static site under public/. Walks every directory in --showcase looking for results.json files, renders the leaderboard table, and, when the --docs path exists, the Docs tab.

reval leaderboard build
reval leaderboard build --showcase showcase --output public
reval leaderboard build --no-include-reports   # skip per-run reports
reval leaderboard build --docs /tmp/nonexistent # skip docs tab

Flags:

  • --showcase/-s — Directory of per-run subdirectories. Default: showcase/.
  • --output/-o — Destination directory. Default: public/.
  • --include-reports/--no-include-reports — Generate per-run report.html files into public/reports/. Default: on.
  • --dataset/-d — Dataset directory used to regenerate per-run reports against the current prompts. Default: evals/datasets/. Pass a non-existent path to fall back to copying showcase/<slug>/report.html verbatim (useful when the dataset has drifted and you want the historical prompts preserved).
  • --docs — Path to the docs/ directory containing markdown source for the Docs tab. Default: docs/ in the reval repo root. Pass a non-existent path to skip the docs build entirely. On wheel installs (no docs/ in the repo) the default path won't exist and the docs build is silently skipped.

Note: there is no --no-docs boolean toggle. Typer rejects two options sharing the long name --docs, so --docs is a path-only option that you skip by pointing it at a non-existent path.
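The skip-by-nonexistent-path convention amounts to a directory-existence check; a minimal sketch (hypothetical helper name):

```python
from pathlib import Path

def resolve_docs_dir(docs_path):
    """Return the docs directory to build from, or None to skip the
    Docs tab entirely when the given path does not exist."""
    p = Path(docs_path)
    return p if p.is_dir() else None
```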

Exit codes

All commands follow standard UNIX exit code conventions:

  • 0 — Success.
  • 1 — Validation failure, missing file, or runtime error.
  • 2 — Typer argument parse error (wrong flag, missing required arg).

Non-zero exits also print a Rich-formatted error message to stderr.
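In a wrapper script, those codes can be branched on like so (a generic pattern; `python -c` stands in for an actual reval invocation so the snippet runs anywhere):

```python
import subprocess
import sys

# Stand-in for e.g. subprocess.run(["reval", "validate", ...]):
proc = subprocess.run([sys.executable, "-c", "raise SystemExit(1)"])

if proc.returncode == 0:
    msg = "success"
elif proc.returncode == 2:
    msg = "argument parse error"
else:
    msg = "validation or runtime failure"
```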