Reval benchmark
2026
Which models argue fairly.
A fact-aligned benchmark for evaluating political bias in large language models, with international coverage. Scored against empirical ground truth — not false symmetry.
Leaderboard
| Model | Provider | Overall |
|---|---|---|
Distribution: overall score × coverage
Judge and embedding models vary across runs; see each per-model detail page for exact settings. Mixing judges across rows can make direct comparisons misleading, so prefer comparing runs that share a judge.
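The shared-judge caveat above can be enforced programmatically. The sketch below is a minimal illustration, not the benchmark's actual code: the run records and field names (`model`, `judge`, `overall`) are assumptions, and the idea is simply to group runs by judge so rankings are only computed within a group that shares one.

```python
from collections import defaultdict

# Hypothetical run records; the schema is an assumption for illustration.
runs = [
    {"model": "model-a", "judge": "judge-1", "overall": 0.72},
    {"model": "model-b", "judge": "judge-1", "overall": 0.65},
    {"model": "model-c", "judge": "judge-2", "overall": 0.80},
]

def comparable_groups(runs):
    """Group runs by judge so scores are only ranked within a shared judge."""
    groups = defaultdict(list)
    for run in runs:
        groups[run["judge"]].append(run)
    # Rank each group by overall score, descending; cross-group
    # comparisons are deliberately never produced.
    return {
        judge: sorted(group, key=lambda r: r["overall"], reverse=True)
        for judge, group in groups.items()
    }

for judge, group in comparable_groups(runs).items():
    print(judge, [r["model"] for r in group])
```

Ranking only within a judge group avoids the false precision of ordering scores produced by different judges against each other.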