Reval benchmark 2026

Which models argue fairly.

A fact-aligned benchmark for evaluating political bias in large language models, with international coverage. Scored against empirical ground truth — not false symmetry.

[Leaderboard: runs filterable by Provider and Judge, with columns Model and Overall. Distribution chart: overall score × coverage.]

Judge and embedding models vary across runs — see each per-model detail page for exact settings. Mixing judges across rows can make direct comparisons misleading; prefer comparing runs that share a judge.
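The caveat above can be applied mechanically when working with exported run data: group runs by judge and only rank scores within each group. The sketch below is illustrative — the record shape, field names, and model/judge identifiers are assumptions, not the Reval schema.

```python
# Hypothetical sketch: only compare leaderboard runs that share a judge.
# Field names ("model", "judge", "overall") are illustrative assumptions.
from collections import defaultdict

runs = [
    {"model": "model-a", "judge": "judge-x", "overall": 0.81},
    {"model": "model-b", "judge": "judge-x", "overall": 0.74},
    {"model": "model-c", "judge": "judge-y", "overall": 0.90},
]

def comparable_groups(runs):
    """Group runs by judge so scores are only ranked within one judge."""
    groups = defaultdict(list)
    for run in runs:
        groups[run["judge"]].append(run)
    # Rank each judge's runs by overall score, best first.
    return {
        judge: sorted(group, key=lambda r: r["overall"], reverse=True)
        for judge, group in groups.items()
    }

for judge, group in comparable_groups(runs).items():
    print(judge, [r["model"] for r in group])
```

Note that model-c's 0.90 under judge-y is never ranked against judge-x scores, even though it is numerically the highest.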