Upcoming features
What's deferred, what's planned, and what's committed.
REVAL is an active research project. This page tracks what's on the roadmap and what's been intentionally deferred, with honest status flags so you know which items are "real soon now" versus "probably never without more contributors".
reval-webui
Status: not yet scaffolded.
The long-term plan has three sibling Python projects:
reval (authoritative) ──▶ reval-collector ──▶ reval-webui (planned)

reval owns the contracts, providers, runner, and the static
leaderboard that powers revalbench.com.
reval-collector is a LangGraph pipeline that generates
evidence-grounded political-bias test cases and depends on reval
as a library. reval-webui does not exist yet — it's a
placeholder for a future interactive tab that consumes reval's
results.json and the leaderboard JSON data.
When reval-webui lands, it will:
- Import reval.contracts directly (the contracts module is zero-dep, so webui can pick it up without dragging in AWS or HTTP libraries).
- Re-use reval.leaderboard.build.load_rows() to read showcase data without duplicating parsing logic.
- Possibly consume reval.leaderboard.docs.load_docs() to render the same docs tab inside the webui, so documentation doesn't fragment across two deploy surfaces.
Until it exists, the static leaderboard at revalbench.com — which you're reading right now — is the only user-facing surface.
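The zero-dep property of reval.contracts is what makes the direct import cheap, so it may be worth guarding in CI. Below is a minimal sketch of such a guard. The helper name leaked_imports and the list of forbidden packages are assumptions for illustration, not part of reval. It imports a module in a fresh interpreter and reports which forbidden top-level packages were loaded as a side effect.

```python
import subprocess
import sys


def leaked_imports(module: str, forbidden: set[str]) -> set[str]:
    """Import `module` in a clean interpreter and return the forbidden
    top-level packages that ended up in sys.modules."""
    probe = (
        "import importlib, sys\n"
        f"importlib.import_module({module!r})\n"
        "print('\\n'.join(sorted(sys.modules)))\n"
    )
    out = subprocess.run(
        [sys.executable, "-c", probe],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    # Reduce dotted names such as "json.decoder" to their top-level package.
    loaded = {name.split(".")[0] for name in out.splitlines()}
    return loaded & forbidden


if __name__ == "__main__":
    # Demonstrated on a stdlib module so the sketch runs without reval installed.
    print(leaked_imports("json", {"sqlite3", "tkinter"}))
```

In a webui test suite the call would presumably look like leaked_imports("reval.contracts", {"boto3", "requests", "httpx"}) with an assertion that the result is empty. Which packages count as forbidden is a project decision.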
Publish-results button
Status: design sketch, deferred to webui scaffolding.
When reval-webui is built, it needs a "Publish results"
affordance that takes a completed reval run (results.json +
per-run HTML report) and pushes it to the public leaderboard
pipeline. Open design questions:
- One-click GitHub Action trigger?
- Direct write to the leaderboard data directory?
- Manifest-signing step that the deploy job picks up?
Capturing the requirement now so it lands in the initial webui scaffold rather than as an after-the-fact bolt-on.
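To make the third option concrete, here is a rough sketch of what a manifest-signing step could look like, using only the standard library. Everything in it is an assumption: the manifest layout, the file name report.html, the use of HMAC-SHA256, and the function names are all hypothetical, and none of the open questions above has been decided.

```python
import hashlib
import hmac
import json


def build_manifest(files: dict[str, bytes]) -> dict:
    """Record a SHA-256 digest for each artifact of a completed run."""
    return {
        "files": {
            name: hashlib.sha256(data).hexdigest()
            for name, data in sorted(files.items())
        }
    }


def sign_manifest(manifest: dict, key: bytes) -> str:
    """Sign a canonical JSON encoding of the manifest with HMAC-SHA256."""
    payload = json.dumps(manifest, sort_keys=True, separators=(",", ":")).encode()
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify_manifest(manifest: dict, signature: str, key: bytes) -> bool:
    """Check a signature in constant time, as a deploy job might do."""
    return hmac.compare_digest(sign_manifest(manifest, key), signature)


if __name__ == "__main__":
    # Hypothetical run artifacts; real runs would read these from disk.
    run = {"results.json": b'{"evals": []}', "report.html": b"<html></html>"}
    key = b"example-key-not-for-production"
    manifest = build_manifest(run)
    signature = sign_manifest(manifest, key)
    print(verify_manifest(manifest, signature, key))
```

A shared-secret HMAC keeps the sketch short. An asymmetric signature would likely fit a public pipeline better, since the deploy job would then only need a public key.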
PyPI publication + version pinning
Status: deferred ~4 weeks from docs-tab planning (revisit in mid-May 2026).
Today reval-collector depends on reval via an editable install:
# reval-collector/pyproject.toml
[tool.uv.sources]
reval = { path = "../reval", editable = true }

That works because the two sub-repos are siblings inside
~/Documents/reval-workspace/. Once reval hits a stable minor
version, the plan is:
- Publish reval to PyPI (or at least a git tag).
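- Replace the path entry in collector's tool.uv.sources with a semver range like reval>=0.3,<0.4 in its dependency list (version ranges are declared on the dependency itself, not in tool.uv.sources).
- Adopt a CHANGELOG.md [CONTRACT] section for breaking changes, so downstream consumers (collector, webui) can pin against a known-good minor version.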
This unblocks distributing reval independently of the sibling-repo layout.
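As an illustration of what a range like reval>=0.3,<0.4 admits, here is a minimal standard-library sketch. The helper names are hypothetical, and it only handles plain numeric versions such as 0.3.2. Real tooling should rely on the resolver (or the packaging library) rather than hand-rolled comparison.

```python
def parse_version(version: str) -> tuple[int, ...]:
    """Turn "0.3.2" into (0, 3, 2). Pre-release suffixes are not supported."""
    return tuple(int(part) for part in version.split("."))


def satisfies_pin(version: str, lower: str = "0.3", upper: str = "0.4") -> bool:
    """Return True if lower <= version < upper, comparing release tuples."""
    return parse_version(lower) <= parse_version(version) < parse_version(upper)


if __name__ == "__main__":
    for candidate in ("0.2.9", "0.3", "0.3.7", "0.4.0"):
        print(candidate, satisfies_pin(candidate))
```

Under this range, any 0.3.x patch release is accepted while 0.4.0 is rejected, which is what lets a [CONTRACT] break ship in the next minor version without surprising downstream consumers.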
Release coordinator agent
Status: deferred until version pinning.
Once reval is published and collector pins against a release, the
plan is to add a release-coordinator agent to the workspace-level
.claude/agents/ directory. It would handle:
- Bumping reval's version.
- Regenerating the CHANGELOG for any [CONTRACT] breakage.
- Opening a coordinated PR in collector (and eventually webui) to bump the pinned range.
For now, the existing reval-architect, contract-impact, and
cross-repo-pr-reviewer agents are sufficient — they catch
breakage, they just don't automate the version bump.
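The mechanical part of that job is small. Below is a sketch of the version bump and the matching pinned range, assuming a [CONTRACT] break is released as a minor bump and that versions are plain MAJOR.MINOR.PATCH numbers. Both function names are hypothetical. No such agent exists yet.

```python
def bump_minor(version: str) -> str:
    """Return the next minor release, resetting the patch number to 0."""
    major, minor, *_ = (int(part) for part in version.split("."))
    return f"{major}.{minor + 1}.0"


def pinned_range(version: str, package: str = "reval") -> str:
    """Return the requirement that pins `package` to the minor series of `version`."""
    major, minor, *_ = (int(part) for part in version.split("."))
    return f"{package}>={major}.{minor},<{major}.{minor + 1}"


if __name__ == "__main__":
    current = "0.3.2"  # illustrative version, not reval's actual one
    released = bump_minor(current)
    print(released, pinned_range(released))
```

The coordinated PR in collector would then swap the old requirement string for the new one. The remaining work (changelog regeneration, opening the PR) is where an agent would actually save effort.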
Dataset expansion
Status: phase 2 in progress, phases 3 and 4 planned.
| Phase | Target | Status |
|---|---|---|
| 1 | 54 evals across US + India | ✅ Shipped |
| 2 | ~500 evals — expanded US + India, add UK, Germany, Brazil, Global | In progress |
| 3 | Judge calibration against human labels, cross-model consistency testing | Planned |
| 4 | ~1000 evals, public benchmark leaderboard, integrations | Planned |
See reval/README.md#Roadmap for the authoritative phase
breakdown. New evals land via PRs that touch
evals/datasets/<country>/ and get validated by CI.
Drift tests
Status: open follow-up.
Several docs pages duplicate prose from reval/README.md and the
rubric YAML files (install steps, scoring formulas, rubric
criteria). This was a deliberate v1 tradeoff — building a full
README-to-docs extractor would have tripled the docs tab scope.
The follow-up is a lightweight grep-based drift test: assert that
specific substrings ("pip install -e", "0.85", each rubric
criterion name) appear in both the README and the corresponding
docs page. If they drift apart, CI fails and you know to update
both.
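A minimal sketch of what such a drift test could look like, assuming the README and the docs page are both read as plain text. The function names and the sample strings in the demo are illustrative and not part of reval. In CI the two texts would come from the actual files.

```python
def missing_substrings(text: str, required: list[str]) -> list[str]:
    """Return the required substrings that do not occur in `text`."""
    return [needle for needle in required if needle not in text]


def check_drift(readme: str, docs_page: str, required: list[str]) -> dict[str, list[str]]:
    """Report, per document, which required substrings are missing.

    Both lists are empty when the README and the docs page still agree
    on every tracked substring.
    """
    return {
        "readme": missing_substrings(readme, required),
        "docs": missing_substrings(docs_page, required),
    }


if __name__ == "__main__":
    # Sample texts standing in for the README and one docs page.
    readme = "Install with pip install -e . and note the 0.85 threshold."
    docs_page = "Run pip install -e . before anything else."
    report = check_drift(readme, docs_page, ["pip install -e", "0.85"])
    print(report)
    # A CI test would fail here if either list is non-empty.
```

Returning the missing substrings instead of a bare boolean means the failure message can say which document needs updating.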
Known gaps
Things that are NOT on the roadmap yet but have been flagged for future consideration:
- Full-text search across the docs tab (lunr.js or pagefind). Today's Alpine sidebar filter matches on titles only.
- Scrollspy TOC: highlighting the current h2 as you scroll. The right-hand TOC is static in v1.
- Versioned docs: only latest is served; no /v0.3/ archive.
- i18n / translated docs. All docs are English-only.
- Edit-on-GitHub link per page.
If any of these would unblock your work, open an issue on the reval repo.