ADR-010 — VerificationReport gains citations_checked (schema v3)
Status: Accepted (2026-04-23).
Supersedes parts of: ADR-008, on the specific point of how consumers tell
“no citations” apart from “all citations supported”.
Context
VerificationReport.support_rate is defined as “fraction of citations with
supported == True, in [0, 1].” To keep the type simple (a plain
float rather than float | None), we also defined it as 1.0 when
per_citation is empty — the vacuous “no unsupported claims exist”
reading.
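A minimal sketch of the definition above. It assumes each per-citation verdict carries a boolean supported flag. The CitationVerdict type is hypothetical and is not the project's actual class:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CitationVerdict:
    """Hypothetical stand-in for one entry of per_citation."""
    supported: bool


def support_rate(per_citation: list[CitationVerdict]) -> float:
    """Fraction of citations with supported == True, in [0, 1].

    An empty list returns 1.0 (the vacuous "no unsupported claims exist"
    reading), which keeps the return type a plain float.
    """
    if not per_citation:
        return 1.0
    supported = sum(1 for v in per_citation if v.supported)
    return supported / len(per_citation)


# A zero-citation run next to a run with 3 of 10 citations supported.
print(support_rate([]))
print(support_rate([CitationVerdict(True)] * 3 + [CitationVerdict(False)] * 7))
```

The empty-list branch is what produces the full-height bars for zero-citation runs.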
In practice this makes benchmark charts misleading: runs where the baseline
emits zero citations (common on non-instruct models like gpt2, and on small
instruct models with short budgets) show support_rate = 100%, next to
runs where citeformer emitted many citations and genuinely scored ~30% of
them as supported. A reader comes away with “baseline is perfect, citeformer
is mid”, which inverts the actual story.
Options considered:
1. Change support_rate semantics to 0.0 when empty. This breaks the vacuous-entailment reading and makes “no citations” look like “all claims unsupported”, which is also misleading, just in a different direction.
2. Make support_rate float | None. Honest, but breaks anyone reading the field as a float. Requires a major bump on a field that downstream tooling probably treats as a primitive.
3. Keep support_rate = 1.0 as the semantic default, and add a sibling field so consumers can detect the “nothing to check” case explicitly. Additive / minor change.
Decision
Take option 3. Add citations_checked: int to VerificationReport.
- citations_checked == len(per_citation) by construction: literally the count of citations the verifier scored.
- citations_checked == 0 is the honest “no citations existed” signal.
- Consumers that report aggregate support_rate numbers should gate on citations_checked > 0 to avoid publishing “100% supported” for zero-citation runs.
- support_rate itself is unchanged: still 1.0 for empty per_citation (backward-compatible with anything that reads support_rate >= threshold as a boolean check).
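The sketch below shows one way to build a report so that citations_checked always equals len(per_citation). It uses a stdlib dataclass instead of the project's pydantic model so that it runs without dependencies. CitationVerdict, the from_verdicts constructor, and the field layout are hypothetical:

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class CitationVerdict:
    """Hypothetical stand-in for one scored citation."""
    supported: bool


@dataclass(frozen=True)
class VerificationReport:
    per_citation: list[CitationVerdict] = field(default_factory=list)
    support_rate: float = 1.0   # unchanged: still 1.0 when per_citation is empty
    citations_checked: int = 0  # new in schema v3
    schema_version: int = 3

    @classmethod
    def from_verdicts(cls, verdicts: list[CitationVerdict]) -> VerificationReport:
        """Hypothetical constructor: derive both aggregates from the same list."""
        checked = len(verdicts)
        rate = 1.0 if checked == 0 else sum(v.supported for v in verdicts) / checked
        return cls(per_citation=list(verdicts), support_rate=rate,
                   citations_checked=checked)


empty = VerificationReport.from_verdicts([])
print(empty.support_rate, empty.citations_checked)
```

Both aggregates come from one list, so they cannot drift apart. A zero-citation report still carries support_rate == 1.0, but it is now distinguishable by citations_checked == 0.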
Schema version: 2 → 3. Additive/minor per the
§10.3 ceremony: a new optional field with a
default. Snapshot test in
tests/integration/test_schemas.py regenerated with the new field;
test_verification_report_schema_version_is_3 pins the version explicitly.
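A sketch of what "additive field with a default" means for loading stored JSON. It uses a trimmed-down, hypothetical ReportV3 dataclass and a load_report helper (the real code relies on pydantic to fill the default) and assumes a simple payload shape. A v2 payload without citations_checked still loads, while a consumer pinned to the old version sees the bump:

```python
import json
from dataclasses import dataclass


@dataclass
class ReportV3:
    """Hypothetical, trimmed-down stand-in for VerificationReport (v3)."""
    schema_version: int
    support_rate: float
    per_citation: list
    citations_checked: int = 0  # additive field: the default lets v2 JSON load


def load_report(raw: str) -> ReportV3:
    # Keyword construction: a missing citations_checked falls back to its default.
    return ReportV3(**json.loads(raw))


v2_json = '{"schema_version": 2, "support_rate": 1.0, "per_citation": []}'
print(load_report(v2_json).citations_checked)

CONSUMER_PINNED_VERSION = 2  # hypothetical downstream pin
v3_json = ('{"schema_version": 3, "support_rate": 1.0, '
           '"per_citation": [], "citations_checked": 0}')
print(load_report(v3_json).schema_version == CONSUMER_PINNED_VERSION)
```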
Consequences
- benchmarks/plot.py can now filter out “no cites” entries instead of averaging them into misleadingly high support rates. (Applied in the same PR as this ADR.)
- External consumers that deserialize VerificationReport from JSON need to be aware that schema_version is now 3. The extra field has a default of 0, so older JSON deserializes fine (pydantic populates the default), but comparisons against a pinned schema may flag the bump.
- No breaking change to the support_rate semantics: the “1.0 when empty” quirk stays for now. Callers that want strict semantics check report.citations_checked > 0 and report.support_rate >= threshold.
- The CHANGELOG “Contracts (§10)” entry documents the additive field.
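A sketch of the consumer-side gating. It assumes benchmark runs are available as rows with citations_checked and support_rate; the Run type and both helper names are hypothetical. Zero-citation runs drop out of the average instead of contributing a vacuous 1.0, and the strict check never counts them as passing:

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass(frozen=True)
class Run:
    """Hypothetical benchmark row."""
    name: str
    citations_checked: int
    support_rate: float


def mean_support_rate(runs: list[Run]) -> Optional[float]:
    """Average support_rate over runs that actually had citations to check."""
    scored = [r.support_rate for r in runs if r.citations_checked > 0]
    return mean(scored) if scored else None  # None: nothing to report


def passes_strict(run: Run, threshold: float) -> bool:
    """Strict semantics: a zero-citation run never counts as passing."""
    return run.citations_checked > 0 and run.support_rate >= threshold


runs = [
    Run("baseline", citations_checked=0, support_rate=1.0),  # vacuous 1.0
    Run("citeformer", citations_checked=40, support_rate=0.3),
]
print(mean_support_rate(runs))  # only the citeformer run is averaged
print(passes_strict(runs[0], threshold=0.5))
```

A naive average over both rows would report 0.65 and credit the baseline with a perfect score it never earned.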
Follow-on work
- benchmarks/plot.py annotates zero-cite bars as n/a rather than drawing a full-height “100%” bar.
- benchmarks/_common.py::analyze_run now includes citations_checked in the row it emits to the sweep JSON log (tracked via the existing baseline_cites/constrained_cites fields, so no separate plumbing is needed).
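The zero-cite annotation reduces to choosing a bar height and label per run. The sketch below is plotting-library-free. bar_spec is a hypothetical helper, and the actual plot.py code may differ:

```python
def bar_spec(citations_checked: int, support_rate: float) -> tuple[float, str]:
    """Return (bar height, label) for one run.

    Zero-citation runs get an empty bar labelled "n/a" instead of a
    full-height bar at the vacuous 1.0.
    """
    if citations_checked == 0:
        return 0.0, "n/a"
    return support_rate, f"{support_rate:.0%}"


print(bar_spec(0, 1.0))
print(bar_spec(40, 0.3))
```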