ADR-012 — GenerationResult bumped to schema_version=3 with optional usage
Status: Accepted and implemented (2026-04-25).
Context
Every API backend (OpenAIBackend, AnthropicBackend, GeminiBackend,
MistralBackend, plus the new OpenRouterBackend) receives a per-call
usage payload from the provider — input tokens, output tokens, and
(for Anthropic) cache-creation / cache-read tokens, plus (for OpenRouter)
the per-call USD cost. citeformer was discarding all of it. Users running
real RAG pipelines on API backends had no way to get token / cost
visibility without poking around in private SDK objects.
The natural home is GenerationResult. It’s already the canonical typed
output of Citeformer.generate(). Adding a usage: TokenUsage | None
field makes the data accessible without changing any other shape; local
backends leave it None (they don’t bill per token).
Decision
- Introduce citeformer.core.TokenUsage, a frozen pydantic model with
  input_tokens, output_tokens, optional cache_creation_input_tokens,
  cache_read_input_tokens, and cost_credits fields. Top-level export
  alongside the other types. (cost_credits, not cost_usd, because
  OpenRouter, the only provider exposing per-call cost today, reports
  in credits, not dollars.)
- Add GenerationResult.usage: TokenUsage | None = None.
- Bump GenerationResult.schema_version from 2 to 3.
- Each API backend exposes a last_usage: TokenUsage | None instance
  attribute populated at the end of every generate() / stream() call.
  The Citeformer orchestrator reads it via
  getattr(backend, "last_usage", None) and threads it onto result.usage.
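
The shapes described in this decision can be sketched roughly as follows. This is a minimal illustration only: it uses stdlib frozen dataclasses as a stand-in for the frozen pydantic model so that it runs without dependencies, the field types (int for token counts, float for cost_credits) are assumptions, and the text field on GenerationResult is a hypothetical placeholder for the model's other fields.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class TokenUsage:
    """Stand-in for citeformer.core.TokenUsage (a frozen pydantic model in the ADR)."""

    input_tokens: int                               # type assumed
    output_tokens: int                              # type assumed
    cache_creation_input_tokens: int | None = None  # Anthropic only
    cache_read_input_tokens: int | None = None      # Anthropic only
    cost_credits: float | None = None               # OpenRouter only; type assumed


@dataclass
class GenerationResult:
    """Reduced sketch: only the fields this ADR touches, plus a placeholder."""

    text: str                         # hypothetical placeholder field
    usage: TokenUsage | None = None   # new optional field
    schema_version: int = 3           # bumped from 2


# An API-backed result carries usage; a local-backend result leaves it None.
result = GenerationResult(
    text="...",
    usage=TokenUsage(input_tokens=120, output_tokens=45),
)
local = GenerationResult(text="...")
```

Because usage defaults to None, callers that never pass it are unaffected, which is what keeps the change additive.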
Using getattr (rather than a typed Backend.last_usage property on the
ABC) spares local backends a no-op getter and leaves the Backend ABC
unchanged, so out-of-tree backends written against the v0.1 ABC keep
working untouched.
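
The getattr-based hand-off can be illustrated with a small sketch. The two backend classes and the run() helper below are hypothetical stand-ins, not citeformer's actual classes, and the word-count "token" numbers are placeholders for what a provider would report.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class TokenUsage:
    """Reduced stand-in for citeformer.core.TokenUsage."""

    input_tokens: int
    output_tokens: int


class ApiLikeBackend:
    """Hypothetical API backend: records usage at the end of each call."""

    def __init__(self) -> None:
        self.last_usage: TokenUsage | None = None

    def generate(self, prompt: str) -> str:
        text = f"echo: {prompt}"
        # Placeholder counts; a real backend would copy the provider's numbers.
        self.last_usage = TokenUsage(
            input_tokens=len(prompt.split()),
            output_tokens=len(text.split()),
        )
        return text


class LocalLikeBackend:
    """Hypothetical local backend: never defines last_usage at all."""

    def generate(self, prompt: str) -> str:
        return f"echo: {prompt}"


def run(backend, prompt: str) -> tuple[str, TokenUsage | None]:
    """Mimics the orchestrator step: generate, then read usage defensively."""
    text = backend.generate(prompt)
    usage = getattr(backend, "last_usage", None)  # None when the attribute is absent
    return text, usage


api_text, api_usage = run(ApiLikeBackend(), "hello world")
local_text, local_usage = run(LocalLikeBackend(), "hello world")
```

The local backend needs no changes at all: the missing attribute simply resolves to None.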
Consequences
- GenerationResult snapshot in tests/integration/test_schemas.py
  regenerated. The schema_version assertion was bumped from 2 to 3.
- Pre-bump serialised results (schema_version=2) deserialise cleanly
  into the new model: usage defaults to None. No migration shim needed.
- Users hand-constructing GenerationResult (custom backends, tests,
  demo scripts) get usage=None for free; they can opt-in to filling it
  themselves when they care.
- Backends that newly populate last_usage:
  - OpenAIBackend: from completion.usage.{prompt,completion}_tokens.
  - AnthropicBackend: from
    message.usage.{input,output,cache_creation_input,cache_read_input}_tokens.
  - GeminiBackend: from
    response.usage_metadata.{prompt,candidates}_token_count.
  - MistralBackend: from response.usage.{prompt,completion}_tokens.
  - OpenRouterBackend: same as OpenAI plus usage.cost (OpenRouter
    credits, surfaced on cost_credits).
- Local backends (HFBackend, VLLMBackend, LlamaCppBackend, MockBackend)
  do not set last_usage; the orchestrator surfaces usage = None for
  those.
- Non-Citeformer consumers calling Backend.generate() directly still
  see str returned; they pull backend.last_usage themselves if they
  want it.
- CHANGELOG documents the bump under Contracts (§10).
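
The per-provider field mappings listed above might look roughly like the sketch below. The helper names (usage_from_openai and so on) are hypothetical, SimpleNamespace objects stand in for the SDK usage objects, and the sample numbers are invented for illustration; only the attribute names come from this ADR.

```python
from __future__ import annotations

from dataclasses import dataclass
from types import SimpleNamespace


@dataclass(frozen=True)
class TokenUsage:
    """Stand-in for citeformer.core.TokenUsage; field types are assumed."""

    input_tokens: int
    output_tokens: int
    cache_creation_input_tokens: int | None = None
    cache_read_input_tokens: int | None = None
    cost_credits: float | None = None


def usage_from_openai(usage) -> TokenUsage:
    """Hypothetical helper: maps completion.usage.{prompt,completion}_tokens."""
    return TokenUsage(
        input_tokens=usage.prompt_tokens,
        output_tokens=usage.completion_tokens,
    )


def usage_from_anthropic(usage) -> TokenUsage:
    """Hypothetical helper: maps message.usage, including the cache counters."""
    return TokenUsage(
        input_tokens=usage.input_tokens,
        output_tokens=usage.output_tokens,
        cache_creation_input_tokens=getattr(usage, "cache_creation_input_tokens", None),
        cache_read_input_tokens=getattr(usage, "cache_read_input_tokens", None),
    )


def usage_from_openrouter(usage) -> TokenUsage:
    """Hypothetical helper: OpenAI-style counts plus usage.cost as cost_credits."""
    return TokenUsage(
        input_tokens=usage.prompt_tokens,
        output_tokens=usage.completion_tokens,
        cost_credits=getattr(usage, "cost", None),
    )


# Fake payloads standing in for SDK objects (numbers are made up).
openai_usage = usage_from_openai(
    SimpleNamespace(prompt_tokens=100, completion_tokens=20)
)
anthropic_usage = usage_from_anthropic(
    SimpleNamespace(
        input_tokens=200,
        output_tokens=50,
        cache_creation_input_tokens=0,
        cache_read_input_tokens=180,
    )
)
openrouter_usage = usage_from_openrouter(
    SimpleNamespace(prompt_tokens=100, completion_tokens=20, cost=0.0004)
)
```

A consumer calling Backend.generate() directly would read the same normalised object from backend.last_usage after the call returns.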