ADR-017 — Defer per-model cost tables; surface tokens, let consumers price¶
Status: Decided (2026-04-25); not planned.
Context¶
TokenUsage.cost_credits (added in ADR-012) is populated only when the
provider returns a per-call cost directly. Today that’s only OpenRouter
(usage.cost, in OpenRouter credits). For OpenAI / Anthropic / Gemini /
Mistral / Fireworks / Together, cost_credits stays None — consumers
have to multiply tokens by their own pricing knowledge to get a dollar
figure.
A natural-feeling addition would be a per-model pricing table built
into citeformer — PRICING["gpt-4o-mini"] = {"input": 0.15e-6, "output": 0.60e-6} style — so we could compute and surface cost_credits (in
inferred USD) for every backend.
Decision¶
Don’t ship a pricing table. Reasons:
Pricing changes constantly. OpenAI alone re-priced gpt-4o twice in 2024-2025 and added/deprecated tiers (cached input, batch discount, structured-output surcharge). A baked-in table would be stale within months and create a maintenance burden disproportionate to its value.
The data is one multiply away from
usage.input_tokensandusage.output_tokens. Consumers who care about cost already know their pricing (it’s on their bill). A library shipping inferred USD pricing is just a convenience that’s wrong half the time and right the other half.OpenRouter solved it correctly by reporting cost server-side. The right shape is the provider returning the cost, not the client guessing it. We should expose
cost_creditswhen the provider gives us one, and stay silent otherwise.Cached / batch pricing tiers compound the error. Anthropic’s
cache_creation_input_tokens/cache_read_input_tokensare priced 1.25× and 0.1× of fresh input respectively. OpenAI’s batch API is 50% off. A pricing table that doesn’t account for these would systematically over- or under-count.
What we ship instead¶
Token counts on
TokenUsage— accurate, provider-reported, composes with whatever pricing model the consumer wants.cost_creditspopulated when the provider returns it. Today that’s OpenRouter’susage.costfield. As other providers add cost-in-response (Anthropic announced this is coming; Bedrock’s CloudWatch metrics already expose it) we extract it and surface it on the same field.Documentation in the
TokenUsagedocstring noting that consumers needing dollar figures should consult the provider’s pricing page or use a third-party calculator (tokencost,tokonomy, etc.) rather than expecting a table from us.
When we’d reconsider¶
A vendor-neutral pricing source with a stable API and refresh guarantee (LiteLLM ships one; if it stays maintained and accurate, pulling it in as an optional integration could be reasonable — separate PR).
A specific user requirement for in-library pricing that can’t be met by computing from token counts.
Until then: tokens in, tokens out, cost when the provider cares to tell us.
Consequences¶
TokenUsagestays as defined in ADR-012. No new fields.No
pricing.pymodule, no per-model dict.cost_creditsdocstring already notes the unit and that “OpenRouter is the only provider exposing this today.”A
pricingextra is intentionally NOT created.