citeformer.backends.anthropic
Anthropic backend — adapter over Anthropic’s native Citations API.
Anthropic’s Messages API has first-class Citations support (launched
Jan 2025): pass documents as {"type": "document", ..., "citations": {"enabled": true}} and every assistant-side text block is decorated
with an optional citations array referencing the document index +
character span.
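As a rough illustration of the request shape described above, the sketch below builds a plain-text document block with citations enabled and shows what one character-span citation on a text block looks like. The helper name build_document_block and the sample text are hypothetical; the field names follow Anthropic's published Citations format for plain-text documents, not anything specific to citeformer.

```python
from typing import Any


def build_document_block(text: str, title: str) -> dict[str, Any]:
    """Build one plain-text document block with citations enabled (hypothetical helper)."""
    return {
        "type": "document",
        "source": {"type": "text", "media_type": "text/plain", "data": text},
        "title": title,
        "citations": {"enabled": True},
    }


docs = [build_document_block("The sky is blue. Grass is green.", "Facts")]

# Illustrative citation as it would appear in a text block's `citations` array:
# a document index plus a character span into that document.
example_citation = {
    "type": "char_location",
    "cited_text": "The sky is blue.",
    "document_index": 0,
    "document_title": "Facts",
    "start_char_index": 0,
    "end_char_index": 16,
}

# The span indexes back into the document that was sent.
doc_text = docs[example_citation["document_index"]]["source"]["data"]
span = doc_text[example_citation["start_char_index"]:example_citation["end_char_index"]]
```

The list in docs is what would be placed in the user message's content ahead of the prompt text.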
This backend is an adapter, not an enforcement layer. The provider guarantees that every returned citation points at a document that was actually provided, so a fabricated reference cannot occur on the provider side. We translate Anthropic’s native shape back into citeformer’s citeformer.core.Citation / citeformer.core.Reference types so downstream code can mix Anthropic output with local-backend output in the same pipeline.
Because the enforcement is native, marker_style is advisory on this
backend — we render Claude’s citations in the chosen shape for
consistency with the rest of citeformer, but the provider itself doesn’t
know about marker styles; it emits a structured citation block per
assertion.
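A minimal sketch of what "rendering in the chosen shape" could look like: one text block plus its citation list is flattened into text with trailing markers. The function render_block, the style names, and the numbering scheme (document index + 1) are assumptions for illustration; the actual marker syntax citeformer emits may differ.

```python
def render_block(text: str, citations: list[dict], style: str = "bracket") -> str:
    """Append one marker per distinct cited document to a text block (hypothetical)."""
    # Deduplicate document indexes while preserving first-seen order.
    seen: list[int] = []
    for citation in citations:
        index = citation["document_index"]
        if index not in seen:
            seen.append(index)

    if style == "bracket":
        markers = "".join(f"[{i + 1}]" for i in seen)
    elif style == "paren":
        markers = "".join(f"({i + 1})" for i in seen)
    else:
        raise ValueError(f"unknown marker style: {style!r}")

    # Blocks without citations pass through unchanged.
    return f"{text.rstrip()} {markers}" if markers else text


rendered = render_block(
    "The sky is blue.",
    [{"document_index": 0}, {"document_index": 0}],
)
```

The style only changes the final string; the structured citation data underneath is the same in either case.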
Prompt caching (cache_control) is on by default for the document
blocks. Claude prices cache-read tokens at ~10% of fresh input tokens,
so for any RAG pipeline that reuses the same source list across calls
the saving is substantial. Disable with use_prompt_cache=False if
the documents are one-shot.
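The trade-off can be sketched with a little arithmetic. The helper names below are hypothetical. The 10% read ratio comes from the text above, while the 1.25x cache-write premium is an assumption based on Anthropic's published pricing for short-lived caches and may not match your model or cache lifetime.

```python
from typing import Any


def with_cache_control(block: dict[str, Any], use_prompt_cache: bool = True) -> dict[str, Any]:
    """Return a copy of a document block, optionally marked as cacheable."""
    if not use_prompt_cache:
        return dict(block)
    return {**block, "cache_control": {"type": "ephemeral"}}


def relative_input_cost(calls: int, read_ratio: float = 0.10, write_ratio: float = 1.25) -> float:
    """Average document-token cost per call, relative to uncached input (= 1.0).

    Assumes the first call writes the cache and every later call reads it.
    """
    if calls < 1:
        raise ValueError("calls must be >= 1")
    return (write_ratio + (calls - 1) * read_ratio) / calls


block = {"type": "document", "title": "Facts", "citations": {"enabled": True}}
cached = with_cache_control(block)
one_shot = relative_input_cost(1)    # a single call only pays the write premium
ten_calls = relative_input_cost(10)  # repeated calls amortise it
```

Under these assumptions a single call costs more with caching than without, which is why turning the cache off for one-shot documents makes sense.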
True per-block streaming via stream() is wired to the SDK’s
messages.stream() context manager — text deltas are batched per
block so the citation markers attach to the right block when the block
finishes (the per-token delta path doesn’t carry citation info on the
wire; you only see citations at content_block_stop).
Requires the anthropic extra: pip install citeformer[anthropic].
Module Contents
Classes
AnthropicBackend | Anthropic Messages API backend with native citation support.
API
- class citeformer.backends.anthropic.AnthropicBackend(model: str = _DEFAULT_MODEL, *, client: Any | None = None, async_client: Any | None = None, **client_kwargs: Any)
Bases: citeformer.backends.base.Backend
Anthropic Messages API backend with native citation support.
Attributes:
model: Anthropic model id (e.g. "claude-sonnet-4-6").
client: The anthropic.Anthropic client.
last_usage: Token-usage payload from the most recent generate() / stream() call. None before the first call. The orchestrator threads this onto GenerationResult.usage.
last_rich_citations: One dict per marker emitted in the most recent call, in left-to-right output order. Each carries the source_id, cited_text (the exact span Claude cited from), source_span (offsets into the source content), and document_title returned by the Citations API. The orchestrator zips this with the parsed marker list and populates Citation.cited_text / source_span / document_title. Empty list when the call emitted no citations.
Initialization
Construct an Anthropic backend.
Args:
model: Anthropic model id supporting Citations (any 3.5+ or Claude 4 family).
client: Pre-built anthropic.Anthropic client. If None, one is constructed from the environment (picks up ANTHROPIC_API_KEY).
async_client: Pre-built anthropic.AsyncAnthropic client used by agenerate() / astream() (ADR-014). When None, one is built lazily from client_kwargs on the first async call, so sync-only callers don’t pay the construction cost.
**client_kwargs: Forwarded to Anthropic() / AsyncAnthropic() when the respective client is None.
- last_usage: citeformer.core.TokenUsage | None
None
- property client: Any
Lazy anthropic.Anthropic client used by the sync surface.
Built on first access from client_kwargs (or returns the constructor-supplied override). Async-only callers never trigger construction.
- property async_client: Any
Lazy anthropic.AsyncAnthropic client used by the async surface.
Built on first access from the same client_kwargs the sync client used. Sync-only callers never trigger construction.
- generate(prompt: str, sources: list[citeformer.core.Source], policy: citeformer.core.Policy, **options: Any) → str
Call the Messages API with citations enabled and flatten the response to marker-decorated text.
Args:
prompt: User prompt.
sources: Sources in scope. Each becomes one document block.
policy: Citation policy, threaded into the system prompt so Claude sees the caller’s enforcement intent. The provider itself doesn’t have a typed policy, so we rely on the system prompt to shape behaviour.
**options: max_tokens (default 1024), temperature (default Anthropic’s own default; passed through only when explicitly supplied), system_prompt (extra system content), marker_style (default BRACKET; advisory, used to render citation markers), use_prompt_cache (default True; sets cache_control: ephemeral on every document block so repeat-source RAG pays cache-read prices on subsequent calls), extra_headers (forwarded to the SDK).
Returns: Flattened text carrying the configured marker style for every assertion Claude cited.
Raises: ValueError: If sources is empty.
- async agenerate(prompt: str, sources: list[citeformer.core.Source], policy: citeformer.core.Policy, **options: Any) → str
Native-async counterpart of generate() (ADR-014).
Uses self.async_client (the lazy AsyncAnthropic) so concurrent callers don’t tie up executor threads on the SDK’s HTTP wait. Same request-shape construction, prompt caching, last_usage and last_rich_citations capture as the sync path; only the client call is awaited.
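A sketch of why a native-async method helps: several prompts can be awaited together on one event loop with no thread pool involved. FakeAsyncBackend is a stand-in that only mimics the agenerate signature shown above; the sources and policy values are placeholders, not real citeformer objects.

```python
import asyncio
from typing import Any


class FakeAsyncBackend:
    """Stand-in exposing the same agenerate signature as the backend."""

    async def agenerate(self, prompt: str, sources: list, policy: Any, **options: Any) -> str:
        await asyncio.sleep(0)  # placeholder for the awaited HTTP call
        return f"answer to {prompt!r} [1]"


async def answer_all(backend: Any, prompts: list[str], sources: list, policy: Any) -> list[str]:
    # gather() runs the coroutines concurrently and keeps results in input order.
    return await asyncio.gather(
        *(backend.agenerate(p, sources, policy) for p in prompts)
    )


results = asyncio.run(
    answer_all(FakeAsyncBackend(), ["a", "b"], sources=["placeholder source"], policy=None)
)
```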
- async astream(prompt: str, sources: list[citeformer.core.Source], policy: citeformer.core.Policy, **options: Any) → collections.abc.AsyncIterator[str]
Native-async block-level streaming via AsyncAnthropic.messages.stream.
Mirrors stream() exactly but uses the SDK’s async stream context manager (async with ..., async for event in stream, then await stream.get_final_message()). One yielded chunk per completed text block, with citation markers attached.
Falls back to a single-chunk yield via agenerate() when the async client doesn’t expose stream (test stand-ins that only mock messages.create).
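Consuming the async stream might look like the sketch below. fake_astream is a stand-in async generator that yields block-sized chunks in the shape stream() documents (text, markers, one trailing space); the chunk contents and the helper collect are made up for illustration.

```python
import asyncio
from collections.abc import AsyncIterator


async def fake_astream() -> AsyncIterator[str]:
    """Stand-in for backend.astream(...): one chunk per finished block."""
    for chunk in ("The sky is blue. [1] ", "Grass is green. [2] "):
        await asyncio.sleep(0)  # placeholder for waiting on the network
        yield chunk


async def collect(chunks: AsyncIterator[str]) -> str:
    parts: list[str] = []
    async for chunk in chunks:
        parts.append(chunk)  # a UI could render each block here as it arrives
    # Each chunk already ends with a space, so plain concatenation is enough.
    return "".join(parts).rstrip()


text = asyncio.run(collect(fake_astream()))
```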
- stream(prompt: str, sources: list[citeformer.core.Source], policy: citeformer.core.Policy, **options: Any) → collections.abc.Iterator[str]
Stream block-sized chunks via Anthropic’s native messages.stream().
Each yielded chunk corresponds to one finished text block from Claude: the text plus the marker(s) for any citations attached to that block. Yielding per block (rather than per token) is the natural granularity for the Citations API: citation events only arrive at content_block_stop, so per-token text deltas would have to be rewritten in place when the citations land. The per-block path avoids that rewrite and produces clean output.
Falls back to the non-streaming path on SDKs that don’t expose messages.stream (very old client versions or test stand-ins that mock only messages.create).
Args:
prompt: See generate().
sources: See generate().
policy: See generate().
**options: Same options as generate().
Yields: Per-block text chunks (each terminated by a single space) carrying any citation markers that landed on the block.