citeformer.backends.anthropic

Anthropic backend — adapter over Anthropic’s native Citations API.

Anthropic’s Messages API has first-class Citations support (launched Jan 2025): pass documents as {"type": "document", ..., "citations": {"enabled": true}} and every assistant-side text block is decorated with an optional citations array referencing the document index + character span.

This backend is an adapter, not an enforcement layer. Claude’s own system ensures the returned citation references point at a document that was actually provided — fabricating a reference is provider-side impossible. We translate Anthropic’s native shape back into citeformer’s

class:

~citeformer.core.Citation / :class:~citeformer.core.Reference types so downstream code can mix Anthropic output with local-backend output in the same pipeline.

Because the enforcement is native, marker_style is advisory on this backend — we render Claude’s citations in the chosen shape for consistency with the rest of citeformer, but the provider itself doesn’t know about marker styles; it emits a structured citation block per assertion.

Prompt caching (cache_control) is on by default for the document blocks. Claude prices cache-read tokens at ~10% of fresh input tokens, so for any RAG pipeline that reuses the same source list across calls the saving is substantial. Disable with use_prompt_cache=False if the documents are one-shot.

True per-block streaming via :meth:stream is wired to the SDK’s messages.stream() context manager — text deltas are batched per block so the citation markers attach to the right block when the block finishes (the per-token delta path doesn’t carry citation info on the wire; you only see citations at content_block_stop).

Requires the anthropic extra: pip install citeformer[anthropic].

Module Contents

Classes

AnthropicBackend

Anthropic Messages API backend with native citation support.

API

class citeformer.backends.anthropic.AnthropicBackend(model: str = _DEFAULT_MODEL, *, client: Any | None = None, async_client: Any | None = None, **client_kwargs: Any)

Bases: citeformer.backends.base.Backend

Anthropic Messages API backend with native citation support.

Attributes: model: Anthropic model id (e.g. "claude-sonnet-4-6"). client: The anthropic.Anthropic client. last_usage: Token-usage payload from the most recent generate() / stream() call. None before the first call. The orchestrator threads this onto :attr:GenerationResult.usage. last_rich_citations: One dict per marker emitted in the most recent call, in left-to-right output order. Each carries the source_id, cited_text (the exact span Claude cited from), source_span (offsets into the source content), and document_title returned by the Citations API. The orchestrator zips this with the parsed marker list and populates :attr:Citation.cited_text / source_span / document_title. Empty list when the call emitted no citations.

Initialization

Construct an Anthropic backend.

Args: model: Anthropic model id supporting Citations (any 3.5+ or Claude 4 family). client: Pre-built anthropic.Anthropic client. If None, one is constructed from the environment (picks up ANTHROPIC_API_KEY). async_client: Pre-built anthropic.AsyncAnthropic client used by :meth:agenerate / :meth:astream (ADR-014). When None, one is built lazily from client_kwargs on the first async call — sync-only callers don’t pay the construction cost. **client_kwargs: Forwarded to Anthropic() / AsyncAnthropic() when the respective client is None.

model: str

None

last_usage: citeformer.core.TokenUsage | None

None

last_rich_citations: list[dict[str, Any]]

None

property client: Any

Lazy anthropic.Anthropic client used by the sync surface.

Built on first access from client_kwargs (or returns the constructor-supplied override). Async-only callers never trigger construction.

property async_client: Any

Lazy anthropic.AsyncAnthropic client used by the async surface.

Built on first access from the same client_kwargs the sync client used. Sync-only callers never trigger construction.

generate(prompt: str, sources: list[citeformer.core.Source], policy: citeformer.core.Policy, **options: Any) str

Call Messages API with citations enabled; flatten to marker-decorated text.

Args: prompt: User prompt. sources: Sources in scope. Each becomes one document block. policy: Citation policy — threaded into the system prompt so Claude sees the caller’s enforcement intent. The provider itself doesn’t have a typed policy, so we rely on the system prompt to shape behaviour. **options: max_tokens (default 1024), temperature (default Anthropic’s own default — passed through only when explicitly supplied), system_prompt (extra system content), marker_style (default BRACKET — advisory, used to render citation markers), use_prompt_cache (default True; sets cache_control: ephemeral on every document block so repeat-source RAG pays cache-read prices on subsequent calls), extra_headers (forwarded to the SDK).

Returns: Flattened text carrying the configured marker style for every assertion Claude cited.

Raises: ValueError: If sources is empty.

async agenerate(prompt: str, sources: list[citeformer.core.Source], policy: citeformer.core.Policy, **options: Any) str

Native-async counterpart of :meth:generate (ADR-014).

Uses self.async_client (the lazy AsyncAnthropic) so concurrent callers don’t tie up executor threads on the SDK’s HTTP wait. Same request-shape construction, prompt-caching, last_usage and last_rich_citations capture as the sync path — only the client call is awaited.

async astream(prompt: str, sources: list[citeformer.core.Source], policy: citeformer.core.Policy, **options: Any) collections.abc.AsyncIterator[str]

Native-async block-level streaming via AsyncAnthropic.messages.stream.

Mirrors :meth:stream exactly but uses the SDK’s async stream context manager (async with ... + async for event in stream

  • await stream.get_final_message()). One yielded chunk per completed text block, with citation markers attached.

Falls back to a single-chunk yield via :meth:agenerate when the async client doesn’t expose stream (test stand-ins that only mock messages.create).

stream(prompt: str, sources: list[citeformer.core.Source], policy: citeformer.core.Policy, **options: Any) collections.abc.Iterator[str]

Stream block-sized chunks via Anthropic’s native messages.stream().

Each yielded chunk corresponds to one finished text block from Claude — text + the marker(s) for any citations attached to that block. Yielding per-block (rather than per-token) is the natural granularity for the Citations API: citation events only arrive at content_block_stop, so per-token text deltas would have to be rewritten in-place when the citations land. The per-block path is honest and produces clean output.

Falls back to the non-streaming path on SDKs that don’t expose messages.stream (very old client versions or test stand-ins that mock only messages.create).

Args: prompt: See :meth:generate. sources: See :meth:generate. policy: See :meth:generate. **options: Same options as :meth:generate.

Yields: Per-block text chunks (each terminated by a single space) carrying any citation markers that landed on the block.