citeformer.backends.openai¶
OpenAI backend — schema-level cite-id enforcement via Structured Outputs.
OpenAI’s response_format={"type": "json_schema", "json_schema": {"strict": true, ...}} API
lets us hand the model a JSON schema where the source_id field is
constrained to a literal enum of the in-scope source indices. When
strict is true, the API rejects any response whose source_id isn’t one of
the enumerated integers. At the schema layer, this is structurally equivalent to
what XGrammar does at the logit layer for local backends.
Tier honesty:
- Local backends (HF / vLLM / llama.cpp) enforce at the logit layer: a fabricated cite id is token-impossible to sample.
- This backend enforces at the schema layer. The provider validates the assistant’s generation against the schema before returning it. Fabrication is structurally impossible in the returned payload, which is what matters for downstream consumers.
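As a rough illustration of schema-level enforcement, the sketch below builds a strict response_format payload whose citation integers are enum-constrained to 1..N. It follows the segments/citations shape documented for OpenAIBackend. The helper name build_response_format and the schema name "cited_answer" are hypothetical, and the schema this backend actually sends may differ.

```python
import json
from typing import Any


def build_response_format(num_sources: int) -> dict[str, Any]:
    """Build a strict JSON-schema response_format for `num_sources` sources.

    Hypothetical helper: a sketch of the idea, not citeformer's own code.
    """
    if num_sources < 1:
        raise ValueError("at least one source is required")
    # 1-indexed ids: the source at list position i becomes enum entry i.
    cite_ids = list(range(1, num_sources + 1))
    segment = {
        "type": "object",
        "properties": {
            "text": {"type": "string"},
            "citations": {
                "type": "array",
                "items": {"type": "integer", "enum": cite_ids},
            },
        },
        # Strict mode expects every property to be listed as required
        # and additionalProperties to be false on each object.
        "required": ["text", "citations"],
        "additionalProperties": False,
    }
    schema = {
        "type": "object",
        "properties": {"segments": {"type": "array", "items": segment}},
        "required": ["segments"],
        "additionalProperties": False,
    }
    return {
        "type": "json_schema",
        "json_schema": {"name": "cited_answer", "strict": True, "schema": schema},
    }


response_format = build_response_format(3)
print(json.dumps(response_format, indent=2))
```

With three sources in scope, the only integers the schema admits in citations are 1, 2, and 3.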
Requires the openai extra: pip install citeformer[openai].
Model requirements: the strict: true JSON-schema mode is supported on
gpt-4o-2024-08-06, gpt-4o-mini, and every OpenAI model released
after August 2024. Older models will return a 400 — we surface that
directly rather than silently falling back to non-strict mode.
Module Contents¶
Classes¶
OpenAIBackend | OpenAI chat-completions backend with schema-level cite enforcement.
API¶
- class citeformer.backends.openai.OpenAIBackend(model: str = _DEFAULT_MODEL, *, client: Any | None = None, async_client: Any | None = None, **client_kwargs: Any)¶
Bases: citeformer.backends.base.Backend
OpenAI chat-completions backend with schema-level cite enforcement.
Requests a structured response of the shape:
    {
      "segments": [
        {"text": "A sentence.", "citations": [1, 2]},
        {"text": "Another one.", "citations": [3]}
      ]
    }
where every citations[*] integer is enum-constrained to 1..N. The backend then flattens the segments back into a single string carrying the configured citeformer.core.MarkerStyle markers (default [N]), so downstream code (Citeformer orchestrator, verify, render) sees the same shape as local-backend output.
Attributes:
    model: OpenAI model identifier (e.g. "gpt-4o-mini").
    client: The authenticated openai.OpenAI client.
    last_usage: Token-usage payload from the most recent generate() call. None before the first call. The orchestrator threads this onto GenerationResult.usage.
Initialization
Create an OpenAI backend.
Args:
    model: Model id supporting strict JSON schema (gpt-4o-mini or later).
    client: Pre-built openai.OpenAI client used by generate / stream. If None, one is built lazily from client_kwargs on the first sync call, at which point it picks up OPENAI_API_KEY from the environment.
    async_client: Pre-built openai.AsyncOpenAI client used by agenerate / astream (ADR-014). If None, one is built lazily from client_kwargs on the first async call. Sync-only callers don’t pay the async construction cost; async-only callers don’t pay the sync one.
    **client_kwargs: Forwarded to openai.OpenAI() / openai.AsyncOpenAI() when the respective client is None (base_url, api_key, organization, timeout, …). Useful for pointing at a compatible endpoint (Azure, local LiteLLM, Together, Anyscale).
- last_usage: citeformer.core.TokenUsage | None¶
None
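The segment-flattening step described for OpenAIBackend can be sketched as follows. This is a minimal illustration, not the backend's own code: flatten_segments is a hypothetical name, and placing the markers after each segment's text, separated by a single space, is an assumption about the output convention.

```python
import json


def flatten_segments(payload: dict, marker: str = "[{n}]") -> str:
    """Join segments into one string, appending a marker per cited source.

    Hypothetical helper; the marker placement here is an assumed convention.
    """
    parts = []
    for segment in payload["segments"]:
        # Render one marker per citation id, in the order the model gave them.
        markers = "".join(marker.format(n=n) for n in segment["citations"])
        text = segment["text"].strip()
        parts.append(f"{text} {markers}" if markers else text)
    return " ".join(parts)


raw = (
    '{"segments": ['
    '{"text": "A sentence.", "citations": [1, 2]}, '
    '{"text": "Another one.", "citations": [3]}]}'
)
flat = flatten_segments(json.loads(raw))
print(flat)  # A sentence. [1][2] Another one. [3]
```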
- property client: Any¶
Lazy openai.OpenAI client used by the sync surface.
Built on first access from client_kwargs (or returns the constructor-supplied override). Async-only callers never trigger construction, which matters for tests that inject only an async_client without setting OPENAI_API_KEY.
- property async_client: Any¶
Lazy openai.AsyncOpenAI client used by the async surface.
Built on first access from the same client_kwargs the sync client uses (so a backend pointing at base_url=... for OpenRouter / Fireworks / Together cascades correctly to the async client too). Sync-only callers never trigger construction.
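The lazy-construction behaviour of the two client properties can be illustrated with a small stand-alone sketch. LazyClients and the fake factories are hypothetical stand-ins for the backend and for openai.OpenAI / openai.AsyncOpenAI, so the example runs without the openai package or an API key.

```python
from typing import Any, Callable, Optional


class LazyClients:
    """Hypothetical holder that builds sync/async clients on first access."""

    def __init__(
        self,
        sync_factory: Callable[..., Any],
        async_factory: Callable[..., Any],
        *,
        client: Optional[Any] = None,
        async_client: Optional[Any] = None,
        **client_kwargs: Any,
    ) -> None:
        self._sync_factory = sync_factory
        self._async_factory = async_factory
        self._client = client
        self._async_client = async_client
        self._client_kwargs = client_kwargs

    @property
    def client(self) -> Any:
        # Build once, on first access, from the shared kwargs.
        if self._client is None:
            self._client = self._sync_factory(**self._client_kwargs)
        return self._client

    @property
    def async_client(self) -> Any:
        # Same kwargs as the sync client, so base_url etc. carry over.
        if self._async_client is None:
            self._async_client = self._async_factory(**self._client_kwargs)
        return self._async_client


built: list = []  # records which factories were invoked, and with what


def fake_sync(**kwargs: Any) -> object:
    built.append(("sync", kwargs))
    return object()


def fake_async(**kwargs: Any) -> object:
    built.append(("async", kwargs))
    return object()


holder = LazyClients(fake_sync, fake_async, base_url="http://localhost:4000")
first = holder.async_client   # triggers the async factory
second = holder.async_client  # reuses the cached instance
print(built)
```

Only the async factory runs here, and only once; the sync client is never constructed because nothing touched holder.client.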
- generate(prompt: str, sources: list[citeformer.core.Source], policy: citeformer.core.Policy, **options: Any) -> str¶
Generate text with schema-level citation constraint.
Args:
    prompt: User prompt; caller is responsible for RAG stitching.
    sources: Sources in scope. Each source’s 1-indexed position becomes its cite enum entry.
    policy: Citation policy (REQUIRED / AUTO / QUOTES_ONLY). Threaded into the schema’s description so the model sees the same enforcement intent it would under local decoding.
    **options: max_tokens (default 1024), temperature (default 0.7), marker_style (default BRACKET), system_prompt (additional system-role content prepended to the assembled citation instructions).
Returns:
    Flattened text carrying marker_style markers for every cited source, in document order.
Raises:
    ValueError: If sources is empty.
- async agenerate(prompt: str, sources: list[citeformer.core.Source], policy: citeformer.core.Policy, **options: Any) -> str¶
Native-async counterpart of generate (ADR-014).
Uses self.async_client (the lazy AsyncOpenAI) so concurrent callers don’t tie up executor threads on the SDK’s HTTP wait. The request shape, schema construction, segment flattening, and last_usage extraction are identical to the sync path, and only the client call is awaited. Subclasses (OpenRouter / Fireworks / Together) inherit this unchanged; their _build_response_format / _augment_create_kwargs hooks fire from here just like in the sync path.
- stream(prompt: str, sources: list[citeformer.core.Source], policy: citeformer.core.Policy, **options: Any) -> collections.abc.Iterator[str]¶
Stream segments by yielding each finished sentence with its markers.
OpenAI’s streaming response-format path is richer than we need here: we’re not trying to surface per-token deltas, just complete sentence-level chunks as each segment is validated. The simpler implementation calls generate once and chunks its output on sentence boundaries so downstream consumers of Citeformer.stream still see multiple chunks.
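Chunking already-generated text on sentence boundaries might look roughly like the sketch below. The function name chunk_sentences and the regular expression are assumptions. The sketch also assumes [N] markers follow the sentence's closing punctuation, and it is deliberately naive about abbreviations such as "e.g.".

```python
import re
from collections.abc import Iterator

# A sentence: shortest run ending in . ! or ?, plus any trailing [N] markers,
# followed by whitespace or end of text.
_SENTENCE = re.compile(r".+?[.!?](?:\s*\[\d+\])*(?=\s|$)")


def chunk_sentences(text: str) -> Iterator[str]:
    """Yield each sentence together with the markers that follow it."""
    end = 0
    for match in _SENTENCE.finditer(text):
        yield match.group().strip()
        end = match.end()
    # Emit any trailing text that lacks closing punctuation.
    tail = text[end:].strip()
    if tail:
        yield tail


chunks = list(chunk_sentences("A sentence. [1][2] Another one. [3]"))
print(chunks)  # ['A sentence. [1][2]', 'Another one. [3]']
```

Keeping the markers attached to their sentence means each chunk a consumer receives is independently attributable to its sources.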
- async astream(prompt: str, sources: list[citeformer.core.Source], policy: citeformer.core.Policy, **options: Any) -> collections.abc.AsyncIterator[str]¶
Native-async counterpart of stream (ADR-014).
Awaits agenerate (uses the async client) and then yields the same sentence-chunked output the sync stream produces. Cascades to OpenRouter / Fireworks / Together since they don’t override stream either.
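The await-then-chunk pattern described for astream can be sketched with a stub in place of the real backend call. fake_agenerate and astream_chunks are hypothetical names, and the split rule (whitespace after punctuation or a closing marker, before a capital letter) is a simplification chosen for this example.

```python
import asyncio
import re
from collections.abc import AsyncIterator


async def fake_agenerate(prompt: str) -> str:
    """Stand-in for an awaited backend call; returns already-flattened text."""
    await asyncio.sleep(0)  # hand control back to the loop, as a network wait would
    return "A sentence. [1][2] Another one. [3]"


async def astream_chunks(prompt: str) -> AsyncIterator[str]:
    """Await the full generation once, then yield it sentence by sentence."""
    text = await fake_agenerate(prompt)
    # Split on whitespace that follows punctuation or a marker and
    # precedes a capital letter (a naive sentence-boundary rule).
    for chunk in re.split(r"(?<=[.!?\]])\s+(?=[A-Z])", text):
        if chunk:
            yield chunk


async def collect(prompt: str) -> list[str]:
    return [chunk async for chunk in astream_chunks(prompt)]


chunks = asyncio.run(collect("What do the sources say?"))
print(chunks)  # ['A sentence. [1][2]', 'Another one. [3]']
```

Because the generation is awaited in full before the first chunk is yielded, consumers get multiple chunks but no reduction in time to first chunk.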