citeformer.backends.fireworks¶
Fireworks AI backend — true logit-tier enforcement on a hosted API via GBNF.
Fireworks AI exposes a native response_format={"type": "grammar", "grammar": "<GBNF>"} mode (docs <https://docs.fireworks.ai/structured-responses/structured-output-grammar-based>_)
that compiles the supplied GBNF and uses it to mask logits at every
decode step inside the Fireworks runtime — the same mechanism XGrammar
uses inside the local HF backend, just running on Fireworks’s servers
instead of yours. This is the cleanest “true logit-tier on a hosted
API” backend possible: we drop the existing :func:citeformer.grammar. build_grammar output in unchanged and the same cite-id rule that
constrains local sampling now constrains Fireworks’s sampling.
Tier honesty: this is logit-tier, not schema-tier. A fabricated
cite id is token-impossible to sample inside the Fireworks runtime —
exactly the same guarantee as HFBackend / VLLMBackend /
LlamaCppBackend, just with the masking running on a managed
GPU instead of yours.
Implementation note: Fireworks is wire-compatible with OpenAI’s chat
completions API, so this backend subclasses :class:OpenAIBackend and
overrides only two hooks — :meth:_build_response_format (swap
strict-JSON for grammar mode) and :meth:_decode_response_text (the
grammar response is plain text with inline markers, not a segments
JSON object, so flattening is a no-op). All the messaging assembly,
streaming pseudo-stream, and last_usage extraction are inherited
unchanged.
Requires the fireworks extra: pip install citeformer[fireworks]
(re-uses the openai SDK; no Fireworks-specific client needed).
Module Contents¶
Classes¶
Fireworks AI backend with native GBNF logit-tier enforcement. |
Data¶
API¶
- citeformer.backends.fireworks.DEFAULT_BASE_URL¶
- class citeformer.backends.fireworks.FireworksBackend(model: str = _DEFAULT_MODEL, *, client: Any | None = None, async_client: Any | None = None, api_key: str | None = None, base_url: str = DEFAULT_BASE_URL, **client_kwargs: Any)¶
Bases:
citeformer.backends.openai.OpenAIBackendFireworks AI backend with native GBNF logit-tier enforcement.
Attributes: model: Fireworks model id (
accounts/fireworks/models/<name>). client: The underlyingopenai.OpenAIclient, configured with Fireworks’s base URL.Initialization
Construct a Fireworks backend.
Args: model: Fireworks model id (
accounts/fireworks/models/...). Default isllama-v3p1-8b-instruct— small, serverless, supports grammar mode. See https://fireworks.ai/models for the live catalogue. client: Pre-builtopenai.OpenAIclient (already pointed at Fireworks). WhenNone, one is constructed using the other arguments. api_key: Fireworks API key. WhenNone, falls back toFIREWORKS_API_KEYfrom the environment, then toclient_kwargs["api_key"]if supplied. base_url: Fireworks API base URL. Override only for staging / proxy testing. **client_kwargs: Forwarded toopenai.OpenAI(**kwargs)whenclientisNone(timeout,max_retries, …).