citeformer.verify.nli
Natural-language-inference backend for verification.
We wrap a DeBERTa-v3 MNLI model via transformers. The model is lazy-loaded
on first entail() call, cached globally per (model_name, device) so multiple
Verifier instances share weights. Batched scoring is the common path —
single-pair calls funnel through the batched API with a one-element batch.
Default model: MoritzLaurer/DeBERTa-v3-large-mnli-fever-anli-ling-wanli
(~850 MB; well-tested on scientific claims). Override via the nli_model
kwarg on Verifier. A smaller / faster default can be swapped in at build
time by setting the CITEFORMER_NLI_MODEL env var.
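One way such an environment override could be resolved is sketched below. The source only states that CITEFORMER_NLI_MODEL exists; how and when it is read is an assumption here, as are the helper name `resolve_default_model` and the choice to treat a blank value as unset.

```python
import os
from typing import Mapping, Optional

# The documented default checkpoint, used when no override is present.
FALLBACK_MODEL = "MoritzLaurer/DeBERTa-v3-large-mnli-fever-anli-ling-wanli"


def resolve_default_model(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the env override if set and non-blank, else the fallback."""
    env = os.environ if env is None else env
    # Assumption: a blank or whitespace-only value counts as "not set".
    value = env.get("CITEFORMER_NLI_MODEL", "").strip()
    return value or FALLBACK_MODEL
```

Passing the mapping explicitly keeps the helper easy to test without mutating the process environment.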
Long premises (>512 tokens) can be chunked (opt-in): we slide a
fixed-size window over the premise, score each chunk against the
hypothesis, and take the maximum entailment as the pair’s result. That
surfaces claim-to-source entailment that lives past the first 512
tokens — useful when scoring against full PDF body text. But max-over-
windows also inflates false positives on unrelated claims (each extra
window is another chance for noise to cross the threshold), so we keep
it off by default for score stability and enable it explicitly via
chunk_premise=True when the caller wants long-document scoring.
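The windowing and max-reduction described above can be sketched in plain Python. This is an illustration under stated assumptions, not the module's implementation: `window_spans` and `max_entailment` are hypothetical helpers, the premise is already tokenized into a list, and `score` is any callable returning an entailment probability for one chunk. The 400 / 300 values mirror the defaults documented for NLIModel.

```python
from typing import Callable, List, Sequence, Tuple


def window_spans(n_tokens: int, window: int = 400, stride: int = 300) -> List[Tuple[int, int]]:
    """Return (start, end) token spans that cover n_tokens with overlap."""
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    if stride >= window:
        raise ValueError("stride must be smaller than window so windows overlap")
    spans = []
    start = 0
    while True:
        end = min(start + window, n_tokens)
        spans.append((start, end))
        if end >= n_tokens:  # the last window reaches the end of the premise
            break
        start += stride
    return spans


def max_entailment(
    tokens: Sequence[str],
    score: Callable[[Sequence[str]], float],
    window: int = 400,
    stride: int = 300,
) -> float:
    """Score every window and keep the highest entailment value."""
    return max(score(tokens[s:e]) for s, e in window_spans(len(tokens), window, stride))


# Toy premise: the supporting evidence sits at token 900, past the first window.
tokens = ["x"] * 900 + ["hit"] + ["x"] * 99
toy_score = lambda chunk: 0.9 if "hit" in chunk else 0.1

truncated = toy_score(tokens[:400])        # what a truncation-only path would see
chunked = max_entailment(tokens, toy_score)
```

In this toy case the truncated score stays low because the evidence is never seen, while the chunked score picks it up from the third window.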
When chunking is on, consider raising threshold on the Verifier
(0.7–0.8 rather than 0.5) to compensate for the max-reduction bias.
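The max-reduction bias can be made concrete with a little arithmetic. The sketch below assumes each window crosses the threshold independently with the same probability, which is a simplification (overlapping windows are correlated), and the 5% per-window rate is illustrative rather than measured. `any_window_fires` is a hypothetical helper.

```python
def any_window_fires(p_single: float, n_windows: int) -> float:
    """Probability that at least one of n independent windows crosses the threshold."""
    return 1.0 - (1.0 - p_single) ** n_windows


# Illustrative only: a 5% per-window false-positive rate.
for n in (1, 3, 10):
    print(n, round(any_window_fires(0.05, n), 4))
```

Under these assumptions the pair-level false-positive rate grows from 0.05 for one window to about 0.14 for three and about 0.40 for ten, which is the effect a higher threshold is meant to offset.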
Requires the verify extra: pip install citeformer[verify].
Module Contents
Classes
Data
API
- citeformer.verify.nli.DEFAULT_NLI_MODEL
‘get(…)’
- class citeformer.verify.nli.NLIResult
One NLI scoring outcome for a (premise, hypothesis) pair.
Attributes:
entailment: Probability of the entailment class in [0, 1].
neutral: Probability of the neutral class.
contradiction: Probability of the contradiction class.
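As a rough illustration of how three per-class probabilities can arise from a classifier head, the sketch below applies a softmax to three logits. It is not the library's code: `Probs` is a hypothetical mirror of the three documented fields, and the (entailment, neutral, contradiction) logit order is an assumption, since the real order depends on the checkpoint.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Probs:
    """Hypothetical mirror of the three documented NLIResult fields."""

    entailment: float
    neutral: float
    contradiction: float


def softmax3(logits: Tuple[float, float, float]) -> Probs:
    # Assumption: logits arrive as (entailment, neutral, contradiction).
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    e, n, c = (x / total for x in exps)
    return Probs(e, n, c)


p = softmax3((2.0, 0.5, -1.0))
supported = p.entailment >= 0.5  # compare against a caller-chosen threshold
```

With a softmax head the three values sum to 1, so a threshold on `entailment` alone is enough to decide whether a claim counts as supported.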
- class citeformer.verify.nli.NLIModel(model_name: str = DEFAULT_NLI_MODEL, *, device: str | None = None, batch_size: int = 8, chunk_premise: bool = False, max_premise_tokens: int = _DEFAULT_MAX_PREMISE_TOKENS, chunk_stride: int = _DEFAULT_CHUNK_STRIDE)
DeBERTa-v3-MNLI (or drop-in compatible) NLI scorer.
Instances are cheap to construct; weights are loaded on first entail(). The transformers model + tokenizer are cached globally per (model_name, device) via functools.lru_cache, so multiple NLIModel instances with identical config share a single copy of the weights on that device.
Attributes:
model_name: HuggingFace model identifier.
device: Torch device (cuda / mps / cpu) resolved at construction.
batch_size: Max pairs to score in a single forward pass.
chunk_premise: When True, long premises are split into overlapping windows; max entailment across windows is the pair’s result. Default is False, because max-over-windows inflates false positives on unrelated claims. Enable for long-document scoring with a raised threshold on the Verifier (0.7+) to compensate.
max_premise_tokens: Window size in tokens. Default 400 (leaves room for the hypothesis + special tokens inside DeBERTa’s 512 cap).
chunk_stride: Token stride between windows. Default 300; overlap = max_premise_tokens - chunk_stride.
Initialization
Construct an NLIModel.
Args:
model_name: HF identifier (e.g. "MoritzLaurer/DeBERTa-…").
device: None auto-detects CUDA > MPS > CPU.
batch_size: Max pairs per forward pass; adjust down on low-VRAM hardware.
chunk_premise: If True, long premises are chunked and scored with max-entailment reduction. If False (the default), the premise is not chunked and is subject to raw truncation at max_premise_tokens + hypothesis.
max_premise_tokens: Window size when chunking. 400 is a safe default under DeBERTa’s 512-token limit.
chunk_stride: Stride between windows. Lower = more overlap = slower but more thorough.
Raises:
ImportError: If citeformer[verify] extras aren’t installed.
ValueError: If chunk_stride >= max_premise_tokens (windows would no longer overlap, or would skip content).
- entail(premise: str, hypothesis: str) → citeformer.verify.nli.NLIResult
Score a single (premise, hypothesis) pair.
Uses chunked scoring when chunk_premise is enabled and the premise is long enough to benefit.
Args:
premise: The evidence / source text.
hypothesis: The claim being checked against the premise.
Returns:
An NLIResult with per-class probabilities.
- entail_batch(pairs: list[tuple[str, str]]) → list[citeformer.verify.nli.NLIResult]
Score a list of (premise, hypothesis) pairs in batches.
Empty input returns an empty list. Uses chunked scoring when the model’s chunk_premise is True; otherwise falls back to the naive 512-token truncation path.
Args:
pairs: A list of (premise, hypothesis) tuples.
Returns:
Results in the same order as input.
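The batching contract (order preserved, empty input returns an empty list, single pairs routed through the batched path) can be sketched as follows. This is a minimal illustration with hypothetical helpers `score_in_batches` and `score_one`; `score_batch` stands in for one forward pass and returns plain floats rather than NLIResult objects.

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, str]  # (premise, hypothesis)


def score_in_batches(
    pairs: List[Pair],
    score_batch: Callable[[List[Pair]], List[float]],
    batch_size: int = 8,
) -> List[float]:
    """Score pairs in slices of at most batch_size, keeping input order."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    results: List[float] = []
    for start in range(0, len(pairs), batch_size):
        # Each slice corresponds to one forward pass in a real backend.
        results.extend(score_batch(pairs[start:start + batch_size]))
    return results


def score_one(
    premise: str,
    hypothesis: str,
    score_batch: Callable[[List[Pair]], List[float]],
) -> float:
    """Single-pair call expressed as a one-element batch."""
    return score_in_batches([(premise, hypothesis)], score_batch)[0]
```

Lowering `batch_size` trades throughput for memory, which is the knob the constructor documentation suggests adjusting on low-VRAM hardware.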