`citeformer.verify.sentences`

Sentence splitter for verification paths.

Verification needs to identify per-sentence char spans so that:

- Each Citation can be associated with “the sentence containing this marker”.
- Uncited sentences can be scored against every source for coverage checks.

We avoid heavy NLP dependencies (NLTK with its Punkt model download, spaCy) and emit spans via a small regex-based splitter. It handles the common cases: ASCII and Unicode terminators, runs of terminators (!?, !!), and abbreviations common enough to skip (Dr., et al., e.g., i.e.). It will mis-split on exotic cases (abbreviated initials in names, URLs with dots); that’s an accepted limitation for v0.1.
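The splitting approach can be sketched as below. This is a minimal illustration, not the module's code: the terminator set, the `ABBREVIATIONS` tuple, and the `boundary_offsets` helper are all hypothetical. A run of terminators counts as a boundary only when whitespace follows it and the text up to that point does not end in a known abbreviation.

```python
import re

# Hypothetical skip list; the module's actual abbreviation set is not documented here.
ABBREVIATIONS = ("Dr.", "et al.", "e.g.", "i.e.")

# A run of terminators (ASCII plus the Unicode ellipsis, an assumed set), optionally
# followed by closing quotes or brackets, counts only if whitespace comes next.
_BOUNDARY = re.compile(r"[.!?\u2026]+[\"'\u201d\u2019)\]]*(?=\s)")


def boundary_offsets(text: str) -> list[int]:
    """Return the exclusive end offset of each sentence in `text`."""
    ends: list[int] = []
    for match in _BOUNDARY.finditer(text):
        if text[: match.end()].endswith(ABBREVIATIONS):
            continue  # e.g. "Dr. Smith": do not split after "Dr."
        ends.append(match.end())
    # The final sentence usually has no trailing whitespace, so close it explicitly.
    if text.strip() and (not ends or text[ends[-1]:].strip()):
        ends.append(len(text.rstrip()))
    return ends
```

Under these assumptions `boundary_offsets("J. Smith left.")` returns `[2, 14]`, splitting after the initial: the same kind of mis-split accepted above.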

Trade-off discussion lives in the verification docs (docs/verification.md#limitations).

Module Contents

Classes

`SentenceSpan`	One sentence extracted from a text, carrying its char offsets.

Functions

`split_sentences`	Split `text` into sentence spans.
`sentence_containing`	Return the `SentenceSpan` containing `char_offset`, or None if not found.
`strip_citation_markers`	Remove `[N]` style citation markers from `text`.

API

class citeformer.verify.sentences.SentenceSpan

One sentence extracted from a text, carrying its char offsets.

Attributes:

- index: 0-indexed position among the sentences in the source text.
- start: Inclusive char offset into the original text.
- end: Exclusive char offset into the original text.
- text: The sentence slice (stripped of leading/trailing whitespace).

index: int

start: int

end: int

text: str
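Read together, `start`/`end` and `text` suggest a simple invariant. The sketch below assumes that `start` and `end` bracket the stripped sentence (the docs do not say whether they include surrounding whitespace), so slicing the original text reproduces `text`. The `frozen=True` setting and the `is_consistent` helper are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SentenceSpan:
    """Illustrative stand-in; frozen=True is an assumption, not documented."""

    index: int  # 0-indexed position among the sentences in the source text
    start: int  # inclusive char offset into the original text
    end: int    # exclusive char offset into the original text
    text: str   # the sentence slice, stripped of surrounding whitespace


def is_consistent(source: str, span: SentenceSpan) -> bool:
    """Check the assumed invariant source[start:end] == text."""
    return source[span.start:span.end] == span.text
```

For example, in `"  First one.  Second one. "` the second sentence would be `SentenceSpan(index=1, start=14, end=25, text="Second one.")`.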

citeformer.verify.sentences.split_sentences(text: str) → list[citeformer.verify.sentences.SentenceSpan]

Split text into sentence spans.

Spans are returned in source order and cover the full text (modulo leading/trailing whitespace). Empty or whitespace-only inputs return an empty list.

Args:

- text: The text to split.

Returns: A list of SentenceSpan records.
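The span bookkeeping described above can be sketched as follows. It is not the module's implementation; the assumptions are that `SentenceSpan` is redefined as a plain dataclass so the block runs on its own, that boundaries use a deliberately simplified rule (a terminator run followed by whitespace, with no abbreviation handling), and that `start`/`end` bracket the stripped sentence.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class SentenceSpan:
    index: int
    start: int
    end: int
    text: str


# Simplified boundary rule (assumption): terminators followed by whitespace.
_END = re.compile(r"[.!?]+(?=\s)")


def split_sentences(text: str) -> list[SentenceSpan]:
    spans: list[SentenceSpan] = []
    cursor = 0
    # Every boundary, plus end-of-text so the last sentence is always closed.
    cuts = [m.end() for m in _END.finditer(text)] + [len(text)]
    for cut in cuts:
        chunk = text[cursor:cut]
        stripped = chunk.strip()
        if stripped:  # whitespace-only chunks produce no span
            # Skip leading whitespace so that text[start:end] == stripped.
            start = cursor + (len(chunk) - len(chunk.lstrip()))
            spans.append(SentenceSpan(len(spans), start, start + len(stripped), stripped))
        cursor = cut
    return spans
```

Because `index` is taken from `len(spans)` at append time, indices stay contiguous even when whitespace-only chunks are dropped.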

citeformer.verify.sentences.sentence_containing(spans: list[citeformer.verify.sentences.SentenceSpan], char_offset: int) → citeformer.verify.sentences.SentenceSpan | None

Return the SentenceSpan containing char_offset, or None if not found.

Handy for mapping a Citation.span to the sentence it belongs to.

citeformer.verify.sentences.strip_citation_markers(text: str) → str

Remove [N]-style citation markers from text.

Spaces immediately before each marker are consumed so that removing it does not leave a double space. Punctuation following the marker is preserved.
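A minimal regex sketch of the documented behavior, assuming markers are a single bracketed integer such as `[1]` or `[12]` (ranges like `[1-3]` are not handled) and that only spaces and tabs before a marker are consumed. The `_MARKER` pattern is hypothetical.

```python
import re

# Optional spaces/tabs, then one [N] marker; \d+ also covers multi-digit N.
_MARKER = re.compile(r"[ \t]*\[\d+\]")


def strip_citation_markers(text: str) -> str:
    # Adjacent markers like "[2][3]" are removed one match at a time;
    # text after the marker (e.g. "." or ",") is left untouched.
    return _MARKER.sub("", text)
```

For instance, `"Cats purr [2][3], dogs bark [4]."` becomes `"Cats purr, dogs bark."`.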
