citeformer.metadata.arxiv

arXiv metadata fetcher.

arXiv’s export API returns Atom XML, not JSON, so this module parses the feed itself and translates it to CSL-JSON. The paper's abstract is returned under the abstract key, which Source.from_arxiv pops into content.
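As a rough illustration of the Atom-to-CSL-JSON translation described above, the following is a minimal sketch and not the module's actual implementation. The helper name atom_to_csl, the sample feed, the choice of the CSL type "article", and the naive last-space split of author names are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET
from typing import Any

# Atom namespace used by the feed; arXiv-specific extensions are ignored here.
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}

# Hypothetical, hand-written feed with a placeholder id (not real API output).
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <id>http://arxiv.org/abs/0000.00000v1</id>
    <published>2020-01-02T03:04:05Z</published>
    <title>An Example
      Title</title>
    <summary>  An example abstract.  </summary>
    <author><name>Ada Lovelace</name></author>
    <author><name>Alan Turing</name></author>
  </entry>
</feed>"""


def atom_to_csl(xml_text: str) -> dict[str, Any]:
    """Translate the first Atom entry into a CSL-JSON-like dict (sketch)."""
    root = ET.fromstring(xml_text)
    entry = root.find("atom:entry", ATOM_NS)
    if entry is None:
        raise ValueError("feed contains no entry")

    def text(tag: str) -> str:
        # Collapse the hard line breaks and padding that feeds often contain.
        raw = entry.findtext(f"atom:{tag}", default="", namespaces=ATOM_NS)
        return " ".join(raw.split())

    authors = []
    for name_el in entry.findall("atom:author/atom:name", ATOM_NS):
        # Assumption: split on the last space; real names need more care.
        given, _, family = " ".join((name_el.text or "").split()).rpartition(" ")
        authors.append({"given": given, "family": family})

    # "2020-01-02T03:04:05Z" -> [2020, 1, 2]
    date_parts = [int(part) for part in text("published")[:10].split("-")]

    return {
        "type": "article",  # assumed CSL type for a preprint
        "title": text("title"),
        "author": authors,
        "issued": {"date-parts": [date_parts]},
        "URL": text("id"),
        "abstract": text("summary"),  # extra key alongside the CSL fields
    }


item = atom_to_csl(SAMPLE_FEED)
print(item["title"], item["issued"])
```

A caller in the style of Source.from_arxiv could then remove the abstract with item.pop("abstract") and keep the remaining keys as the citation metadata.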

Module Contents

Functions

fetch_arxiv

Fetch CSL-JSON metadata for an arXiv paper.

API

citeformer.metadata.arxiv.fetch_arxiv(arxiv_id: str, *, timeout: float = _DEFAULT_TIMEOUT, use_cache: bool = True) -> dict[str, Any]

Fetch CSL-JSON metadata for an arXiv paper.

Args:
    arxiv_id: arXiv identifier (e.g. "2305.14627"). Accepts URL, arxiv:-prefixed, and versioned ("2305.14627v2") forms; the version suffix is stripped.
    timeout: HTTP timeout in seconds.
    use_cache: Cache the CSL-JSON under ~/.cache/citeformer/metadata/.

Returns: CSL-JSON item dict with an extra abstract key carrying the paper's abstract (useful as Source.content).

Raises:
    ValueError: If arXiv returns no entry for the id.
    httpx.HTTPStatusError: On HTTP errors.
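The accepted identifier forms listed under Args (bare id, URL, arxiv: prefix, versioned id) could be normalized along the following lines. This is only a sketch under assumptions: normalize_arxiv_id and both regular expressions are hypothetical and are not taken from citeformer, whose actual rules may differ.

```python
import re

# Hypothetical patterns: a leading arXiv URL or "arxiv:" prefix, and a trailing "vN".
_PREFIX_RE = re.compile(
    r"^(?:https?://(?:www\.|export\.)?arxiv\.org/(?:abs|pdf)/|arxiv:)",
    re.IGNORECASE,
)
_VERSION_RE = re.compile(r"v\d+$")


def normalize_arxiv_id(raw: str) -> str:
    """Reduce a URL, prefixed, or versioned arXiv id to its bare form (sketch)."""
    s = _PREFIX_RE.sub("", raw.strip())
    if s.lower().endswith(".pdf"):
        # PDF links may carry a file extension after the id.
        s = s[:-4]
    return _VERSION_RE.sub("", s)


for form in (
    "2305.14627",
    "2305.14627v2",
    "arxiv:2305.14627",
    "https://arxiv.org/abs/2305.14627v2",
):
    print(form, "->", normalize_arxiv_id(form))
```

Stripping the version before the lookup means that every version of a paper resolves to the same bare id, which is presumably also convenient for keying the metadata cache.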