citeformer.metadata.arxiv
arXiv metadata fetcher.
arXiv’s export API returns Atom XML, not JSON, so we parse it ourselves and
translate it to CSL-JSON. The paper's abstract is returned under an extra abstract key
(which Source.from_arxiv pops into content).
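The Atom-to-CSL-JSON translation described above might look roughly like the sketch below. It is not citeformer's actual implementation: the helper name atom_entry_to_csl, the sample feed, the naive given/family name split, and the choice of CSL fields are all assumptions made for illustration. Only the Atom namespace URI and the standard-library xml.etree.ElementTree calls are taken as given.

```python
import xml.etree.ElementTree as ET
from typing import Any

# Atom namespace used by feed and entry elements.
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}

# Hypothetical, hand-written feed with a placeholder id (not real arXiv output).
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <id>http://arxiv.org/abs/0000.00000v1</id>
    <published>2023-05-24T00:00:00Z</published>
    <title>An Example
      Title</title>
    <summary>  An example abstract.  </summary>
    <author><name>Ada Lovelace</name></author>
    <author><name>Alan Turing</name></author>
  </entry>
</feed>"""


def atom_entry_to_csl(xml_text: str) -> dict[str, Any]:
    """Translate the first Atom <entry> into a CSL-JSON-style dict (sketch)."""
    root = ET.fromstring(xml_text)
    entry = root.find("atom:entry", ATOM_NS)
    if entry is None:
        raise ValueError("feed contains no entry")

    def text(tag: str) -> str:
        # Collapse the line breaks and padding that feeds put inside text nodes.
        raw = entry.findtext(f"atom:{tag}", default="", namespaces=ATOM_NS)
        return " ".join(raw.split())

    authors = []
    for name_el in entry.findall("atom:author/atom:name", ATOM_NS):
        full = " ".join((name_el.text or "").split())
        # Assumption: the last token is the family name. Real names need more care.
        given, _, family = full.rpartition(" ")
        authors.append({"family": family, "given": given})

    # "2023-05-24T00:00:00Z" -> [2023, 5, 24]
    date_parts = [int(part) for part in text("published")[:10].split("-")]

    return {
        "type": "article",
        "title": text("title"),
        "author": authors,
        "issued": {"date-parts": [date_parts]},
        "URL": text("id"),
        "abstract": text("summary"),  # extra, non-standard key carrying the abstract
    }


item = atom_entry_to_csl(SAMPLE_FEED)
```

Popping item["abstract"] afterwards leaves the bibliographic fields behind, which mirrors how the abstract is described as being moved into content.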
Module Contents
Functions
fetch_arxiv | Fetch CSL-JSON metadata for an arXiv paper.
API
- citeformer.metadata.arxiv.fetch_arxiv(arxiv_id: str, *, timeout: float = _DEFAULT_TIMEOUT, use_cache: bool = True) -> dict[str, Any]
Fetch CSL-JSON metadata for an arXiv paper.
Args:
- arxiv_id: arXiv identifier (e.g. "2305.14627"). Accepts URL, arxiv:, and versioned ("2305.14627v2") forms; the version suffix is stripped.
- timeout: HTTP timeout in seconds.
- use_cache: Cache the CSL-JSON under ~/.cache/citeformer/metadata/.

Returns: CSL-JSON item dict with an extra abstract key carrying the paper abstract (useful as Source.content).

Raises:
- ValueError: If arXiv returns no entry for the id.
- httpx.HTTPStatusError: On HTTP errors.
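
The accepted identifier forms (bare id, URL, arxiv: prefix, versioned) could be reduced to a bare id along the lines of the sketch below. The helper normalize_arxiv_id and its regular expressions are hypothetical, written only to illustrate the behaviour documented for arxiv_id. They are not taken from citeformer, whose own normalization may differ.

```python
import re

# Trailing version suffix such as "v2".
_VERSION_RE = re.compile(r"v\d+$")
# URL forms such as https://arxiv.org/abs/<id> or https://arxiv.org/pdf/<id>.
_URL_RE = re.compile(r"arxiv\.org/(?:abs|pdf)/(.+)$", re.IGNORECASE)


def normalize_arxiv_id(raw: str) -> str:
    """Reduce a URL, "arxiv:"-prefixed, or versioned id to a bare id (sketch)."""
    value = raw.strip()
    match = _URL_RE.search(value)
    if match:
        value = match.group(1)
        if value.lower().endswith(".pdf"):
            value = value[: -len(".pdf")]
    elif value.lower().startswith("arxiv:"):
        value = value[len("arxiv:"):]
    # Strip the version suffix last so every input form is handled the same way.
    return _VERSION_RE.sub("", value)


forms = [
    "2305.14627",
    "2305.14627v2",
    "arxiv:2305.14627v2",
    "https://arxiv.org/abs/2305.14627v2",
]
normalized = [normalize_arxiv_id(form) for form in forms]
```

With this sketch, all four inputs reduce to the same bare identifier, so a cache keyed on the normalized id would not store one paper under several names.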