citeformer.integrations.langchain
LangChain ↔ citeformer adapter.
LangChain retrievers return a List[Document], where each Document has
page_content (the chunk text) and metadata (a free-form dict). To
feed those into Citeformer.generate, each Document must be converted
to a Source with CSL-JSON-shaped metadata.
Duck-typed: we don’t import LangChain at module load, so you can use
these functions with anything that has page_content: str +
metadata: dict attributes — LangChain’s Document, a mock, a
pydantic model, whatever.
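As a rough illustration of the duck-typing contract described above, the sketch below defines a structural type with the two required attributes and a plain dataclass that satisfies it. The names DocumentLike and PlainChunk are hypothetical; the adapter's own private _DocumentLike type may be defined differently.

```python
from dataclasses import dataclass, field
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class DocumentLike(Protocol):
    """Hypothetical stand-in for the adapter's private _DocumentLike type."""

    page_content: str
    metadata: dict[str, Any]


@dataclass
class PlainChunk:
    """A retrieval result that never touches LangChain."""

    page_content: str
    metadata: dict[str, Any] = field(default_factory=dict)


chunk = PlainChunk("Transformers use attention.", {"title": "Attention notes"})

# isinstance() against a runtime-checkable Protocol only checks that the
# attributes exist; it does not validate their types.
looks_like_document = isinstance(chunk, DocumentLike)
plain_string_matches = isinstance("just a string", DocumentLike)
```

Any object shaped like PlainChunk should be usable wherever the functions below expect a document, without LangChain installed.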
Typical usage::

    from citeformer import Citeformer
    from citeformer.integrations.langchain import sources_from_documents
    from citeformer.backends.hf import HFBackend

    docs = retriever.get_relevant_documents(query)  # LangChain retriever
    sources = sources_from_documents(docs)
    cf = Citeformer(backend=HFBackend("gpt2"))
    result = cf.generate(prompt=query, sources=sources)

If your retrieved documents have rich metadata (a Zotero library, a
Crossref-backed vectorstore), pass metadata_converter= to map from
your custom shape to CSL-JSON. The default converter produces a
minimal-but-valid CSL item ({id, type: 'webpage', title}) from
whatever is in Document.metadata.
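A custom converter is just a function from one dict to another. The sketch below assumes a hypothetical Zotero-style metadata shape (key, title, creators, year, DOI); the function name zotero_to_csl and the input field names are illustrative, not part of citeformer. The output uses standard CSL-JSON fields (author as a list of family/given names, issued as date-parts).

```python
from typing import Any


def zotero_to_csl(metadata: dict[str, Any]) -> dict[str, Any]:
    """Map a hypothetical Zotero-style record to a CSL-JSON item."""
    item: dict[str, Any] = {
        "id": metadata["key"],
        "type": "article-journal",
        "title": metadata.get("title", ""),
    }
    creators = metadata.get("creators", [])
    if creators:
        # CSL-JSON names are objects with "family" and "given" parts.
        item["author"] = [
            {"family": c["lastName"], "given": c["firstName"]} for c in creators
        ]
    year = metadata.get("year")
    if year is not None:
        # CSL-JSON dates are nested lists: [[year, month, day]].
        item["issued"] = {"date-parts": [[int(year)]]}
    if "DOI" in metadata:
        item["DOI"] = metadata["DOI"]
    return item


record = {
    "key": "ABCD1234",
    "title": "A Study of Citations",
    "creators": [{"firstName": "Ada", "lastName": "Lovelace"}],
    "year": "2021",
    "DOI": "10.1234/example",
}
csl_item = zotero_to_csl(record)

# With citeformer installed, the converter would be passed as:
#   sources_from_documents(docs, metadata_converter=zotero_to_csl)
```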
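Module Contents
Functions
default_metadata_converter | Fallback conversion from LangChain-style metadata to CSL-JSON.
source_from_document | Convert one LangChain-shaped Document into a citeformer Source.
sources_from_documents | Convert an iterable of LangChain documents to citeformer sources.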
Data
API
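- citeformer.integrations.langchain.MetadataConverter
None
Type of the metadata_converter argument accepted by source_from_document and sources_from_documents: a callable with signature (dict) -> dict.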
- citeformer.integrations.langchain.default_metadata_converter(metadata: dict[str, Any]) -> dict[str, Any]
Fallback conversion from LangChain-style metadata to CSL-JSON.
Pulls common keys the LangChain ecosystem uses (title, source, url, author) and packages them as a minimal CSL-JSON {id, type: 'webpage', title, URL?} item. Unknown keys are kept under _langchain_metadata so downstream code can still access them if needed.
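To make the described output shape concrete, here is a minimal sketch of a converter that behaves the way the paragraph above describes. It is not citeformer's implementation: the function name, the way the id is chosen, and the fallback title are all assumptions, and author handling is left out because the description does not say what form it takes in the output.

```python
from typing import Any

# Keys the description names as recognised by the default converter.
KNOWN_KEYS = ("title", "source", "url", "author")


def sketch_default_converter(metadata: dict[str, Any]) -> dict[str, Any]:
    """Illustrative only; not citeformer's actual implementation."""
    url = metadata.get("url")
    # Assumption: the id falls back through url, source, then title.
    item_id = str(url or metadata.get("source") or metadata.get("title") or "unknown")
    item: dict[str, Any] = {
        "id": item_id,
        "type": "webpage",
        "title": str(metadata.get("title", item_id)),
    }
    if url:
        item["URL"] = url  # optional, matching the "URL?" in the description
    # Anything not recognised is preserved rather than dropped.
    extras = {k: v for k, v in metadata.items() if k not in KNOWN_KEYS}
    if extras:
        item["_langchain_metadata"] = extras
    return item


item = sketch_default_converter(
    {"title": "Intro to RAG", "url": "https://example.com/rag", "page": 3}
)
```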
- citeformer.integrations.langchain.source_from_document(document: citeformer.integrations.langchain._DocumentLike, *, metadata_converter: citeformer.integrations.langchain.MetadataConverter | None = None) -> citeformer.core.Source
Convert one LangChain-shaped Document into a citeformer Source.
Args:
    document: Object with page_content: str + metadata: dict attributes. LangChain’s langchain_core.documents.Document is the canonical shape; any duck-typed equivalent works.
    metadata_converter: Optional override for the default CSL-JSON conversion. Signature: (dict) -> dict. Useful when your retrieved documents come from a rich source (Zotero, Crossref-backed vectorstore) and you want to preserve that.
Returns:
    A Source with content = document.page_content and metadata shaped as CSL-JSON.
Raises:
    TypeError: If document doesn’t have the expected attributes.
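The contract above (copy page_content into content, convert the metadata, raise TypeError on a malformed object) could be sketched as follows. SketchSource and sketch_source_from_document are hypothetical stand-ins so the example runs without citeformer installed; the real Source class and the real validation logic may differ.

```python
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Any, Callable, Optional


@dataclass
class SketchSource:
    """Stand-in for citeformer.core.Source; the real class may differ."""

    content: str
    metadata: dict[str, Any]


def _placeholder_converter(metadata: dict[str, Any]) -> dict[str, Any]:
    # Placeholder for the default converter; the id value is an assumption.
    return {"id": "doc", "type": "webpage", "title": str(metadata.get("title", ""))}


def sketch_source_from_document(
    document: Any,
    *,
    metadata_converter: Optional[Callable[[dict], dict]] = None,
) -> SketchSource:
    # Duck-typing check: only attribute presence matters, not the class.
    for attr in ("page_content", "metadata"):
        if not hasattr(document, attr):
            raise TypeError(f"document is missing required attribute {attr!r}")
    convert = metadata_converter or _placeholder_converter
    return SketchSource(
        content=document.page_content,
        metadata=convert(document.metadata),
    )


doc = SimpleNamespace(page_content="chunk text", metadata={"title": "T"})
source = sketch_source_from_document(doc)

try:
    sketch_source_from_document(object())  # has neither attribute
    rejected = False
except TypeError:
    rejected = True
```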
- citeformer.integrations.langchain.sources_from_documents(documents: collections.abc.Iterable[citeformer.integrations.langchain._DocumentLike], *, metadata_converter: citeformer.integrations.langchain.MetadataConverter | None = None) -> list[citeformer.core.Source]
Convert an iterable of LangChain documents to citeformer sources.
Preserves order; downstream citation ids correspond 1:1 with list position, which matches how LangChain retrievers return their relevance-ordered results.
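The practical consequence of order preservation is that a position in the returned list can be mapped straight back to the retrieved document. The sketch below uses a hypothetical stand-in, sketch_convert_all, in place of sources_from_documents, and uses plain 0-based list positions; whether citeformer's citation ids start at 0 or 1 is not stated here.

```python
from types import SimpleNamespace

# Three stand-in documents, already in the retriever's relevance order.
docs = [
    SimpleNamespace(page_content=text, metadata={"title": text})
    for text in ("most relevant", "second", "third")
]


def sketch_convert_all(documents):
    """Stand-in for sources_from_documents: one output per input, same order."""
    return [(d.page_content, dict(d.metadata)) for d in documents]


# Any iterable is accepted, including a one-shot iterator.
sources = sketch_convert_all(iter(docs))

# Because order is preserved, position i in sources maps back to docs[i].
position_to_title = {i: docs[i].metadata["title"] for i, _ in enumerate(sources)}
```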