API Reference

Complete method reference for the Literature Mapper Python API.


LiteratureMapper

The main entry point for corpus management.

from literature_mapper import LiteratureMapper

mapper = LiteratureMapper(corpus_path, model_name="gemini-3-flash-preview")

Processing

| Method | Parameters | Returns | Description |
| --- | --- | --- | --- |
| process_new_papers() | recursive=False | ProcessingResult | Process unprocessed PDFs |
| update_citations() | email=None | None | Fetch OpenAlex citation data |

Retrieval

| Method | Parameters | Returns | Description |
| --- | --- | --- | --- |
| get_all_analyses() | — | DataFrame | All papers as a DataFrame |
| get_paper_by_id() | paper_id: int | dict | Details for a single paper |
| search_papers() | column: str, query: str | DataFrame | Keyword search |
| search_corpus() | query, semantic, use_enhanced, min_year, max_year, node_types, limit | list[dict] | Semantic search |
| get_statistics() | — | CorpusStats | Corpus-level counts |

Normalization

| Method | Parameters | Returns | Description |
| --- | --- | --- | --- |
| normalize_authors() | mappings: dict | int | Merge author aliases |
| normalize_concepts() | mappings: dict | int | Merge concept aliases |
| get_canonical_author() | alias: str | str | Resolve an author alias |
| get_canonical_concept() | alias: str | str | Resolve a concept alias |
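
The mappings argument for both normalize_* methods maps alias strings to a canonical name. A minimal sketch of that shape, with a hypothetical resolve helper to illustrate the alias-resolution behavior (this is not the library's implementation):

```python
# Shape of the mappings dict accepted by normalize_authors() /
# normalize_concepts(): alias -> canonical name.
mappings = {
    "Smith, J.": "Jane Smith",
    "J. Smith": "Jane Smith",
}

def resolve(alias: str, mappings: dict[str, str]) -> str:
    """Hypothetical illustration of alias resolution: return the
    canonical name for an alias, or the alias itself if unmapped."""
    return mappings.get(alias, alias)

print(resolve("Smith, J.", mappings))   # Jane Smith
print(resolve("Jane Smith", mappings))  # Jane Smith (already canonical)
```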

Temporal

| Method | Parameters | Returns | Description |
| --- | --- | --- | --- |
| get_concept_timeline() | concept=None, top_n=10 | DataFrame | Temporal data per concept |
| compute_temporal_stats() | — | None | Compute trend statistics |
| get_trending_concepts() | direction, limit | DataFrame | Rising or declining concepts |
| detect_concept_eras() | gap: int | DataFrame | Detect concept eras and revivals |

Genealogy

| Method | Parameters | Returns | Description |
| --- | --- | --- | --- |
| build_genealogy() | verbose=False | dict | Infer paper relationships |
| find_contradictions() | concept=None | DataFrame | Find CHALLENGES edges |
| get_argument_evolution() | concept: str | DataFrame | Trace relationships over time |

Synthesis

| Method | Parameters | Returns | Description |
| --- | --- | --- | --- |
| synthesize_answer() | question, year_range=None, limit=10 | str | RAG answer synthesis |
| validate_hypothesis() | hypothesis: str, limit=10 | dict | Hypothesis evaluation |

Export

| Method | Parameters | Returns | Description |
| --- | --- | --- | --- |
| export_to_csv() | output_path: str | None | Export to CSV |

GhostHunter

Gap detection for missing papers and authors.

from literature_mapper.ghosts import GhostHunter

hunter = GhostHunter(mapper)

| Method | Parameters | Returns | Description |
| --- | --- | --- | --- |
| find_bibliographic_ghosts() | threshold=3 | DataFrame | Cited papers missing from the corpus |
| find_missing_authors() | threshold=5 | DataFrame | Cited authors missing from the corpus |

CorpusAnalyzer

Corpus-level statistics and analytics.

from literature_mapper.analysis import CorpusAnalyzer

analyzer = CorpusAnalyzer(corpus_path)

| Method | Parameters | Returns | Description |
| --- | --- | --- | --- |
| get_year_distribution() | — | DataFrame | Papers per year |
| get_top_authors() | limit=10 | DataFrame | Most prolific authors |
| get_top_concepts() | limit=10 | DataFrame | Most frequent concepts |
| find_hub_papers() | limit=10 | DataFrame | Most referenced papers |

Visualization

from literature_mapper.viz import export_to_gexf

export_to_gexf(corpus_path, output_path, mode="semantic", threshold=0.05)

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| corpus_path | str | required | Path to the corpus directory |
| output_path | str | required | Output .gexf file path |
| mode | str | "semantic" | Graph type: semantic, authors, concepts, river, or similarity |
| threshold | float | 0.05 | Minimum edge weight (similarity mode only) |
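
In similarity mode, threshold prunes weak edges before export. A hypothetical sketch of that filtering effect, assuming an inclusive cut-off on edge weight (this is not export_to_gexf's actual code):

```python
# Illustration only: edges whose weight falls below the threshold
# are dropped from the exported graph.
edges = [
    ("paper_1", "paper_2", 0.42),
    ("paper_1", "paper_3", 0.03),  # below threshold, pruned
    ("paper_2", "paper_3", 0.05),  # at threshold, kept (assuming inclusive)
]

threshold = 0.05
kept = [(u, v, w) for u, v, w in edges if w >= threshold]
print(kept)
```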

Data Classes

ProcessingResult

@dataclass
class ProcessingResult:
    processed: int    # Number of successfully processed PDFs
    skipped: int      # Already in database
    failed: int       # Processing errors
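
A ProcessingResult is normally returned by process_new_papers(); the fields can be consumed as below. The dataclass is redefined inline (with made-up counts) so the sketch is self-contained:

```python
from dataclasses import dataclass

# Redefined inline from the definition above; in real use this comes
# back from mapper.process_new_papers().
@dataclass
class ProcessingResult:
    processed: int    # Number of successfully processed PDFs
    skipped: int      # Already in database
    failed: int       # Processing errors

result = ProcessingResult(processed=12, skipped=3, failed=1)
total = result.processed + result.skipped + result.failed
print(f"{result.processed}/{total} PDFs processed, {result.failed} failed")
```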

CorpusStats

@dataclass
class CorpusStats:
    total_papers: int
    total_authors: int
    total_concepts: int
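
Likewise for CorpusStats, normally returned by get_statistics(); redefined inline here with made-up counts for a self-contained sketch:

```python
from dataclasses import dataclass

# Redefined inline from the definition above; in real use this comes
# back from mapper.get_statistics().
@dataclass
class CorpusStats:
    total_papers: int
    total_authors: int
    total_concepts: int

stats = CorpusStats(total_papers=250, total_authors=480, total_concepts=1200)
print(f"{stats.total_papers} papers, {stats.total_authors} authors, "
      f"{stats.total_concepts} concepts")
```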