
Search & Retrieval

Find evidence in your corpus with keyword matching or with semantic search, which matches by meaning rather than exact wording.


Keyword search performs a case-insensitive substring match on a specific column:

# Find papers using survey methodology
survey_papers = mapper.search_papers(column='methodology', query='survey')

print(f"Found {len(survey_papers)} papers using survey methodology:")
for _, row in survey_papers.head(3).iterrows():
    print(f"  • {row['title'][:60]}... ({row['year']})")
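
To make "case-insensitive substring match" concrete, here is a minimal standalone sketch of the idea. It does not show how search_papers is implemented; the helper keyword_match and the toy rows are hypothetical and exist only for illustration.

```python
# Toy stand-in for a corpus; real rows would come from your own data.
papers = [
    {"title": "Trust in online communities", "methodology": "Cross-sectional Survey"},
    {"title": "Diffusion on networks", "methodology": "Agent-based simulation"},
    {"title": "Peer effects at work", "methodology": None},
]


def keyword_match(rows, column, query):
    """Return rows whose `column` contains `query`, ignoring case.

    Hypothetical helper: rows with a missing value in `column` are skipped.
    """
    needle = query.casefold()
    return [
        row
        for row in rows
        if row.get(column) is not None and needle in str(row[column]).casefold()
    ]


matches = keyword_match(papers, "methodology", "survey")
print([m["title"] for m in matches])
```

Because the match is a plain substring test, a query such as "survey" also matches longer values like "Cross-sectional Survey", and a row with no methodology recorded never matches.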

Semantic search uses embedding vectors to find content by meaning, not exact keywords.

results = mapper.search_corpus(
    query="influence of social ties on information diffusion",
    semantic=True,
    limit=5
)

for r in results:
    print(f"[{r['match_score']:.3f}] {r['title']} ({r['year']})")
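
The sketch below illustrates the general idea behind embedding-based ranking: documents and the query are represented as vectors, and results are ordered by cosine similarity. The three-dimensional vectors are made up for the example, and nothing here is claimed to match how search_corpus computes match_score internally.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Made-up embeddings; a real system would get these from an embedding model.
documents = {
    "social contagion in adoption": [0.9, 0.1, 0.0],
    "peer influence among teens": [0.8, 0.3, 0.1],
    "soil chemistry methods": [0.0, 0.1, 0.9],
}
query_vector = [1.0, 0.2, 0.0]  # pretend embedding of "network effects"

# Rank document titles from most to least similar to the query.
ranked = sorted(
    documents,
    key=lambda title: cosine_similarity(query_vector, documents[title]),
    reverse=True,
)
for title in ranked:
    print(f"[{cosine_similarity(query_vector, documents[title]):.3f}] {title}")
```

None of the toy titles contains the words "network effects", yet the two conceptually related ones rank first because their vectors point in a similar direction.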

When to Use Semantic Search

Use semantic search when you're looking for conceptually related content that may not use your exact terminology. For example, searching for "network effects" might find papers discussing "social contagion" or "peer influence."


Enhanced Retrieval

Enhanced mode (use_enhanced=True) adds two features:

| Feature | Description |
| --- | --- |
| MMR Reranking | Maximal Marginal Relevance ensures diverse results, not just the most similar |
| Consensus Grouping | Identifies when multiple papers make the same claim |

enhanced_results = mapper.search_corpus(
    query="social contagion vs homophily",
    semantic=True,
    use_enhanced=True,
    node_types=["finding", "limitation", "method", "hypothesis"],
    limit=5
)

for r in enhanced_results:
    print(f"[{r['match_score']:.2f}] {r['title'][:60]}...")
    print(f"  Type: {r['node_type']}")
    print(f"  Context: {r['match_context'][:100]}...\n")
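
For readers unfamiliar with Maximal Marginal Relevance, the following is a textbook-style sketch of the technique, not the library's implementation. The function mmr_rerank, the lambda_ weight, and the two-dimensional toy vectors are all assumptions made for illustration. At each step it picks the candidate with the best trade-off between similarity to the query and similarity to results already selected.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mmr_rerank(query_vec, candidates, k, lambda_=0.5):
    """Select k candidate names by Maximal Marginal Relevance.

    candidates maps a name to its vector. lambda_=1.0 ranks by relevance
    only; lower values penalize similarity to already selected items.
    """
    remaining = dict(candidates)
    selected = []
    while remaining and len(selected) < k:

        def mmr_score(name):
            relevance = cosine(query_vec, remaining[name])
            # Redundancy is the highest similarity to anything already picked.
            redundancy = max(
                (cosine(remaining[name], candidates[s]) for s in selected),
                default=0.0,
            )
            return lambda_ * relevance - (1 - lambda_) * redundancy

        best = max(remaining, key=mmr_score)
        selected.append(best)
        del remaining[best]
    return selected


# Toy vectors: two near-duplicate papers and one that covers a different angle.
candidates = {
    "contagion_a": [1.0, 0.8],
    "contagion_a_replication": [1.0, 0.75],
    "homophily_b": [0.7, 1.0],
}
query = [1.0, 1.0]

print(mmr_rerank(query, candidates, k=2, lambda_=1.0))  # relevance only
print(mmr_rerank(query, candidates, k=2, lambda_=0.5))  # relevance and diversity
```

With lambda_=1.0 the top two results are the near-duplicates; with lambda_=0.5 the second slot goes to the paper that adds a different perspective.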

Filtering by Node Type

Pass node_types to control which knowledge graph node types appear in results:

| Node Type | What It Captures |
| --- | --- |
| paper | Paper titles and abstracts |
| finding | Key results and claims |
| method | Research methodologies |
| limitation | Acknowledged weaknesses |
| hypothesis | Theoretical propositions |
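
When a search returns several node types at once, it can help to group the results by type before reading them. The sketch below assumes each result is a dict with a node_type key, as in the enhanced retrieval example; group_by_node_type and the sample data are hypothetical.

```python
from collections import defaultdict


def group_by_node_type(results):
    """Bucket result dicts by their 'node_type' value, preserving order."""
    grouped = defaultdict(list)
    for r in results:
        grouped[r["node_type"]].append(r)
    return dict(grouped)


# Hypothetical results, shaped like the dicts returned by a search.
sample_results = [
    {"title": "Paper A", "node_type": "finding", "match_score": 0.91},
    {"title": "Paper B", "node_type": "limitation", "match_score": 0.88},
    {"title": "Paper C", "node_type": "finding", "match_score": 0.80},
]

for node_type, items in group_by_node_type(sample_results).items():
    print(f"{node_type}: {[r['title'] for r in items]}")
```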

Year-Range Filtering

Scope any search to specific time periods:

# Only retrieve evidence from 2010-2020
recent_results = mapper.search_corpus(
    query="network centrality",
    semantic=True,
    min_year=2010,
    max_year=2020,
    limit=5
)

for r in recent_results:
    print(f"{r['year']}: {r['title']}")

This is especially useful for:

  • Comparing historical vs. recent perspectives
  • Focusing on foundational works only
  • Analyzing how discourse changed over time
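
One way to compare periods is to run the same query once per time window. The helper below is a hypothetical convenience wrapper: it relies only on the search_corpus parameters shown above (query, semantic, min_year, max_year, limit). Whether the year bounds are inclusive is not stated here, so check that before choosing adjacent windows.

```python
def search_by_period(mapper, query, periods, limit=5):
    """Run one semantic search per (min_year, max_year) window.

    Hypothetical helper: returns a dict keyed by the window tuple.
    """
    results_by_period = {}
    for min_year, max_year in periods:
        results_by_period[(min_year, max_year)] = mapper.search_corpus(
            query=query,
            semantic=True,
            min_year=min_year,
            max_year=max_year,
            limit=limit,
        )
    return results_by_period


# Example usage, assuming `mapper` is already set up as in the examples above:
#
# by_period = search_by_period(
#     mapper,
#     query="network centrality",
#     periods=[(1990, 2009), (2010, 2020)],
# )
# for (start, end), results in by_period.items():
#     print(f"{start}-{end}: {[r['title'] for r in results]}")
```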