Search & Retrieval
Find evidence in your corpus using keyword matching or meaning-based semantic search.
Keyword Search
Case-insensitive substring match on specific columns:
# Find papers using survey methodology
survey_papers = mapper.search_papers(column='methodology', query='survey')
print(f"Found {len(survey_papers)} papers using survey methodology:")
for _, row in survey_papers.head(3).iterrows():
    print(f" • {row['title'][:60]}... ({row['year']})")
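The matching rule itself is simple. As a minimal sketch, assuming each paper is a plain dict (the example above suggests `search_papers` returns a DataFrame-like object, and its internals are not shown here), a case-insensitive substring match could look like the following. The helper `keyword_match` and the sample records are hypothetical and exist only for illustration.

```python
def keyword_match(records, column, query):
    """Return records whose `column` value contains `query`, ignoring case."""
    needle = query.casefold()
    matches = []
    for record in records:
        value = record.get(column)
        # Skip missing or non-text values instead of raising.
        if isinstance(value, str) and needle in value.casefold():
            matches.append(record)
    return matches


# Hypothetical sample records, not real papers.
papers = [
    {"title": "Paper A", "methodology": "Online Survey of 500 users"},
    {"title": "Paper B", "methodology": "Agent-based simulation"},
    {"title": "Paper C", "methodology": None},
    {"title": "Paper D", "methodology": "survey and interviews"},
]

hits = keyword_match(papers, column="methodology", query="survey")
print([p["title"] for p in hits])
```

Because the match is a substring test, a query such as `survey` also matches longer words that contain it (for example `surveys`), which is usually what you want for a quick scan of a column.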
Semantic Search
Semantic search uses embedding vectors to find content by meaning, not exact keywords.
results = mapper.search_corpus(
    query="influence of social ties on information diffusion",
    semantic=True,
    limit=5
)
for r in results:
    print(f"[{r['match_score']:.3f}] {r['title']} ({r['year']})")
When to Use Semantic Search
Use semantic search when you're looking for conceptually related content that may not use your exact terminology. For example, searching for "network effects" might find papers discussing "social contagion" or "peer influence."
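To make the idea of matching by meaning concrete, here is a minimal sketch of ranking by cosine similarity between embedding vectors. It is not the library's implementation: the three-dimensional vectors are hand-made stand-ins for real model embeddings, and `cosine_similarity`, `rank_by_similarity`, and the sample titles are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, docs, limit=5):
    """Return up to `limit` (score, title) pairs, most similar first."""
    scored = [(cosine_similarity(query_vec, vec), title) for title, vec in docs.items()]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:limit]


# Toy vectors: the first two documents point in a similar direction to the query,
# even though their titles share no words with "network effects".
toy_docs = {
    "Social contagion in online networks": [0.9, 0.1, 0.0],
    "Peer influence among adolescents": [0.8, 0.3, 0.1],
    "Soil chemistry of alpine meadows": [0.0, 0.1, 0.9],
}
query_vec = [1.0, 0.2, 0.0]  # stands in for the embedding of "network effects"

ranking = rank_by_similarity(query_vec, toy_docs, limit=2)
for score, title in ranking:
    print(f"[{score:.3f}] {title}")
```

The score depends only on the direction of the vectors, so documents that use different vocabulary can still rank highly when their embeddings point the same way as the query.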
Enhanced Retrieval
Enhanced mode (`use_enhanced=True`) adds two features:
| Feature | Description |
|---|---|
| MMR Reranking | Maximal Marginal Relevance balances relevance to the query against redundancy among results, so the top hits are diverse rather than near-duplicates |
| Consensus Grouping | Identifies when multiple papers make the same claim |
enhanced_results = mapper.search_corpus(
    query="social contagion vs homophily",
    semantic=True,
    use_enhanced=True,
    node_types=["finding", "limitation", "method", "hypothesis"],
    limit=5
)
for r in enhanced_results:
    print(f"[{r['match_score']:.2f}] {r['title']}...")
    print(f" Type: {r['node_type']}")
    print(f" Context: {r['match_context'][:100]}...\n")
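For readers unfamiliar with MMR, the following is a minimal sketch of the standard greedy formulation, not the library's own code. Each step picks the candidate that maximizes `lambda_ * relevance - (1 - lambda_) * redundancy`, where redundancy is the highest similarity to anything already selected. The function `mmr_select`, the `lambda_` parameter, and the toy similarity values are all assumptions made for illustration.

```python
def mmr_select(query_sim, pairwise_sim, k, lambda_=0.5):
    """Greedy Maximal Marginal Relevance selection.

    query_sim:    dict mapping item -> similarity to the query
    pairwise_sim: function (item_a, item_b) -> similarity between two items
    k:            number of items to select
    lambda_:      1.0 means pure relevance, lower values favor diversity
    """
    candidates = set(query_sim)
    selected = []
    while candidates and len(selected) < k:
        def mmr_score(item):
            # Redundancy is the closest match among already selected items.
            redundancy = max((pairwise_sim(item, s) for s in selected), default=0.0)
            return lambda_ * query_sim[item] - (1 - lambda_) * redundancy

        # Sorting first makes tie-breaking deterministic.
        best = max(sorted(candidates), key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Toy data: A and B are near-duplicates; C is less relevant but different.
relevance = {"A": 0.90, "B": 0.85, "C": 0.70}
sims = {
    frozenset(("A", "B")): 0.95,
    frozenset(("A", "C")): 0.10,
    frozenset(("B", "C")): 0.10,
}


def pair_sim(a, b):
    return sims[frozenset((a, b))]


diverse = mmr_select(relevance, pair_sim, k=2, lambda_=0.5)
relevance_only = mmr_select(relevance, pair_sim, k=2, lambda_=1.0)
print(diverse, relevance_only)
```

With `lambda_=0.5` the second slot goes to C, because B is penalized for being nearly identical to A. With `lambda_=1.0` the penalty vanishes and the result is a plain relevance ranking.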
Filtering by Node Type
Control which knowledge graph node types appear in results:
| Node Type | What It Captures |
|---|---|
| `paper` | Paper titles and abstracts |
| `finding` | Key results and claims |
| `method` | Research methodologies |
| `limitation` | Acknowledged weaknesses |
| `hypothesis` | Theoretical propositions |
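A typo in a node type name is easy to miss, and this page does not say how the library reacts to unknown names. As a hedged sketch, a small check against the five types listed above can be run before building the `node_types` argument. The helper `validate_node_types` and the constant `KNOWN_NODE_TYPES` are hypothetical and not part of the library.

```python
# The five node types from the table above.
KNOWN_NODE_TYPES = {"paper", "finding", "method", "limitation", "hypothesis"}


def validate_node_types(node_types):
    """Reject unknown names, then return the list without duplicates, order preserved."""
    unknown = sorted(set(node_types) - KNOWN_NODE_TYPES)
    if unknown:
        raise ValueError(f"Unknown node type(s): {', '.join(unknown)}")
    return list(dict.fromkeys(node_types))


node_types = validate_node_types(["limitation", "finding", "limitation"])
print(node_types)
```

The validated list can then be passed as `node_types=node_types` to `mapper.search_corpus`, for example to restrict a search to limitations and findings.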
Year-Range Filtering
Scope any search to specific time periods:
# Only retrieve evidence from 2010-2020
recent_results = mapper.search_corpus(
    query="network centrality",
    semantic=True,
    min_year=2010,
    max_year=2020,
    limit=5
)
for r in recent_results:
    print(f"{r['year']}: {r['title']}")
This is especially useful for:
- Comparing historical vs. recent perspectives
- Focusing on foundational works only
- Analyzing how discourse changed over time
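The comparisons listed above can be sketched by grouping results into named periods. This example assumes each result is a dict with `year` and `title` keys, as in the loop above, and that year bounds are inclusive. The helper `bucket_by_period`, the period labels, and the sample results are hypothetical.

```python
from collections import defaultdict


def bucket_by_period(results, periods):
    """Group result titles by period.

    periods: dict mapping label -> (min_year, max_year), bounds inclusive (an assumption).
    """
    buckets = defaultdict(list)
    for r in results:
        for label, (first_year, last_year) in periods.items():
            if first_year <= r["year"] <= last_year:
                buckets[label].append(r["title"])
    return dict(buckets)


# Hypothetical results, shaped like the dicts printed in the example above.
sample_results = [
    {"year": 1998, "title": "Paper A"},
    {"year": 2010, "title": "Paper B"},
    {"year": 2015, "title": "Paper C"},
    {"year": 2020, "title": "Paper D"},
    {"year": 2023, "title": "Paper E"},
]
periods = {"foundational": (1990, 2009), "recent": (2010, 2020)}

grouped = bucket_by_period(sample_results, periods)
for label, titles in grouped.items():
    print(f"{label}: {titles}")
```

In practice the same split can be obtained by calling `search_corpus` once per period with different `min_year` and `max_year` values, which also lets each period fill its own `limit`.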