Retrieval Surfaces
over everything you've written down.
A searchable layer over your contracts, tickets, code, decks, and Slack. We build the ingestion, the embedding, the graph, and the prompt scaffolding — and we tune it against your team's actual queries, not synthetic eval sets.
RAG is easy. Retrieval is the hard part.
The default stack — chunk, embed, cosine — answers about 70% of real queries well enough. The other 30% — the queries your team actually struggles with — need re-rankers, hybrid scoring, query rewriting, and a graph for relations. That's where we live.
What we actually build.
Ingest & chunk
Connectors for Confluence, Notion, Drive, Slack, GitHub, S3. Chunking strategies that respect document structure — not blind fixed windows.
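As a rough illustration of structure-aware chunking, the sketch below splits a document on Markdown-style headings and then packs whole paragraphs into each chunk. The function name `chunk_by_structure`, the `max_chars` budget, and the Markdown heading convention are all assumptions made for this example; real connectors would need format-specific parsers.

```python
import re


def chunk_by_structure(text: str, max_chars: int = 800) -> list[dict]:
    """Split on Markdown-style headings, then pack whole paragraphs per chunk."""
    chunks = []
    heading = ""
    # The capturing group keeps heading lines in the split output.
    for part in re.split(r"(?m)^(#{1,6} .*)$", text):
        part = part.strip()
        if not part:
            continue
        if re.match(r"#{1,6} ", part):
            heading = part.lstrip("#").strip()
            continue
        buf = ""
        for para in re.split(r"\n\s*\n", part):
            para = para.strip()
            if not para:
                continue
            # Start a new chunk rather than cutting a paragraph in half.
            if buf and len(buf) + len(para) + 2 > max_chars:
                chunks.append({"heading": heading, "text": buf})
                buf = para
            else:
                buf = f"{buf}\n\n{para}" if buf else para
        if buf:
            chunks.append({"heading": heading, "text": buf})
    return chunks


doc = (
    "# Terms\n\nPayment is due in 30 days.\n\nLate fees apply.\n\n"
    "# Termination\n\nEither party may terminate."
)
chunks = chunk_by_structure(doc)
```

Each chunk carries its section heading, so the heading can be embedded or displayed alongside the text. A paragraph longer than `max_chars` is kept whole in this sketch instead of being cut mid-sentence.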
Embed & index
Sparse + dense in one index. Per-domain embeddings when off-the-shelf models don't generalize to your content. We benchmark before we commit.
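One common way to combine a sparse (keyword) ranking with a dense (embedding) ranking is reciprocal rank fusion. The sketch below is an illustration of that technique, not a description of any particular production index; the document IDs are made up, and `k = 60` is simply a commonly used default.

```python
def reciprocal_rank_fusion(
    rankings: list[list[str]], k: int = 60
) -> list[tuple[str, float]]:
    """Fuse several ranked lists of doc IDs into one scored list."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # A document earns more from each list the higher it ranks there.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


sparse = ["doc_a", "doc_b", "doc_c"]  # hypothetical keyword results
dense = ["doc_b", "doc_d", "doc_a"]   # hypothetical embedding results
fused = reciprocal_rank_fusion([sparse, dense])
```

Fusing on ranks rather than raw scores avoids having to normalize keyword scores and cosine similarities onto the same scale.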
Re-rank
Cross-encoders calibrated to your judgment data. The single biggest unlock for hard queries — and the most under-shipped step in real RAG systems.
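The shape of a re-ranking step can be sketched as follows: a first-stage retriever returns candidates, and a pairwise scorer re-orders them. In a real system the scorer would be a cross-encoder model; here a toy word-overlap function, `overlap_score`, stands in so the example runs without any model. Both function names are hypothetical.

```python
from typing import Callable


def rerank(
    query: str,
    candidates: list[str],
    score_pair: Callable[[str, str], float],
    top_n: int = 3,
) -> list[str]:
    """Score each (query, candidate) pair and keep the best top_n."""
    scored = [(score_pair(query, text), text) for text in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:top_n]]


def overlap_score(query: str, text: str) -> float:
    """Toy stand-in for a cross-encoder: fraction of query words in the text."""
    q = set(query.lower().split())
    t = set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0


candidates = [
    "renewal terms for the vendor contract",
    "holiday schedule",
    "vendor contract termination clause",
]
top = rerank("vendor contract termination", candidates, overlap_score, top_n=2)
```

Because the scorer is passed in as a function, the same pipeline can swap the toy scorer for a trained model without other changes.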
Graph relations
When 'who works on what' or 'which contract supersedes which' matters, we add a property graph alongside vectors. Joined at query time.
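A minimal sketch of a query-time join between vector hits and graph relations, using the "supersedes" case: each retrieved contract is resolved to its latest version by following graph edges. The edge list, the document IDs, and the function names are all hypothetical, and a plain Python list stands in for a real property graph store.

```python
# Hypothetical edges in (source, relation, target) form.
EDGES = [
    ("contract_2023", "supersedes", "contract_2021"),
    ("contract_2024", "supersedes", "contract_2023"),
]


def latest_version(doc_id: str, edges: list[tuple[str, str, str]]) -> str:
    """Follow 'supersedes' edges until reaching a document nothing replaces."""
    superseded_by = {
        target: source for source, rel, target in edges if rel == "supersedes"
    }
    seen = set()
    # The seen set guards against cycles in bad data.
    while doc_id in superseded_by and doc_id not in seen:
        seen.add(doc_id)
        doc_id = superseded_by[doc_id]
    return doc_id


def join_hits_with_graph(
    hits: list[str], edges: list[tuple[str, str, str]]
) -> list[str]:
    """Replace each vector hit with its current version, dropping duplicates."""
    resolved = []
    for doc_id in hits:
        current = latest_version(doc_id, edges)
        if current not in resolved:
            resolved.append(current)
    return resolved


hits = ["contract_2021", "nda_2022", "contract_2023"]  # hypothetical vector hits
resolved = join_hits_with_graph(hits, EDGES)
```

Here two stale contract versions collapse into the single current one, while the unrelated NDA passes through unchanged.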
Query rewriting
An LLM rewrites the user's query into 1–3 retrieval-friendly forms. Doubles recall on noisy queries; cheap to run.
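The rewrite-then-retrieve flow can be sketched as below. The `rewrite` callable stands in for the LLM call and `search` for the retriever; both are stubbed with toy functions so the example runs on its own. Keeping the original query alongside the rewritten forms is an assumption of this sketch, as are all the names and the toy index.

```python
from typing import Callable


def retrieve_with_rewrites(
    query: str,
    rewrite: Callable[[str], list[str]],
    search: Callable[[str, int], list[str]],
    max_forms: int = 3,
    k: int = 5,
) -> list[str]:
    """Search with the original query plus up to max_forms rewrites, then merge."""
    forms = [query] + [f for f in rewrite(query) if f != query][:max_forms]
    seen, merged = set(), []
    for form in forms:
        for doc_id in search(form, k):
            if doc_id not in seen:  # keep first occurrence only
                seen.add(doc_id)
                merged.append(doc_id)
    return merged


# Toy keyword index and stubs (hypothetical data, no LLM involved).
INDEX = {
    "pto": ["hr_policy"],
    "vacation": ["hr_policy", "leave_faq"],
    "leave": ["leave_faq", "benefits_guide"],
}


def toy_search(q: str, k: int) -> list[str]:
    hits = []
    for word in q.lower().split():
        for doc_id in INDEX.get(word, []):
            if doc_id not in hits:
                hits.append(doc_id)
    return hits[:k]


def toy_rewrite(q: str) -> list[str]:
    return ["vacation policy", "paid leave rules"]


merged = retrieve_with_rewrites("how much pto do i get", toy_rewrite, toy_search)
```

In this toy run the original query finds one document, and the two rewritten forms surface two more that the original wording missed.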
Eval & tune
Judgment data from your real users, golden queries, and an A/B harness. Tuning doesn't end at launch; that is where it starts.
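As one example of what a golden-query check might measure, the sketch below computes mean recall@k: for each golden query, the share of known-relevant documents that appear in the top k results. The queries, document IDs, and function names are invented for illustration.

```python
from typing import Callable


def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant documents found in the top k results."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def mean_recall_at_k(
    golden: dict[str, set[str]],
    search: Callable[[str], list[str]],
    k: int = 5,
) -> float:
    """Average recall@k over a set of golden queries."""
    scores = [recall_at_k(search(q), rel, k) for q, rel in golden.items()]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical golden set and canned search results.
golden = {"q1": {"a", "b"}, "q2": {"c"}}
results = {"q1": ["a", "x", "b"], "q2": ["y", "z"]}

recall_2 = mean_recall_at_k(golden, lambda q: results[q], k=2)
recall_3 = mean_recall_at_k(golden, lambda q: results[q], k=3)
```

Running the same golden set before and after a change to chunking, scoring, or re-ranking gives a simple regression signal between releases.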