Retrieval Surfaces
over everything you've written down.

vector · graph · hybrid · tuned

A searchable layer over your contracts, tickets, code, decks, and Slack. We build the ingestion, the embedding, the graph, and the prompt scaffolding — and we tune it against your team's actual queries, not synthetic eval sets.

// 01 intent

RAG is easy. Retrieval is the hard part.

The default stack — chunk, embed, cosine — answers about 70% of real queries well enough. The other 30% — the queries your team actually struggles with — need re-rankers, hybrid scoring, query rewriting, and a graph for relations. That's where we live.

// 02 capabilities

What we actually build.

Ingest & chunk

Connectors for Confluence, Notion, Drive, Slack, GitHub, S3. Chunking strategies that respect document structure — not blind fixed windows.

unstructured · llamaparse
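
To make "respect document structure" concrete, here is a minimal sketch that splits a Markdown-style document at headings and packs paragraphs within each section up to a size budget. The names `chunk_by_structure`, `Chunk`, and `max_chars`, and the `#` heading convention, are assumptions for illustration; this is not the production chunker.

```python
import re
from dataclasses import dataclass


@dataclass
class Chunk:
    heading: str  # nearest heading above the text, "" if none
    text: str


HEADING = re.compile(r"^#{1,6}\s+(.*)$")


def chunk_by_structure(doc: str, max_chars: int = 800) -> list[Chunk]:
    """Split at headings, then pack paragraphs within a section up to max_chars.

    Assumption: a single paragraph longer than max_chars is kept whole
    rather than cut mid-sentence.
    """
    sections: list[tuple[str, list[str]]] = [("", [])]
    for line in doc.splitlines():
        match = HEADING.match(line)
        if match:
            sections.append((match.group(1).strip(), []))
        else:
            sections[-1][1].append(line)

    chunks: list[Chunk] = []
    for heading, lines in sections:
        # Paragraphs are separated by blank lines.
        paragraphs = [p.strip() for p in "\n".join(lines).split("\n\n") if p.strip()]
        buf = ""
        for para in paragraphs:
            candidate = f"{buf}\n\n{para}" if buf else para
            if buf and len(candidate) > max_chars:
                chunks.append(Chunk(heading, buf))
                buf = para
            else:
                buf = candidate
        if buf:
            chunks.append(Chunk(heading, buf))
    return chunks
```

Each chunk carries its heading, so a retriever can show where in the document a hit came from. A chunk never spans two sections.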

Embed & index

Sparse + dense in one index. Per-domain embeddings when off-the-shelf doesn't generalize. We benchmark before we commit.

turbopuffer · qdrant
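
One common way to combine a sparse ranking with a dense one is reciprocal rank fusion (RRF). The sketch below is that technique, not necessarily what any particular index does internally. The function name and the conventional `k = 60` constant are illustrative assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(
    rankings: list[list[str]], k: int = 60
) -> list[tuple[str, float]]:
    """Fuse ranked lists of doc ids: score(d) = sum over lists of 1 / (k + rank)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


sparse_hits = ["a", "b", "c"]  # hypothetical BM25-style ranking
dense_hits = ["b", "d", "a"]   # hypothetical embedding ranking
fused = reciprocal_rank_fusion([sparse_hits, dense_hits])
```

RRF uses only ranks, so it needs no score normalization between the sparse and dense sides. That makes it a reasonable baseline to benchmark before committing to a tuned weighting.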

Re-rank

Cross-encoders calibrated to your judgment data. The single biggest unlock for hard queries — and the most under-shipped step in real RAG systems.

cohere-rerank · bge-reranker

Graph relations

When 'who works on what' or 'which contract supersedes which' matters, we add a property graph alongside vectors. Joined at query time.

neo4j · kuzu

Query rewriting

An LLM rewrites the user's query into 1–3 retrieval-friendly forms. Doubles recall on noisy queries; cheap to run.

claude-haiku
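
A rough sketch of how rewrites can feed retrieval: search each variant, then keep every document's best score. The LLM call and the index are injected as plain callables. `rewrite`, `search`, and `retrieve_with_rewrites` are hypothetical names, and searching the original query alongside its rewrites is an assumption of this sketch.

```python
from typing import Callable

Hit = tuple[str, float]  # (doc_id, score)


def retrieve_with_rewrites(
    query: str,
    rewrite: Callable[[str], list[str]],
    search: Callable[[str], list[Hit]],
    max_rewrites: int = 3,
) -> list[Hit]:
    """Search the original query plus up to max_rewrites rewrites.

    Results are unioned; each doc keeps the best score seen across variants.
    """
    variants = [query] + rewrite(query)[:max_rewrites]
    best: dict[str, float] = {}
    for variant in variants:
        for doc_id, score in search(variant):
            if score > best.get(doc_id, float("-inf")):
                best[doc_id] = score
    return sorted(best.items(), key=lambda hit: (-hit[1], hit[0]))
```

In practice the variants would be searched in parallel. The loop here is sequential only to keep the sketch short.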

Eval & tune

Judgment data from your real users, golden queries, and an A/B harness. Tuning doesn't end at launch; that is where it starts.

braintrust · ranx
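
As a sketch of what a golden-query regression check computes, the functions below score a retrieval run with recall@k and mean reciprocal rank (MRR). The data shapes (`runs` as query to ranked doc ids, `golden` as query to relevant ids) and the function names are assumptions for illustration. A library such as ranx covers these metrics and many more.

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant docs that appear in the top k results."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def reciprocal_rank(retrieved: list[str], relevant: set[str]) -> float:
    """1 / rank of the first relevant result, or 0.0 if none was retrieved."""
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def evaluate(
    runs: dict[str, list[str]], golden: dict[str, set[str]], k: int = 10
) -> dict[str, float]:
    """Average both metrics over every golden query; missing runs score zero."""
    n = len(golden)
    if n == 0:
        return {"recall@k": 0.0, "mrr": 0.0}
    recall = sum(recall_at_k(runs.get(q, []), rel, k) for q, rel in golden.items()) / n
    mrr = sum(reciprocal_rank(runs.get(q, []), rel) for q, rel in golden.items()) / n
    return {"recall@k": recall, "mrr": mrr}
```

Running `evaluate` before and after a pipeline change, on the same golden set, gives a simple regression signal.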

// 03 artifact

A peek at real output.

query-trace · 'who owns the renewal for acme-eu?'

// query rewrites (parallel)
REWRITE q1 = "acme-eu contract renewal owner"
REWRITE q2 = "renewal contact for acme europe gmbh"
REWRITE q3 = "who handles acme-eu account renewals"

// retrieval: hybrid sparse+dense, top-30 each
SEARCH index=contracts hits=28
SEARCH index=crm hits=19
SEARCH index=slack hits=42

// graph join: pull contract→account→owner
CYPHER MATCH (c:Contract {region:"EU"})-[:FOR]->(a:Account {name:"Acme"})
       -[:OWNED_BY]->(p:Person)
       RETURN p, c.expires_at
// → 1 row · Maya Chen · expires 2026-08-14

// re-rank top-89 → top-12
RERANK model="bge-reranker-v2-m3" ms=94

// answer with provenance
ANSWER "Maya Chen owns Acme-EU renewals; current contract expires Aug 14, 2026."
CITE [contract/CTR-2024-EU-118 §1.3, crm/account/acme-eu]

// 04 deliverables

What lands in your repo.

Ingestion layer

Connectors, chunkers, dedupers, refresh schedule. Survives source drift; runs in your VPC.

Index & graph

Vector index plus property graph (when warranted), with the join layer between them.

Query layer

Rewriter, retriever, reranker, prompter — separable and replaceable. No magic blob.

Eval harness

Judgment data, golden queries, regression suite. We hand you the data, not just the score.

Operator UI

An internal tool to inspect retrievals, tag bad answers, and feed them back into the eval set.

// 05 questions

Things people actually ask.

Q-01 · Why not just use ChatGPT Enterprise / Glean / a built-in?

Sometimes you should — and we'll tell you so in week one. Where we win is when the answers your team needs require domain re-ranking, structured relations, or unusual sources.

Q-02 · How do you keep the index fresh?

Source-aware deltas, not full reindex. Slack and tickets are streamed; docs are polled with versioning. Stale-on-purpose is a config, not an accident.
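
A minimal sketch of delta planning for polled documents: compare what the source holds now with what was indexed last time, and touch only the differences. Comparing content hashes is an assumption standing in for whatever version signal a given source exposes (revision ids, ETags, modified timestamps). `plan_delta` and `content_hash` are illustrative names.

```python
import hashlib


def content_hash(text: str) -> str:
    """Stable fingerprint of a document's text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_delta(indexed: dict[str, str], current: dict[str, str]) -> dict[str, list[str]]:
    """Decide which docs to re-embed and which to drop.

    indexed: doc_id -> hash recorded at the last index run
    current: doc_id -> text as the source holds it now
    """
    upsert = sorted(
        doc_id
        for doc_id, text in current.items()
        if indexed.get(doc_id) != content_hash(text)  # new or changed
    )
    delete = sorted(doc_id for doc_id in indexed if doc_id not in current)
    return {"upsert": upsert, "delete": delete}
```

Unchanged documents appear in neither list, which is what keeps a refresh cheaper than a full reindex.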

Q-03 · Can we run this on-prem / in our VPC?

Yes. The whole stack — embeddings, index, graph, LLM if needed — can live in your VPC. We've shipped fully air-gapped variants.

Q-04 · What about access control?

ACLs are honored at retrieval time, not at answer time. The model never sees a chunk the asker can't read. We test this with red-team queries before launch.
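
The idea can be sketched as a filter that runs before any prompt is assembled, so unreadable chunks never reach the model. The group-based ACL model and the names `SecuredChunk`, `readable_chunks`, and `build_prompt` are assumptions for illustration. A real deployment would typically push the same filter into the index query itself instead of post-filtering in application code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SecuredChunk:
    chunk_id: str
    text: str
    allowed_groups: frozenset[str]  # groups permitted to read the source doc


def readable_chunks(candidates: list[SecuredChunk], user_groups: set[str]) -> list[SecuredChunk]:
    """Keep only chunks whose ACL shares at least one group with the asker."""
    return [c for c in candidates if c.allowed_groups & user_groups]


def build_prompt(question: str, candidates: list[SecuredChunk], user_groups: set[str]) -> str:
    """Assemble model context from permitted chunks only."""
    permitted = readable_chunks(candidates, user_groups)
    context = "\n".join(f"[{c.chunk_id}] {c.text}" for c in permitted)
    return f"Context:\n{context}\n\nQuestion: {question}"
```

A red-team check then amounts to asserting that text from a restricted chunk never appears in a prompt built for a user outside its groups.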

Q-05 · How big can it scale?

Largest production index we run is ~80M chunks. The bottleneck is usually re-ranker throughput, not vector recall. We size for it up front.

Tell us the work. We'll tell you the agent.
