Embeddings & Retrieval
SecondBrain ships a pluggable embedding stack with seven providers, two modern API rerankers, query expansion, adaptive top-k, and an offline retrieval-quality eval harness. The default is local-first; API providers must be opted into explicitly.
Providers
| Name | Selector | Default model | Native dim | Auth | Notes |
|---|---|---|---|---|---|
| voyage | SB_EMBED_PROVIDER=voyage | voyage-3-large | 1024 | VOYAGE_API_KEY | MRL truncation supported |
| openai | SB_EMBED_PROVIDER=openai | text-embedding-3-large | 1024 (truncated from 3072) | OPENAI_API_KEY | MRL via dimensions arg |
| cohere | SB_EMBED_PROVIDER=cohere | embed-v4.0 | 1536 | COHERE_API_KEY | Distinguishes search_query vs search_document |
| jina | SB_EMBED_PROVIDER=jina | jina-embeddings-v3 | 1024 | JINA_API_KEY | Task-aware, MRL |
| bge-m3 | SB_EMBED_PROVIDER=bge-m3 | BAAI/bge-m3 | 1024 | (local) | Multilingual; default config target |
| st | SB_EMBED_PROVIDER=st | RetrievalSettings.embedding_model | varies | (local) | Sentence-transformers + hash fallback |
| ollama | SB_OLLAMA_EMBED_MODEL=… | per model | varies | (local daemon) | Auto-selected when env set |
| hash | SB_EMBED_PROVIDER=hash | deterministic hash | 256 | (always available) | CI / offline / debugging |
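Several rows mention MRL (Matryoshka Representation Learning) truncation, for example reducing a 3072-dimensional vector to 1024. A minimal sketch of the usual client-side form of that operation, keeping the leading components and L2-renormalizing, might look like the following. The function name `truncate_mrl` is hypothetical, and this is not necessarily how SecondBrain or any provider performs the truncation.

```python
import math


def truncate_mrl(vector, dim):
    """Keep the first `dim` components and L2-renormalize the result.

    Hypothetical helper: illustrates the common MRL truncation recipe,
    not SecondBrain's actual implementation.
    """
    if dim <= 0 or dim > len(vector):
        raise ValueError(f"dim must be in 1..{len(vector)}, got {dim}")
    head = list(vector[:dim])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        # An all-zero prefix cannot be normalized; return it unchanged.
        return head
    return [x / norm for x in head]


full = [0.5, 0.5, 0.5, 0.5]      # toy unit-length "embedding"
short = truncate_mrl(full, 2)    # two components, unit length again
```

Renormalizing matters when the vector store compares by cosine or dot product, since a truncated prefix no longer has unit length.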
Resolution order
1. SB_EMBED_PROVIDER set → use it (errors if unknown).
2. SB_OLLAMA_EMBED_MODEL set → Ollama daemon.
3. Otherwise → local sentence-transformers (or hash fallback if no model loads).
API providers are never auto-selected. This is deliberate: a user with
OPENAI_API_KEY set for chat shouldn't be silently charged per ingest.
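The resolution order can be sketched as a small pure function over an environment mapping. This is an illustration under assumptions rather than SecondBrain's actual resolver: the function name `resolve_provider` and the `KNOWN_PROVIDERS` set are hypothetical, and whether `ollama` is itself a valid `SB_EMBED_PROVIDER` value is a guess based on the provider table.

```python
import os

# Names taken from the provider table. Accepting "ollama" as an explicit
# SB_EMBED_PROVIDER value is an assumption.
KNOWN_PROVIDERS = {
    "voyage", "openai", "cohere", "jina",
    "bge-m3", "st", "ollama", "hash",
}


def resolve_provider(env=None):
    """Return the provider name implied by the environment (hypothetical)."""
    env = os.environ if env is None else env

    # 1. An explicit selector always wins and must name a known provider.
    explicit = env.get("SB_EMBED_PROVIDER", "").strip()
    if explicit:
        if explicit not in KNOWN_PROVIDERS:
            raise ValueError(f"unknown embedding provider: {explicit!r}")
        return explicit

    # 2. An Ollama model in the environment selects the local daemon.
    if env.get("SB_OLLAMA_EMBED_MODEL", "").strip():
        return "ollama"

    # 3. Otherwise stay local. API keys alone never select an API provider.
    return "st"
```

Note that `resolve_provider({"OPENAI_API_KEY": "sk-example"})` still returns `"st"`: the key's presence is deliberately ignored.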
CLI
sb embeddings list # show providers + active selection
sb embeddings probe "sample text" # encode one string with the active provider
sb embeddings benchmark # compare providers on retrieval goldens
sb embeddings benchmark --providers voyage,bge-m3,hash --json
sb embeddings benchmark --memory-fixture brain/autotune/fixtures/memory_retrieval.core.json --providers bge-m3,hash --json
sb embeddings migrate --to bge-m3 # drop + recreate Chroma collections at the new dim
Rerankers
SecondBrain ships three rerankers; the default is the local cross-encoder.
SB_RERANKER=cross-encoder # default; cross-encoder/ms-marco-MiniLM-L-6-v2
SB_RERANKER=cohere # rerank-v3.5 (multilingual)
SB_RERANKER=voyage # rerank-2
SB_RERANKER=auto # picks the first available (cohere → voyage → cross-encoder)
Reranking is enabled by default (RetrievalSettings.reranker_enabled = True)
and runs after hybrid fusion. Retrieval is two-stage: dense top-50 → rerank →
top-k. API rerankers add 200–500 ms; the cross-encoder adds 100–300 ms on CPU.
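The two-stage flow can be illustrated with a toy reranker. In this sketch the function names, the tuple layout, and the word-overlap scorer are all hypothetical stand-ins; a real deployment would score each (query, chunk) pair with the cross-encoder or an API reranker instead.

```python
def two_stage_retrieve(query, candidates, rerank_score, first_stage_n=50, top_k=5):
    """Shortlist by first-stage score, rescore the shortlist, keep top_k.

    `candidates` is a list of (chunk_id, text, first_stage_score) tuples.
    `rerank_score(query, text)` returns a float; higher means more relevant.
    """
    # Stage 1: keep the best first_stage_n candidates by retrieval score.
    shortlist = sorted(candidates, key=lambda c: c[2], reverse=True)[:first_stage_n]
    # Stage 2: rescore only the shortlist with the more expensive reranker.
    rescored = [(cid, rerank_score(query, text)) for cid, text, _ in shortlist]
    rescored.sort(key=lambda pair: pair[1], reverse=True)
    return rescored[:top_k]


def overlap_score(query, text):
    """Toy reranker: fraction of query words that appear in the text."""
    q_words = set(query.lower().split())
    t_words = set(text.lower().split())
    return len(q_words & t_words) / len(q_words) if q_words else 0.0


candidates = [
    ("a", "notes on travel budgets", 0.9),
    ("b", "ai travel assistant learnings", 0.7),
    ("c", "grocery list", 0.8),
]
result = two_stage_retrieve("AI travel assistant", candidates, overlap_score, top_k=2)
```

Here chunk `b` has the lowest first-stage score but ranks first after reranking, which is the behavior the second stage exists to provide.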
Query expansion + adaptive top-k
RetrievalSettings(
query_expansion_enabled=True, # paraphrase the query, RRF-fuse dense results
query_expansion_max_variants=3,
adaptive_top_k=True, # short factual → 5, multi-hop synthesis → 20
)
Query expansion and adaptive top-k are both enabled by default. Short queries (≤4 tokens) skip expansion to avoid adding latency to factual lookups. When no LLM provider is supplied, a synonym-based fallback runs offline.
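Reciprocal rank fusion (RRF) over the per-variant result lists, together with the short-query skip rule, can be sketched as follows. The constant `k=60` is the value conventionally used for RRF and is an assumption here, as are the function names and the use of whitespace splitting to count tokens; SecondBrain's own tokenizer and constants may differ.

```python
from collections import defaultdict


def should_expand(query, max_short_tokens=4):
    """Skip expansion for short queries (whitespace tokens; an assumption)."""
    return len(query.split()) > max_short_tokens


def rrf_fuse(rankings, k=60):
    """Fuse ranked lists of chunk IDs with reciprocal rank fusion.

    Each list contributes 1 / (k + rank) to a chunk's score, with rank
    starting at 1. Ties are broken by chunk ID so the output is deterministic.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


# One ranked list per query variant (the original plus two paraphrases).
rankings = [
    ["a", "b", "c"],
    ["b", "a", "d"],
    ["b", "c", "a"],
]
fused = rrf_fuse(rankings)
```

Chunks that appear near the top of several variant lists accumulate the highest fused score, so a paraphrase that surfaces a unique but low-ranked chunk cannot displace consistently retrieved ones.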
Eval harness
make eval-retrieval # runs brain/evals/fixtures/retrieval/seed_goldens.jsonl
sb embeddings benchmark --goldens path/to/your.jsonl
sb embeddings benchmark --memory-fixture path/to/memory-benchmark.json --memory-suite auto --max-cases 500
Goldens are JSONL: each line is {query, relevant_chunk_ids?, relevant_substrings?}.
The harness reports recall@5, recall@10, nDCG@10, MRR, and p50/p95 latency
per provider.
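For reference, the two rank-sensitive metrics can be computed as in the sketch below, using binary relevance and the standard definitions. The function names are hypothetical and the harness's own implementation may differ in details such as graded relevance or duplicate handling; the sketch assumes each chunk ID appears at most once in the ranked list.

```python
import math


def reciprocal_rank(ranked_ids, relevant):
    """1 / rank of the first relevant chunk, or 0.0 if none is returned."""
    for rank, chunk_id in enumerate(ranked_ids, start=1):
        if chunk_id in relevant:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(ranked_ids, relevant, k=10):
    """nDCG@k with binary gains: DCG of the ranking over the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, chunk_id in enumerate(ranked_ids[:k], start=1)
        if chunk_id in relevant
    )
    # The ideal ranking places every relevant chunk first.
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg else 0.0


relevant = {"a", "b"}
rr = reciprocal_rank(["x", "a", "y", "b"], relevant)   # first hit at rank 2
ndcg = ndcg_at_k(["x", "a", "y", "b"], relevant)       # below 1.0: hits are late
perfect = ndcg_at_k(["a", "b", "x"], relevant)         # ideal ordering
```

MRR is the mean of `reciprocal_rank` over all golden queries.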
For memory-focused model selection, pass a memory benchmark fixture instead of
retrieval JSONL. --memory-fixture accepts synthetic fixtures and local
LoCoMo, LongMemEval, or BEAM dataset paths, converts conversation turns into a
stable in-memory corpus, and compares providers against each case's question,
support IDs, and gold nuggets. Use the same metric family as public embedding
benchmarks: recall/nDCG/MRR for ranking quality, plus query p95 and index
latency for runtime fit. External leaderboards such as MTEB, BEIR, and MIRACL
are useful shortlists, but promotion should be based on this local memory
benchmark because SecondBrain's workload is long-conversation, temporal, and
preference-heavy.
Authoring a golden case:
{"id":"travel-ai","query":"AI travel assistant learnings","relevant_substrings":["travel","assistant"]}
{"id":"by-id","query":"specific chunk","relevant_chunk_ids":["vault/note.md#a1b2c3"]}
relevant_chunk_ids gives proper recall@k denominators; relevant_substrings
treats any substring hit as a presence match (recall = 1.0 if any returned
chunk contains the substring).
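The difference between the two label types can be made concrete with a small scoring sketch. The function name `recall_at_k`, the `(chunk_id, text)` result layout, and the case-insensitive substring match are assumptions for illustration; the harness may match substrings differently.

```python
import json


def recall_at_k(golden, returned, k):
    """Score one golden case against ranked (chunk_id, text) results."""
    top = returned[:k]
    ids = golden.get("relevant_chunk_ids")
    if ids:
        # ID labels: fraction of the labelled chunks found in the top k.
        returned_ids = {chunk_id for chunk_id, _ in top}
        return sum(1 for cid in ids if cid in returned_ids) / len(ids)
    # Substring labels: presence match, 1.0 if any chunk contains any substring.
    substrings = golden.get("relevant_substrings") or []
    hit = any(s.lower() in text.lower() for s in substrings for _, text in top)
    return 1.0 if hit else 0.0


by_substring = json.loads(
    '{"id":"travel-ai","query":"AI travel assistant learnings",'
    '"relevant_substrings":["travel","assistant"]}'
)
by_id = json.loads(
    '{"id":"by-id","query":"specific chunk",'
    '"relevant_chunk_ids":["vault/note.md#a1b2c3"]}'
)
returned = [
    ("vault/other.md#ffff", "Budget notes"),
    ("vault/trip.md#0001", "My travel plans"),
]
substring_recall = recall_at_k(by_substring, returned, k=5)
id_recall = recall_at_k(by_id, returned, k=5)
```

The same result list scores 1.0 under the substring label and 0.0 under the ID label, which is why ID labels are the stricter and more informative choice when chunk IDs are stable.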
Migration
Switching providers changes the embedding dimension. On the first query, the
dim guard in MemorySemanticIndex and VectorStore detects existing Chroma
collections with the old dimension and recreates them. To do it explicitly: