Embeddings & Retrieval

SecondBrain ships a pluggable embedding stack with eight providers, two modern API rerankers, query expansion, adaptive top-k, and an offline retrieval-quality eval harness. The default is local-first; API providers must be opted into explicitly.

Providers

Name | Selector | Default model | Native dim | Auth | Notes
-----|----------|---------------|------------|------|------
voyage | SB_EMBED_PROVIDER=voyage | voyage-3-large | 1024 | VOYAGE_API_KEY | MRL truncation supported
openai | SB_EMBED_PROVIDER=openai | text-embedding-3-large | 1024 (truncated from 3072) | OPENAI_API_KEY | MRL via dimensions arg
cohere | SB_EMBED_PROVIDER=cohere | embed-v4.0 | 1536 | COHERE_API_KEY | Distinguishes search_query vs search_document
jina | SB_EMBED_PROVIDER=jina | jina-embeddings-v3 | 1024 | JINA_API_KEY | Task-aware, MRL
bge-m3 | SB_EMBED_PROVIDER=bge-m3 | BAAI/bge-m3 | 1024 | (local) | Multilingual; default config target
st | SB_EMBED_PROVIDER=st | RetrievalSettings.embedding_model | varies | (local) | Sentence-transformers + hash fallback
ollama | SB_OLLAMA_EMBED_MODEL=… | per model | varies | (local daemon) | Auto-selected when env set
hash | SB_EMBED_PROVIDER=hash | deterministic hash | 256 | (always available) | CI / offline / debugging

Resolution order

  1. SB_EMBED_PROVIDER set → use it (errors if unknown).
  2. SB_OLLAMA_EMBED_MODEL set → Ollama daemon.
  3. Otherwise → local sentence-transformers (or hash fallback if no model loads).

API providers are never auto-selected. This is deliberate: a user with OPENAI_API_KEY set for chat shouldn't be silently charged per ingest.
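
The resolution order above can be sketched as a small pure function. This is a minimal illustration, not SecondBrain's actual resolver: `resolve_provider` and `KNOWN_PROVIDERS` are hypothetical names, and whether `ollama` is accepted as an explicit `SB_EMBED_PROVIDER` value is an assumption.

```python
import os
from typing import Mapping, Optional

# Names copied from the provider table; the set itself is an illustrative assumption.
KNOWN_PROVIDERS = {"voyage", "openai", "cohere", "jina", "bge-m3", "st", "ollama", "hash"}


def resolve_provider(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the provider name chosen by the three-step resolution order."""
    env = os.environ if env is None else env

    # Step 1: an explicit selector always wins, and unknown names are an error.
    explicit = env.get("SB_EMBED_PROVIDER", "").strip()
    if explicit:
        if explicit not in KNOWN_PROVIDERS:
            raise ValueError(f"unknown embedding provider: {explicit!r}")
        return explicit

    # Step 2: an Ollama model name selects the local daemon.
    if env.get("SB_OLLAMA_EMBED_MODEL", "").strip():
        return "ollama"

    # Step 3: local sentence-transformers. API keys such as OPENAI_API_KEY
    # are deliberately never consulted, so they cannot trigger a paid provider.
    return "st"
```

Passing the environment as an argument keeps the sketch testable; for example, `resolve_provider({"OPENAI_API_KEY": "sk-..."})` still returns the local provider.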

CLI

sb embeddings list                       # show providers + active selection
sb embeddings probe "sample text"        # encode one string with the active provider
sb embeddings benchmark                  # compare providers on retrieval goldens
sb embeddings benchmark --providers voyage,bge-m3,hash --json
sb embeddings benchmark --memory-fixture brain/autotune/fixtures/memory_retrieval.core.json --providers bge-m3,hash --json
sb embeddings migrate --to bge-m3        # drop + recreate Chroma collections at the new dim

Rerankers

SecondBrain ships three rerankers, one local and two API-backed; the default is the local cross-encoder.

SB_RERANKER=cross-encoder    # default; cross-encoder/ms-marco-MiniLM-L-6-v2
SB_RERANKER=cohere           # rerank-v3.5 (multilingual)
SB_RERANKER=voyage           # rerank-2
SB_RERANKER=auto             # picks the first available (cohere → voyage → cross-encoder)

Reranking is enabled by default (RetrievalSettings.reranker_enabled = True) and runs after hybrid fusion. Retrieval is two-stage: dense top-50 → rerank → top-k. API rerankers add 200–500 ms of latency; the cross-encoder adds 100–300 ms on CPU.
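
The two-stage shape described above can be sketched independently of any particular reranker. Everything here is illustrative: `two_stage_retrieve` and `overlap_score` are hypothetical names, and the token-overlap scorer is only a stand-in for a real cross-encoder or API reranker.

```python
from typing import Callable, List, Sequence, Tuple


def two_stage_retrieve(
    query: str,
    candidates: Sequence[str],
    score_pair: Callable[[str, str], float],
    top_k: int = 5,
    first_stage_k: int = 50,
) -> List[Tuple[str, float]]:
    """Rerank the first-stage candidates and keep the best top_k."""
    # Stage 1: `candidates` is assumed to be already ordered by the first-stage
    # retriever, so the pool is simply its first `first_stage_k` entries.
    pool = list(candidates[:first_stage_k])
    # Stage 2: score each (query, document) pair with the more expensive model.
    scored = [(doc, score_pair(query, doc)) for doc in pool]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


def overlap_score(query: str, doc: str) -> float:
    """Toy stand-in for a reranker: fraction of query tokens found in the doc."""
    q_tokens = set(query.lower().split())
    d_tokens = set(doc.lower().split())
    return len(q_tokens & d_tokens) / len(q_tokens) if q_tokens else 0.0
```

The reranker only ever sees the first-stage pool, which is why its added latency scales with the pool size (50 here) rather than with the corpus.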

Query expansion + adaptive top-k

RetrievalSettings(
    query_expansion_enabled=True,        # paraphrase the query, RRF-fuse dense results
    query_expansion_max_variants=3,
    adaptive_top_k=True,                 # short factual → 5, multi-hop synthesis → 20
)

Query expansion and adaptive top-k are both enabled by default. Short queries (≤4 tokens) skip expansion to avoid adding latency to factual lookups. When no LLM provider is supplied, expansion falls back to a synonym-based generator that runs offline.
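
As a rough sketch of the two mechanics mentioned here, the snippet below shows the short-query skip and reciprocal rank fusion (RRF) over the result lists of several query variants. The function names are hypothetical, whitespace tokenization is an assumption, and `k=60` is the constant commonly used for RRF rather than a value confirmed for SecondBrain.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def should_expand(query: str, max_short_tokens: int = 4) -> bool:
    """Skip expansion for short queries (assumes whitespace tokenization)."""
    return len(query.split()) > max_short_tokens


def rrf_fuse(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse ranked ID lists: each list contributes 1 / (k + rank) per document."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))
```

RRF uses only rank positions, so result lists from different paraphrases can be merged without calibrating their similarity scores against each other.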

Eval harness

make eval-retrieval                          # runs brain/evals/fixtures/retrieval/seed_goldens.jsonl
sb embeddings benchmark --goldens path/to/your.jsonl
sb embeddings benchmark --memory-fixture path/to/memory-benchmark.json --memory-suite auto --max-cases 500

Goldens are JSONL: each line is {query, relevant_chunk_ids?, relevant_substrings?}. The harness reports recall@5, recall@10, nDCG@10, MRR, and p50/p95 latency per provider.
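
For reference, the ranking metrics named above can be computed per query as follows. This is a generic sketch using binary relevance, not the harness's own code; the function names are illustrative, and MRR is the mean of `reciprocal_rank` over all queries.

```python
import math
from typing import Sequence, Set


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of relevant IDs that appear in the top k results."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def reciprocal_rank(ranked: Sequence[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant result, or 0.0 if none is returned."""
    for rank, chunk_id in enumerate(ranked, start=1):
        if chunk_id in relevant:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Binary-relevance nDCG: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, chunk_id in enumerate(ranked[:k], start=1)
        if chunk_id in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg else 0.0
```

Recall ignores ordering inside the top k, while nDCG and reciprocal rank reward placing relevant chunks earlier, which is why the harness reports them side by side.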

For memory-focused model selection, pass a memory benchmark fixture instead of retrieval JSONL. --memory-fixture accepts synthetic fixtures and local LoCoMo, LongMemEval, or BEAM dataset paths, converts conversation turns into a stable in-memory corpus, and compares providers against each case's question, support IDs, and gold nuggets. Use the same metric family as public embedding benchmarks: recall/nDCG/MRR for ranking quality, plus query p95 and index latency for runtime fit. External leaderboards such as MTEB, BEIR, and MIRACL are useful shortlists, but promotion should be based on this local memory benchmark because SecondBrain's workload is long-conversation, temporal, and preference-heavy.

Authoring a golden case:

{"id":"travel-ai","query":"AI travel assistant learnings","relevant_substrings":["travel","assistant"]}
{"id":"by-id","query":"specific chunk","relevant_chunk_ids":["vault/note.md#a1b2c3"]}

relevant_chunk_ids gives proper recall@k denominators; relevant_substrings treats any substring hit as a presence match (recall = 1.0 if any returned chunk contains the substring).
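
The difference between the two matching modes can be illustrated with a small scorer. This is a sketch under assumptions: `score_case` is a hypothetical name, returned chunks are assumed to be dicts with `id` and `text` keys, and case-insensitive substring matching is a guess rather than documented behavior.

```python
import json
from typing import Dict, List


def score_case(case: dict, returned: List[Dict[str, str]], k: int = 5) -> float:
    """Recall@k for one golden case, using IDs when present, else substrings."""
    top = returned[:k]
    ids = set(case.get("relevant_chunk_ids") or [])
    if ids:
        # ID mode: a true recall with the number of relevant chunks as denominator.
        hits = {chunk["id"] for chunk in top} & ids
        return len(hits) / len(ids)
    # Substring mode: presence only, 1.0 if any returned chunk contains any substring.
    substrings = case.get("relevant_substrings") or []
    found = any(s.lower() in chunk["text"].lower() for chunk in top for s in substrings)
    return 1.0 if found else 0.0


# The two golden lines from the authoring example.
by_substring = json.loads(
    '{"id":"travel-ai","query":"AI travel assistant learnings",'
    '"relevant_substrings":["travel","assistant"]}'
)
by_id = json.loads(
    '{"id":"by-id","query":"specific chunk","relevant_chunk_ids":["vault/note.md#a1b2c3"]}'
)
```

Substring goldens are quicker to write and survive re-chunking, but they cannot distinguish one relevant chunk from several, so ID-based cases are the better choice when recall numbers need to be compared across providers.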

Migration

Switching providers can change the embedding dimension. The dim guard in MemorySemanticIndex and VectorStore detects a mismatched Chroma collection on first query and recreates it. To do it explicitly:

sb embeddings migrate --to voyage   # drops + recreates collections
sb context index                    # re-encode chunk store
sb ingest vault                     # re-encode memory + source-evidence
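
The dim-guard idea can be sketched with a toy in-memory collection. This does not use the Chroma API or SecondBrain's real classes; `ToyCollection` and `ensure_dim` are hypothetical stand-ins that only show the check-then-recreate logic.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ToyCollection:
    """Stand-in for a vector collection with a fixed embedding dimension."""
    dim: int
    vectors: Dict[str, List[float]] = field(default_factory=dict)


def ensure_dim(collection: ToyCollection, provider_dim: int) -> Tuple[ToyCollection, bool]:
    """Return (collection, recreated); recreate when dimensions disagree."""
    if collection.dim == provider_dim:
        return collection, False
    # Old vectors cannot be compared with new ones, so start from an empty collection.
    return ToyCollection(dim=provider_dim), True
```

Because a recreated collection starts empty, the re-encoding steps shown above are what restore retrieval after a provider switch.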