Embeddings & Retrieval

SecondBrain ships a pluggable embedding stack with eight providers, a local cross-encoder reranker plus two API rerankers, query expansion, adaptive top-k, and an offline retrieval-quality eval harness. The default is local-first; API providers must be opted into explicitly.

Providers

Name	Selector	Default model	Native dim	Auth	Notes
voyage	`SB_EMBED_PROVIDER=voyage`	`voyage-3-large`	1024	`VOYAGE_API_KEY`	MRL truncation supported
openai	`SB_EMBED_PROVIDER=openai`	`text-embedding-3-large`	1024 (truncated from 3072)	`OPENAI_API_KEY`	MRL via `dimensions` arg
cohere	`SB_EMBED_PROVIDER=cohere`	`embed-v4.0`	1536	`COHERE_API_KEY`	Distinguishes `search_query` vs `search_document`
jina	`SB_EMBED_PROVIDER=jina`	`jina-embeddings-v3`	1024	`JINA_API_KEY`	Task-aware, MRL
bge-m3	`SB_EMBED_PROVIDER=bge-m3`	`BAAI/bge-m3`	1024	(local)	Multilingual; default config target
st	`SB_EMBED_PROVIDER=st`	`RetrievalSettings.embedding_model`	varies	(local)	Sentence-transformers + hash fallback
ollama	`SB_OLLAMA_EMBED_MODEL=…`	per model	varies	(local daemon)	Auto-selected when env set
hash	`SB_EMBED_PROVIDER=hash`	deterministic hash	256	(always available)	CI / offline / debugging
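
Several rows above mention MRL (Matryoshka) truncation. In general, this means keeping a prefix of the full vector and L2-renormalizing it so cosine similarity still behaves. The sketch below illustrates that idea only; `truncate_mrl` is a hypothetical helper, not a SecondBrain function, and API providers typically do the truncation server-side (for OpenAI, through the `dimensions` argument noted in the table).

```python
import math
from typing import Sequence


def truncate_mrl(vector: Sequence[float], dim: int) -> list[float]:
    """Keep the first `dim` components of `vector` and L2-renormalize them.

    Hypothetical helper for illustration; not part of SecondBrain.
    """
    if not 0 < dim <= len(vector):
        raise ValueError("dim must be between 1 and len(vector)")
    head = list(vector[:dim])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        # An all-zero prefix cannot be normalized; return it unchanged.
        return head
    return [x / norm for x in head]


# A 3-dim vector cut down to 2 dims, then rescaled to unit length.
print(truncate_mrl([3.0, 4.0, 12.0], 2))
```

Truncation only preserves quality for models trained with a Matryoshka objective, which is why the table calls it out per provider.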

Resolution order

1. `SB_EMBED_PROVIDER` is set → use that provider (an unknown name is an error).
2. `SB_OLLAMA_EMBED_MODEL` is set → use the Ollama daemon.
3. Otherwise → use local sentence-transformers (or the hash fallback if no model loads).

API providers are never auto-selected. This is deliberate: a user with OPENAI_API_KEY set for chat shouldn't be silently charged per ingest.
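
The resolution order can be sketched as a small pure function. This is an illustration under stated assumptions, not SecondBrain's actual code: `resolve_provider` and `KNOWN_PROVIDERS` are hypothetical names, and the set of accepted selector values is taken from the provider table.

```python
from typing import Mapping

# Hypothetical registry built from the provider table. Whether `ollama` is
# also accepted as an explicit SB_EMBED_PROVIDER value is not stated in the
# docs, so it is left out here.
KNOWN_PROVIDERS = {"voyage", "openai", "cohere", "jina", "bge-m3", "st", "hash"}


def resolve_provider(env: Mapping[str, str]) -> str:
    """Return the provider name the documented resolution order would pick."""
    explicit = env.get("SB_EMBED_PROVIDER", "").strip()
    if explicit:
        if explicit not in KNOWN_PROVIDERS:
            raise ValueError(f"unknown embedding provider: {explicit!r}")
        return explicit
    if env.get("SB_OLLAMA_EMBED_MODEL", "").strip():
        return "ollama"
    # API keys alone never select a provider; fall through to the local stack.
    return "st"


# An OpenAI key set for chat does not change the embedding provider.
print(resolve_provider({"OPENAI_API_KEY": "sk-example"}))  # st
```

In practice you would pass `os.environ` as `env`; taking a mapping keeps the function easy to test.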

CLI

sb embeddings list                       # show providers + active selection
sb embeddings probe "sample text"        # encode one string with the active provider
sb embeddings benchmark                  # compare providers on retrieval goldens
sb embeddings benchmark --providers voyage,bge-m3,hash --json
sb embeddings benchmark --memory-fixture brain/autotune/fixtures/memory_retrieval.core.json --providers bge-m3,hash --json
sb embeddings migrate --to bge-m3        # drop + recreate Chroma collections at the new dim

Rerankers

`sb` ships three rerankers; the default is the local cross-encoder.

SB_RERANKER=cross-encoder    # default; cross-encoder/ms-marco-MiniLM-L-6-v2
SB_RERANKER=cohere           # rerank-v3.5 (multilingual)
SB_RERANKER=voyage           # rerank-2
SB_RERANKER=auto             # picks the first available (cohere → voyage → cross-encoder)

Reranking is enabled by default (RetrievalSettings.reranker_enabled = True) and runs after hybrid fusion. Retrieval is two-stage: dense top-50 → rerank → top-k. API rerankers add 200–500 ms of latency; the local cross-encoder adds 100–300 ms on CPU.
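
A minimal sketch of the two-stage shape, assuming first-stage scores are already available. `two_stage_retrieve` and `overlap_score` are hypothetical names; `overlap_score` is a toy word-overlap stand-in for a real cross-encoder or API reranker, used only so the example runs offline.

```python
from typing import Callable, Sequence


def two_stage_retrieve(
    query: str,
    candidates: Sequence[tuple[str, float]],  # (chunk text, first-stage score)
    rerank_score: Callable[[str, str], float],
    top_k: int = 5,
    first_stage_n: int = 50,
) -> list[str]:
    """Shortlist by first-stage score, rescore with the reranker, cut to top_k."""
    # Stage 1: keep the best first_stage_n candidates by first-stage score.
    shortlist = sorted(candidates, key=lambda c: c[1], reverse=True)[:first_stage_n]
    # Stage 2: reorder the shortlist by the (more expensive) reranker score.
    rescored = sorted(shortlist, key=lambda c: rerank_score(query, c[0]), reverse=True)
    return [text for text, _ in rescored[:top_k]]


def overlap_score(query: str, text: str) -> float:
    """Toy reranker: fraction of query words that appear in the chunk."""
    q_words = set(query.lower().split())
    return len(q_words & set(text.lower().split())) / max(len(q_words), 1)


chunks = [
    ("cats sleep a lot", 0.9),
    ("travel assistant notes", 0.5),
    ("ai travel assistant learnings", 0.4),
]
print(two_stage_retrieve("AI travel assistant", chunks, overlap_score, top_k=2))
```

The reranker only ever sees the shortlist, which is what keeps its added latency bounded regardless of corpus size.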

Query expansion + adaptive top-k

RetrievalSettings(
    query_expansion_enabled=True,        # paraphrase the query, RRF-fuse dense results
    query_expansion_max_variants=3,
    adaptive_top_k=True,                 # short factual → 5, multi-hop synthesis → 20
)

Both features are enabled by default. Short queries (≤4 tokens) skip expansion to avoid added latency on factual lookups. When no LLM provider is supplied, expansion falls back to an offline synonym-based method.
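
For readers unfamiliar with RRF, the sketch below shows reciprocal rank fusion over the ranked lists returned for each query variant, plus the short-query skip rule. It rests on assumptions: `should_expand` and `rrf_fuse` are hypothetical names, whitespace tokenization stands in for whatever tokenizer SecondBrain uses, and `k=60` is the constant commonly used for RRF rather than a documented SecondBrain setting.

```python
from collections import defaultdict
from typing import Sequence


def should_expand(query: str, max_short_tokens: int = 4) -> bool:
    """Skip expansion for short queries (whitespace tokens are an assumption)."""
    return len(query.split()) > max_short_tokens


def rrf_fuse(rankings: Sequence[Sequence[str]], k: int = 60) -> list[str]:
    """Fuse ranked ID lists: each list adds 1 / (k + rank) to an ID's score."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# One ranked list per query variant (original query plus two paraphrases).
fused = rrf_fuse([["a", "b", "c"], ["b", "a", "d"], ["b", "c", "a"]])
print(fused)  # ['b', 'a', 'c', 'd']
```

RRF uses only ranks, not raw similarity scores, so lists from differently phrased variants can be combined without score calibration.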

Eval harness

make eval-retrieval                          # runs brain/evals/fixtures/retrieval/seed_goldens.jsonl
sb embeddings benchmark --goldens path/to/your.jsonl
sb embeddings benchmark --memory-fixture path/to/memory-benchmark.json --memory-suite auto --max-cases 500

Goldens are JSONL: each line is {query, relevant_chunk_ids?, relevant_substrings?}. The harness reports recall@5, recall@10, nDCG@10, MRR, and p50/p95 latency per provider.
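
As a reference for how the reported ranking metrics are conventionally defined, here is a sketch using binary relevance. The function names are hypothetical and the harness may differ in details (for example graded relevance or tie handling); the code also assumes each chunk ID appears at most once in the ranked list.

```python
import math
from typing import Sequence, Set


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of relevant IDs that appear in the top k results."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def reciprocal_rank(ranked: Sequence[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant result, or 0.0 if none is returned."""
    for rank, chunk_id in enumerate(ranked, start=1):
        if chunk_id in relevant:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 10) -> float:
    """Binary-relevance nDCG: DCG of this ranking over DCG of an ideal one."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, chunk_id in enumerate(ranked[:k], start=1)
        if chunk_id in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg else 0.0


ranked = ["x", "a", "y", "b"]
relevant = {"a", "b", "c"}
print(recall_at_k(ranked, relevant, 5), reciprocal_rank(ranked, relevant))
```

MRR is the mean of `reciprocal_rank` over all golden cases; recall@k and nDCG@10 are likewise averaged per provider.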

For memory-focused model selection, pass a memory benchmark fixture instead of retrieval JSONL. --memory-fixture accepts synthetic fixtures and local LoCoMo, LongMemEval, or BEAM dataset paths, converts conversation turns into a stable in-memory corpus, and compares providers against each case's question, support IDs, and gold nuggets. Use the same metric family as public embedding benchmarks: recall/nDCG/MRR for ranking quality, plus query p95 and index latency for runtime fit. External leaderboards such as MTEB, BEIR, and MIRACL are useful shortlists, but promotion should be based on this local memory benchmark because SecondBrain's workload is long-conversation, temporal, and preference-heavy.

Authoring a golden case:

{"id":"travel-ai","query":"AI travel assistant learnings","relevant_substrings":["travel","assistant"]}
{"id":"by-id","query":"specific chunk","relevant_chunk_ids":["vault/note.md#a1b2c3"]}

relevant_chunk_ids gives proper recall@k denominators; relevant_substrings treats any substring hit as a presence match (recall = 1.0 if any returned chunk contains the substring).
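
The difference between the two matching modes can be sketched as follows. This is an illustration, not the harness's code: `score_golden`, the `{"id", "text"}` shape of returned chunks, the chunk ID in the usage example, and case-insensitive substring matching are all assumptions.

```python
import json


def score_golden(golden: dict, returned: list[dict], k: int = 5) -> float:
    """Recall@k for one golden case.

    `returned` is the ranked result list as {"id": ..., "text": ...} dicts
    (a hypothetical shape chosen for this sketch).
    """
    top = returned[:k]
    wanted_ids = set(golden.get("relevant_chunk_ids") or [])
    if wanted_ids:
        # ID mode: a true recall with the number of relevant chunks as denominator.
        hits = {chunk["id"] for chunk in top} & wanted_ids
        return len(hits) / len(wanted_ids)
    # Substring mode: presence match, 1.0 if any top-k chunk contains any substring.
    substrings = golden.get("relevant_substrings") or []
    found = any(s.lower() in chunk["text"].lower() for chunk in top for s in substrings)
    return 1.0 if found else 0.0


golden = json.loads(
    '{"id":"travel-ai","query":"AI travel assistant learnings",'
    '"relevant_substrings":["travel","assistant"]}'
)
chunks = [{"id": "vault/trip.md#0", "text": "Lessons from building a travel assistant"}]
print(score_golden(golden, chunks))  # 1.0
```

Substring goldens are quicker to author but cannot distinguish one good hit from several, so prefer `relevant_chunk_ids` when comparing providers closely.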

Migration

Switching providers can change the embedding dimension. On the first query, the dim guard in MemorySemanticIndex and VectorStore detects existing Chroma collections whose dimension no longer matches and recreates them. To migrate explicitly and re-encode:

sb embeddings migrate --to voyage   # drops + recreates collections
sb context index                    # re-encode chunk store
sb ingest vault                     # re-encode memory + source-evidence
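
Conceptually, a dimension guard compares the dimension stored with each collection against the active provider's output dimension. The sketch below shows only that comparison; `CollectionInfo`, `needs_recreate`, and `collections_to_recreate` are hypothetical names and do not reflect the actual MemorySemanticIndex or VectorStore internals.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class CollectionInfo:
    """Hypothetical summary of a vector collection."""
    name: str
    dim: Optional[int]  # None means the collection holds no vectors yet


def needs_recreate(collection: CollectionInfo, provider_dim: int) -> bool:
    """True when stored vectors exist and their dimension differs."""
    return collection.dim is not None and collection.dim != provider_dim


def collections_to_recreate(
    collections: Iterable[CollectionInfo], provider_dim: int
) -> list[str]:
    """Names of collections that must be dropped and re-encoded."""
    return [c.name for c in collections if needs_recreate(c, provider_dim)]


existing = [
    CollectionInfo("chunks", 1024),
    CollectionInfo("memory", 256),
    CollectionInfo("fresh", None),
]
print(collections_to_recreate(existing, provider_dim=1024))  # ['memory']
```

A guard like this catches only dimension changes. Two providers with the same native dimension (several rows in the provider table share 1024) would pass the check, which is one reason to run the explicit migration when switching.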