Memory¶

SecondBrain has a first-class persistent memory subsystem exposed through sb memory.

Purpose¶

inspect what long-term and working memory contain
review and approve memory writes
run maintenance such as consolidation, synthesis, decay, and dedup
export or import memory state when needed

Who It Is For¶

operators using SecondBrain across repeated sessions
contributors working on runtime, memory, or retrieval behavior
anyone debugging why the system remembered or forgot something

Main Command Groups¶

sb memory list
sb memory search "rollback"
sb memory context "incident response"
sb memory show <memory_id>
sb memory evidence search "incident transcript"
sb memory evidence show <evidence_id>
sb memory evidence stats
sb memory add
sb memory edit <memory_id>
sb memory delete <memory_id>
sb memory pin <memory_id>
sb memory unpin <memory_id>
sb memory review-list
sb memory review-show <review_id>
sb memory review-apply <review_id>
sb memory stats
sb memory evals
sb memory benchmark generate --suite synthetic --style longmemeval --cases 25 --out out/memory-benchmark.json
sb memory benchmark run --dataset-path out/memory-benchmark.json --top-k 10,20,50 --json
sb memory benchmark report --run-id <eval_run_id> --json
sb memory timeline

The live CLI also includes maintenance and lifecycle commands such as maintenance, consolidate, synthesize, decay, dedup, export, import, working, episodes, and events.

Sleep-time background sessions can also feed this review queue: a governed sleep-time pass reflects on recent history, runs memory synthesis in the background, and enqueues proposed durable-memory updates here instead of applying them automatically.

Memory Model¶

The current implementation centers on three practical layers:

working memory for session-scoped scratch state
episodic memory for chronological event streams
long-term memory for curated durable records
source evidence for verbatim snippets with provenance anchors

The SQLite store is authoritative. Semantic retrieval is an index layer over that durable state. Memory links act as a lightweight graph layer during retrieval: semantic and keyword hits can pull in one-hop linked memories so bridge facts, provenance, and support chains are available even when they do not directly match the query. Source evidence is kept separate from long-term memory: captures and chat turns can write recall snippets, but only distilled facts, preferences, and habits enter durable memory through the existing governance and review flow.

Governance Flywheel¶

Long-term memories now carry an explicit governance envelope in metadata. New writes normalize confidence, provenance, source type, and governance version so operators can distinguish directly stated memories from synthesized or review-backed records. Retrieval debug output includes a why_used rationale for each selected long-term memory, including match type, score components, decayed confidence, provenance coverage, and contradiction flags.

Contradictions are surfaced before durable writes. Review-queue proposals are annotated with candidate conflicts, and sleep-time consolidation only enqueues memory-write proposals; it does not silently apply them.

Run governance evals with:

sb memory evals
sb memory evals --json

Run memory benchmark evals with:

sb memory benchmark generate --suite synthetic --style longmemeval --cases 25 --out out/memory-benchmark.json
sb memory benchmark run --dataset-path out/memory-benchmark.json --top-k 10,20,50 --json
sb memory benchmark datasets list
sb memory benchmark datasets download --suite locomo --out data/memory-benchmarks

Benchmark runs use an isolated temporary memory database and persist summaries through the shared eval store as suites such as memory.synthetic, memory.locomo, memory.longmemeval, and memory.beam. Public dataset adapters read local JSON/JSONL paths only; they do not download benchmark data. The deterministic path reports answer exact match, F1, BLEU-1 (B1), judge proxy score (J), retrieval F1/Jaccard/MRR/nDCG, latency, token savings, compression ratio, throughput, cost estimates, and break-even turns.

For public benchmark runs, download the benchmark data yourself and point the runner at the local file or directory:

sb memory benchmark datasets download --suite locomo --out data/memory-benchmarks
sb memory benchmark datasets download --suite longmemeval --variant longmemeval-s --out data/memory-benchmarks
sb memory benchmark datasets download --suite beam --variant beam-100k --out data/memory-benchmarks

sb memory benchmark run --dataset-path data/memory-benchmarks/locomo/locomo10.json --top-k 10,20,50,200 --json
sb memory benchmark run --dataset-path data/memory-benchmarks/longmemeval/longmemeval_s_cleaned.json --top-k 10,20,50,200 --json
sb memory benchmark run --dataset-path data/memory-benchmarks/beam/100K --max-cases 400 --top-k 10,20,50,200 --json

run defaults to --suite auto, which detects downloaded dataset manifests, known benchmark paths, and synthetic fixture payloads. Pass --suite locomo, --suite longmemeval, --suite beam, or --suite synthetic to override that inference.

To choose an embedding backend for memory retrieval, run the same fixtures through the embedding benchmark:

sb embeddings benchmark --memory-fixture brain/autotune/fixtures/memory_retrieval.core.json --providers bge-m3,hash --json
sb embeddings benchmark --memory-fixture data/memory-benchmarks/locomo/locomo10.json --memory-suite locomo --max-cases 500 --providers voyage,openai,bge-m3 --json

This isolates the embedding model's ranking behavior over memory turns before a full sb memory benchmark run measures the end-to-end memory pipeline.

The LoCoMo adapter maps text QA and event-summary records while preserving multimodal fields as metadata for v1 scoring. The LongMemEval adapter maps timestamped haystack sessions, turn-level has_answer evidence, answer-session IDs, knowledge-update, temporal, preference, multi-session, and abstention cases. The BEAM downloader pulls the official repository chat JSON tree for the selected size bucket (beam-100k, beam-500k, beam-1m, or beam-10m); beam-500k, beam-1m, beam-10m, and LongMemEval_M require --include-large. The BEAM adapter preserves scale buckets, ability types, source refs, and gold nuggets for large-tier operator runs.

The evals report provenance coverage, explicit-confidence coverage, low decayed-confidence memories, contradiction links, unresolved active contradictions, pending review volume, and pending sleep-time proposals.

Recommended Operator Flow¶

Inspect¶

sb memory list
sb memory search "deployment rollback"
sb memory evidence search "deployment rollback"
sb memory timeline

Review¶

sb memory review-list
sb memory review-show <review_id>
sb memory review-apply <review_id>

Maintain¶

sb memory stats
sb memory evals
sb memory consolidate
sb memory dedup
sb memory decay

Operational Notes¶

Prefer review and approval flows for durable updates when the memory source is uncertain.
Use sb memory evidence search when you need a verbatim snippet, source anchor, or recall-index diagnostic.
Use pinning for records that should resist decay and rank higher during retrieval.
Use memory links for relationship facts that make multi-hop recall possible; keep contradicts links for governance and ranking, not for automatic graph expansion.
Use export/import for migration or inspection work, not as the default daily path.
When memory behavior affects answer quality, also inspect sb data-agent status and sb context compile.
Use sb memory benchmark run when changing memory retrieval, scoring, or context-budget behavior; it gives repeatable synthetic pressure without mutating production memory.

Implementation Pointers¶

Key code paths:

brain/memory/store.py
brain/memory/retriever.py
brain/memory/consolidator.py
brain/cli/memory.py