Memory¶
SecondBrain has a first-class persistent memory subsystem exposed through sb memory.
Purpose¶
- inspect what long-term and working memory contain
- review and approve memory writes
- run maintenance such as consolidation, synthesis, decay, and dedup
- export or import memory state when needed
Who It Is For¶
- operators using SecondBrain across repeated sessions
- contributors working on runtime, memory, or retrieval behavior
- anyone debugging why the system remembered or forgot something
Main Command Groups¶
sb memory list
sb memory search "rollback"
sb memory context "incident response"
sb memory show <memory_id>
sb memory evidence search "incident transcript"
sb memory evidence show <evidence_id>
sb memory evidence stats
sb memory add
sb memory edit <memory_id>
sb memory delete <memory_id>
sb memory pin <memory_id>
sb memory unpin <memory_id>
sb memory review-list
sb memory review-show <review_id>
sb memory review-apply <review_id>
sb memory stats
sb memory evals
sb memory benchmark generate --suite synthetic --style longmemeval --cases 25 --out out/memory-benchmark.json
sb memory benchmark run --dataset-path out/memory-benchmark.json --top-k 10,20,50 --json
sb memory benchmark report --run-id <eval_run_id> --json
sb memory timeline
The live CLI also includes maintenance and lifecycle commands such as maintenance, consolidate, synthesize, decay, dedup, export, import, working, episodes, and events.
Sleep-time background sessions can also feed the memory review queue: a governed sleep-time pass reflects on recent history, runs memory synthesis in the background, and enqueues proposed durable-memory updates for review instead of applying them automatically.
Memory Model¶
The current implementation centers on four practical layers:
- working memory for session-scoped scratch state
- episodic memory for chronological event streams
- long-term memory for curated durable records
- source evidence for verbatim snippets with provenance anchors
The SQLite store is authoritative. Semantic retrieval is an index layer over that durable state. Memory links act as a lightweight graph layer during retrieval: semantic and keyword hits can pull in one-hop linked memories so bridge facts, provenance, and support chains are available even when they do not directly match the query. Source evidence is kept separate from long-term memory: captures and chat turns can write recall snippets, but only distilled facts, preferences, and habits enter durable memory through the existing governance and review flow.
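The one-hop link expansion described above can be sketched as follows. This is a minimal illustration, not the code in brain/memory/retriever.py: the function name `expand_one_hop`, the `(source, target, link_type)` tuple layout, and the choice to treat links as undirected are all assumptions made for the example. Skipping `contradicts` links follows the operational note that those links are not used for automatic graph expansion.

```python
from collections import defaultdict


def expand_one_hop(hit_ids, links, skip_types=("contradicts",)):
    """Return direct hits followed by their one-hop linked memories, de-duplicated.

    hit_ids: memory IDs returned by semantic or keyword search, in rank order.
    links:   iterable of (source_id, target_id, link_type) tuples (hypothetical layout).
    """
    # Build an undirected adjacency map, dropping link types we do not expand.
    neighbors = defaultdict(list)
    for source, target, link_type in links:
        if link_type in skip_types:
            continue
        neighbors[source].append(target)
        neighbors[target].append(source)

    ordered = list(dict.fromkeys(hit_ids))  # de-duplicate while keeping rank order
    seen = set(ordered)
    # Iterate over a snapshot of the direct hits only, so expansion stops at one hop.
    for memory_id in list(ordered):
        for linked_id in neighbors[memory_id]:
            if linked_id not in seen:
                seen.add(linked_id)
                ordered.append(linked_id)
    return ordered


links = [
    ("m1", "m2", "supports"),
    ("m2", "m3", "supports"),
    ("m1", "m4", "contradicts"),
]
print(expand_one_hop(["m1"], links))  # ['m1', 'm2']
```

In this sketch `m3` is not returned for a hit on `m1` because it is two hops away, and `m4` is left out because it is reachable only through a `contradicts` link.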
Governance Flywheel¶
Long-term memories now carry an explicit governance envelope in metadata. New
writes normalize confidence, provenance, source type, and governance version so
operators can distinguish directly stated memories from synthesized or
review-backed records. Retrieval debug output includes a why_used rationale
for each selected long-term memory, including match type, score components,
decayed confidence, provenance coverage, and contradiction flags.
Contradictions are surfaced before durable writes. Review-queue proposals are annotated with candidate conflicts, and sleep-time consolidation only enqueues memory-write proposals; it does not silently apply them.
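A rough sketch of what normalizing a governance envelope on write could look like is shown below. The field names mirror the prose above (confidence, provenance, source type, governance version), but the dataclass, the default confidence of 0.5, the `"unknown"` source type, and the version value are assumptions for illustration and may not match the actual metadata schema.

```python
from dataclasses import dataclass

GOVERNANCE_VERSION = 1  # hypothetical version marker, not taken from the codebase


@dataclass(frozen=True)
class GovernanceEnvelope:
    confidence: float
    provenance: tuple
    source_type: str
    governance_version: int


def normalize_envelope(metadata: dict) -> GovernanceEnvelope:
    """Coerce loosely typed write metadata into a consistent envelope."""
    try:
        confidence = float(metadata.get("confidence"))
    except (TypeError, ValueError):
        confidence = 0.5  # assumed default when confidence is missing or malformed
    confidence = min(1.0, max(0.0, confidence))  # clamp into [0, 1]

    provenance = metadata.get("provenance") or ()
    if isinstance(provenance, str):
        provenance = (provenance,)  # a single reference becomes a one-item tuple

    source_type = str(metadata.get("source_type") or "unknown").strip().lower()
    return GovernanceEnvelope(
        confidence, tuple(provenance), source_type, GOVERNANCE_VERSION
    )


envelope = normalize_envelope({"confidence": "1.7", "source_type": "Synthesized"})
print(envelope)
```

Normalizing at write time means retrieval and evals can rely on every record carrying the same fields, regardless of whether the memory was stated directly, synthesized, or approved through review.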
Run governance evals with:
sb memory evals
Run memory benchmark evals with:
sb memory benchmark generate --suite synthetic --style longmemeval --cases 25 --out out/memory-benchmark.json
sb memory benchmark run --dataset-path out/memory-benchmark.json --top-k 10,20,50 --json
sb memory benchmark datasets list
sb memory benchmark datasets download --suite locomo --out data/memory-benchmarks
Benchmark runs use an isolated temporary memory database and persist summaries
through the shared eval store as suites such as memory.synthetic,
memory.locomo, memory.longmemeval, and memory.beam. Public dataset
adapters read local JSON/JSONL paths only; they do not download benchmark data.
The deterministic path reports answer exact match, F1, BLEU-1 (B1), judge
proxy score (J), retrieval F1/Jaccard/MRR/nDCG, latency, token savings,
compression ratio, throughput, cost estimates, and break-even turns.
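For readers unfamiliar with some of the reported metrics, the sketch below shows the standard textbook definitions of exact match, token-level F1, reciprocal rank (averaged across cases to obtain MRR), and nDCG with binary relevance. It is an illustration of the metrics themselves, not the benchmark runner's implementation; the tokenization (lowercased whitespace split) and function names are assumptions, and ranked IDs are assumed to be unique.

```python
import math
from collections import Counter


def exact_match(prediction: str, gold: str) -> float:
    """1.0 when the answers match after trimming and lowercasing, else 0.0."""
    return float(prediction.strip().lower() == gold.strip().lower())


def token_f1(prediction: str, gold: str) -> float:
    """Harmonic mean of token precision and recall over whitespace tokens."""
    pred_tokens = prediction.lower().split()
    gold_tokens = gold.lower().split()
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def reciprocal_rank(ranked_ids, relevant_ids) -> float:
    """1 / rank of the first relevant item, or 0.0 if none was retrieved."""
    for rank, memory_id in enumerate(ranked_ids, start=1):
        if memory_id in relevant_ids:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(ranked_ids, relevant_ids, k: int) -> float:
    """Normalized discounted cumulative gain with binary relevance."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, memory_id in enumerate(ranked_ids[:k], start=1)
        if memory_id in relevant_ids
    )
    ideal_hits = min(len(relevant_ids), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg else 0.0


ranked = ["m3", "m1", "m7"]
relevant = {"m1", "m9"}
print(reciprocal_rank(ranked, relevant))  # 0.5
print(round(ndcg_at_k(ranked, relevant, k=3), 3))
print(round(token_f1("rolled back the deploy", "rolled back deploy"), 3))
```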
For public benchmark runs, download the benchmark data yourself and point the runner at the local file or directory:
sb memory benchmark datasets download --suite locomo --out data/memory-benchmarks
sb memory benchmark datasets download --suite longmemeval --variant longmemeval-s --out data/memory-benchmarks
sb memory benchmark datasets download --suite beam --variant beam-100k --out data/memory-benchmarks
sb memory benchmark run --dataset-path data/memory-benchmarks/locomo/locomo10.json --top-k 10,20,50,200 --json
sb memory benchmark run --dataset-path data/memory-benchmarks/longmemeval/longmemeval_s_cleaned.json --top-k 10,20,50,200 --json
sb memory benchmark run --dataset-path data/memory-benchmarks/beam/100K --max-cases 400 --top-k 10,20,50,200 --json
The run subcommand defaults to --suite auto, which detects downloaded dataset
manifests, known benchmark paths, and synthetic fixture payloads. Pass
--suite locomo, --suite longmemeval, --suite beam, or --suite synthetic to
override that inference.
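To make the idea of suite inference concrete, here is a deliberately simplified sketch that looks only at path components. The real `--suite auto` logic also inspects dataset manifests and fixture payloads, which this example does not attempt; the function name `infer_suite` and the fallback to `synthetic` are assumptions made for illustration.

```python
from pathlib import Path

KNOWN_SUITES = ("locomo", "longmemeval", "beam")


def infer_suite(dataset_path: str, requested: str = "auto") -> str:
    """Pick a benchmark suite from the dataset path unless one was requested."""
    if requested != "auto":
        return requested  # an explicit --suite value always wins
    parts = [part.lower() for part in Path(dataset_path).parts]
    for suite in KNOWN_SUITES:
        if any(suite in part for part in parts):
            return suite
    return "synthetic"  # assumed fallback when no known suite name appears


print(infer_suite("data/memory-benchmarks/locomo/locomo10.json"))
print(infer_suite("out/memory-benchmark.json"))
```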
To choose an embedding backend for memory retrieval, run the same fixtures through the embedding benchmark:
sb embeddings benchmark --memory-fixture brain/autotune/fixtures/memory_retrieval.core.json --providers bge-m3,hash --json
sb embeddings benchmark --memory-fixture data/memory-benchmarks/locomo/locomo10.json --memory-suite locomo --max-cases 500 --providers voyage,openai,bge-m3 --json
This isolates the embedding model's ranking behavior over memory turns before a
full sb memory benchmark run measures the end-to-end memory pipeline.
The LoCoMo adapter maps text QA and event-summary records while preserving
multimodal fields as metadata for v1 scoring. The LongMemEval adapter maps
timestamped haystack sessions, turn-level has_answer evidence, answer-session
IDs, knowledge-update, temporal, preference, multi-session, and abstention
cases. The BEAM downloader pulls the official repository chat JSON tree for the
selected size bucket (beam-100k, beam-500k, beam-1m, or beam-10m);
beam-500k, beam-1m, beam-10m, and LongMemEval_M require
--include-large. The BEAM adapter preserves scale buckets, ability types,
source refs, and gold nuggets for large-tier operator runs.
The governance evals report provenance coverage, explicit-confidence coverage, low decayed-confidence memories, contradiction links, unresolved active contradictions, pending review volume, and pending sleep-time proposals.
Recommended Operator Flow¶
Inspect¶
sb memory list
sb memory search "deployment rollback"
sb memory evidence search "deployment rollback"
sb memory timeline
Review¶
sb memory review-list
sb memory review-show <review_id>
sb memory review-apply <review_id>
Maintain¶
Operational Notes¶
- Prefer review and approval flows for durable updates when the memory source is uncertain.
- Use sb memory evidence search when you need a verbatim snippet, source anchor, or recall-index diagnostic.
- Use pinning for records that should resist decay and rank higher during retrieval.
- Use memory links for relationship facts that make multi-hop recall possible; keep contradicts links for governance and ranking, not for automatic graph expansion.
- Use export/import for migration or inspection work, not as the default daily path.
- When memory behavior affects answer quality, also inspect sb data-agent status and sb context compile.
- Use sb memory benchmark run when changing memory retrieval, scoring, or context-budget behavior; it gives repeatable synthetic pressure without mutating production memory.
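The note on pinning can be pictured with a small scoring sketch. Everything numeric here is an assumption chosen for illustration: the exponential half-life decay, the 90-day half-life, and the additive pin boost are not taken from the SecondBrain source, which may combine these signals differently.

```python
def decayed_confidence(confidence, age_days, pinned=False, half_life_days=90.0):
    """Halve confidence every half_life_days; pinned records skip decay (assumed model)."""
    if pinned:
        return confidence
    return confidence * 0.5 ** (age_days / half_life_days)


def rank_score(match_score, confidence, age_days, pinned=False, pin_boost=0.2):
    """Combine the retrieval match score with decayed confidence and a pin boost."""
    score = match_score * decayed_confidence(confidence, age_days, pinned)
    return score + pin_boost if pinned else score


# Two otherwise identical records, 180 days old: the pinned one keeps its
# confidence and receives the boost, so it ranks higher.
unpinned = rank_score(0.5, 0.8, age_days=180)
pinned = rank_score(0.5, 0.8, age_days=180, pinned=True)
print(unpinned < pinned)  # True
```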
Implementation Pointers¶
Key code paths:
- brain/memory/store.py
- brain/memory/retriever.py
- brain/memory/consolidator.py
- brain/cli/memory.py