Plugging into the SecondBrain Memory API
The SecondBrain Memory API is the durable memory + grounded-retrieval
layer agents plug into. AI you can audit: every response carries a
Citation envelope so callers can trace every claim back to its source
chunk. This page is the adoption guide. Read it once; copy the curl + MCP
examples; you're live.
Brand promise: every memory-bearing response carries a
Citation envelope (chunk_hash, source_path, anchor, text_span, score, retrieved_at). If a Memory API response can't show its work, treat it as a contract violation.
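A caller can enforce that promise mechanically. The sketch below is one way to do it in Python, assuming the envelope arrives as a plain JSON object carrying the six fields named above. The helper `missing_citation_fields` is hypothetical and not part of the Memory API.

```python
# Hypothetical client-side check. The field names come from the Citation
# envelope described above; the helper itself is an illustrative assumption.
REQUIRED_CITATION_FIELDS = (
    "chunk_hash",
    "source_path",
    "anchor",
    "text_span",
    "score",
    "retrieved_at",
)


def missing_citation_fields(citation: dict) -> list[str]:
    """Return the envelope fields that are absent or null, in canonical order."""
    return [name for name in REQUIRED_CITATION_FIELDS if citation.get(name) is None]


example = {
    "chunk_hash": "abc1234567890def",
    "source_path": "/vault/01_projects/launch.md",
    "anchor": "## Migration",
    "text_span": "the migration begins on may 12...",
    "score": 0.91,
    "retrieved_at": "2026-05-08T10:14:22Z",
}
print(missing_citation_fields(example))  # prints []
```

A non-empty return value is the signal to treat the response as a contract violation.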
There are two ways to plug in:
| Surface | When to use | Latency profile |
|---|---|---|
| MCP (model context protocol) | Inside an LLM agent (Claude Code, Cursor, Codex, custom) | Fast — same process as the agent if local; round-trip if HTTP MCP |
| HTTP (/v1/*) | Anything else: scripts, services, custom apps, agents in other runtimes | One TCP round-trip per call |
Both surfaces speak the same /v1/ Memory API contract; pick the one that fits your
runtime. Detailed contract: contracts/memory_api_v1.yaml (in the repo root).
Quality numbers: docs/QUALITY.md.
1. Get the Memory API running
The fastest path is Docker:
git clone https://github.com/contextosai/SecondBrain.git
cd SecondBrain
SB_SERVE_TOKEN=$(openssl rand -hex 24) make quickstart-docker
That builds the image (~2.7 GB, CPU-only torch), starts the daemon at
http://localhost:8765, polls /health until green, and prints the bearer
token plus a curl example. The image runs on macOS / Linux / Windows
(WSL2). Image source: repo-root Dockerfile.
If you'd rather run from source:
pip install -e ".[reranker]"
export SB_SERVE_TOKEN=$(openssl rand -hex 24)
sb serve --host 0.0.0.0 --port 8765
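The Docker quickstart polls /health until green; when running from source you may want the same wait before sending requests. A minimal sketch, assuming that "green" means /health answers with HTTP 200 (the page does not spell out the response). Both function names are hypothetical.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_probe(url: str = "http://localhost:8765/health") -> bool:
    """Return True when the endpoint answers HTTP 200 (ASSUMED to mean 'green')."""
    try:
        with urllib.request.urlopen(url, timeout=2) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx status: not ready yet.
        return False


def wait_until_healthy(
    probe: Callable[[], bool] = http_probe,
    attempts: int = 30,
    delay_s: float = 1.0,
) -> bool:
    """Call probe up to `attempts` times, sleeping between failed attempts."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            time.sleep(delay_s)
    return False
```

The probe is injectable so the retry loop can be exercised without a running daemon.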
2. Plug into Claude Code (MCP)
Claude Code reads MCP server config from .claude/settings.json in the
project root (or globally from ~/.claude/settings.json). Add:
{
"mcpServers": {
"secondbrain": {
"command": ".venv/bin/python",
"args": ["-m", "brain.mcp.cc_server"]
}
}
}
Restart Claude Code. The Memory API's production MCP tools (secondbrain_recall,
secondbrain_ask, secondbrain_ingest, secondbrain_pack, secondbrain_open_loops,
secondbrain_shravan_add, secondbrain_manan_reflect,
secondbrain_nididhyasan_implement, secondbrain_knowledge_status,
secondbrain_knowledge_review,
secondbrain_decision_extract, secondbrain_meeting_extract, secondbrain_grounded_answer)
are now in Claude's tool list. Every memory-bearing response includes the
Citation envelope as a structured JSON block at the bottom of the tool
output:
{
"query": "migration plan",
"citations": [
{
"chunk_hash": "abc1234567890def",
"source_path": "/vault/01_projects/launch.md",
"anchor": "## Migration",
"text_span": "the migration begins on may 12...",
"score": 0.91,
"retrieved_at": "2026-05-08T10:14:22Z"
}
]
}
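If your agent harness hands you the tool output as one string, the citations can be pulled back out of it. The sketch below assumes the output is free text followed by a single JSON object as the last thing in the string, which is how "at the bottom of the tool output" reads; `trailing_json_block` is a hypothetical helper.

```python
import json


def trailing_json_block(tool_output: str):
    """Return the JSON object that ends tool_output, or None if there is none."""
    decoder = json.JSONDecoder()
    start = tool_output.find("{")
    while start != -1:
        try:
            obj, end = decoder.raw_decode(tool_output, start)
        except json.JSONDecodeError:
            obj, end = None, start
        # Accept only an object that runs to the end of the text.
        if isinstance(obj, dict) and not tool_output[end:].strip():
            return obj
        start = tool_output.find("{", start + 1)
    return None


# Hand-written sample, not captured server output.
sample_output = (
    "The migration begins on may 12.\n\n"
    '{"query": "migration plan", "citations": '
    '[{"chunk_hash": "abc1234567890def", "score": 0.91}]}\n'
)
block = trailing_json_block(sample_output)
print(block["citations"][0]["chunk_hash"])  # prints abc1234567890def
```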
Smoke test from a shell:
printf '{"jsonrpc":"2.0","id":"1","method":"tools/list","params":{}}\n' \
| python -m brain.mcp.cc_server
You should see the tools named above in the response.
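To check the listing programmatically rather than by eye, the response line can be parsed. This sketch assumes the server answers with the standard MCP tools/list result shape (a `result.tools` array whose entries each carry a `name`); the sample response is hand-written, and `listed_tool_names` is a hypothetical helper.

```python
import json


def listed_tool_names(response_line: str) -> set[str]:
    """Extract tool names from one JSON-RPC tools/list response line."""
    message = json.loads(response_line)
    tools = message.get("result", {}).get("tools", [])
    return {tool["name"] for tool in tools}


# Hand-written sample response, not captured server output.
sample = json.dumps(
    {
        "jsonrpc": "2.0",
        "id": "1",
        "result": {
            "tools": [
                {"name": "secondbrain_recall"},
                {"name": "secondbrain_grounded_answer"},
            ]
        },
    }
)
expected = {"secondbrain_recall", "secondbrain_grounded_answer"}
missing = expected - listed_tool_names(sample)
print(sorted(missing))  # prints []
```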
3. Plug in over HTTP (/v1/*)
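Anything that speaks HTTP works. Headers used in the examples below: Authorization: Bearer ${SB_SERVE_TOKEN} on every request, Content-Type: application/json on requests with a JSON body, and Idempotency-Key on writes such as POST /v1/decisions.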
Hybrid retrieval (POST /v1/memory/recall)
curl -s \
-H "Authorization: Bearer ${SB_SERVE_TOKEN}" \
-H "Content-Type: application/json" \
-d '{"query":"migration plan","top_k":5}' \
http://localhost:8765/v1/memory/recall
Returns:
{
"query": "migration plan",
"results": [
{
"content": "...",
"citation": {
"chunk_hash": "abc1234567890def",
"source_path": "/vault/01_projects/launch.md",
"anchor": "## Migration",
"text_span": "...",
"score": 0.91,
"retrieved_at": "2026-05-08T10:14:22Z"
},
"source_type": "hybrid"
}
]
}
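The same call from Python, using only the standard library. This is a sketch that mirrors the curl example and the response shape shown above; `build_recall_request` and `recall` are hypothetical names, and the base URL is the local default used throughout this page.

```python
import json
import urllib.request


def build_recall_request(
    query: str,
    token: str,
    top_k: int = 5,
    base_url: str = "http://localhost:8765",
) -> urllib.request.Request:
    """Build (but do not send) the POST /v1/memory/recall request."""
    body = json.dumps({"query": query, "top_k": top_k}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v1/memory/recall",
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def recall(query: str, token: str, top_k: int = 5) -> list[tuple[str, float]]:
    """Send the request and return (source_path, score) for each result."""
    request = build_recall_request(query, token, top_k)
    with urllib.request.urlopen(request, timeout=30) as resp:
        payload = json.load(resp)
    return [
        (item["citation"]["source_path"], item["citation"]["score"])
        for item in payload["results"]
    ]
```

Splitting request construction from sending keeps the header and body logic checkable without a running daemon.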
Closed-corpus QA (POST /v1/grounded/answer)
curl -s \
-H "Authorization: Bearer ${SB_SERVE_TOKEN}" \
-H "Content-Type: application/json" \
-d '{"question":"when does the migration begin","top_k":3}' \
http://localhost:8765/v1/grounded/answer
Returns the answer + citations[] (≥ 1 required by the citation density
gate) + trajectory_id for replay + a citation_gate field that reports
whether the gate passed:
{
"answer": "The migration begins on may 12 [chunk1].",
"citations": [...],
"trajectory_id": "traj_abc123",
"citation_gate": {"passed": true, "reason": "ok", "min_citations": 1, "actual_citations": 3, "strict": true},
"termination_reason": "completed"
}
If citation_gate.passed == false, the answer is rewritten with a
[citation_gate:insufficient_evidence] marker. Trust the marker: the Memory
API reports missing evidence rather than hiding it.
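On the client side, the gate can decide whether an answer is shown at all. A minimal sketch, assuming the response is the JSON object shown above already parsed into a dict; `usable_answer` is a hypothetical helper, and the extra marker check is a defensive choice rather than documented behaviour.

```python
INSUFFICIENT_MARKER = "[citation_gate:insufficient_evidence]"


def usable_answer(response: dict):
    """Return the answer text only when the citation gate passed, else None."""
    gate = response.get("citation_gate", {})
    if not gate.get("passed", False):
        return None
    answer = response.get("answer", "")
    if INSUFFICIENT_MARKER in answer:
        # Defensive: never surface an answer that carries the marker.
        return None
    return answer


passed = {
    "answer": "The migration begins on may 12 [chunk1].",
    "citation_gate": {"passed": True, "reason": "ok"},
}
failed = {
    "answer": "[citation_gate:insufficient_evidence]",
    "citation_gate": {"passed": False},
}
print(usable_answer(passed))  # prints The migration begins on may 12 [chunk1].
print(usable_answer(failed))  # prints None
```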
Recording a decision (POST /v1/decisions)
curl -s \
-H "Authorization: Bearer ${SB_SERVE_TOKEN}" \
-H "Content-Type: application/json" \
-H "Idempotency-Key: $(uuidgen)" \
-d '{
"key":"memory_api.adoption.demo",
"title":"Adopt the SecondBrain Memory API",
"rationale":"Auditable retrieval is non-negotiable for our use case.",
"alternatives":["build in-house"],
"impacted_entities":["memory_api","engineering"],
"project":"adoption"
}' \
http://localhost:8765/v1/decisions
The Idempotency-Key header is required on writes — POST the same
key twice and you get the same record back, not a duplicate.
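Note that `$(uuidgen)` in the curl example mints a new key on every invocation, so the deduplication only helps if a retry reuses the key from the first attempt. One way to arrange that in client code is sketched below: the key is generated once per logical write and shared by all retries. `write_with_retries` and the `send` callable are hypothetical, and treating `ConnectionError` as the retryable failure is an assumption.

```python
import uuid
from typing import Callable


def write_with_retries(
    send: Callable[[dict, str], dict],
    payload: dict,
    attempts: int = 3,
) -> dict:
    """Call send(payload, idempotency_key), reusing one key across retries."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    key = str(uuid.uuid4())  # one key per logical write, not per attempt
    last_error = None
    for _ in range(attempts):
        try:
            return send(payload, key)
        except ConnectionError as exc:
            last_error = exc
    raise last_error
```

Here `send` stands in for whatever performs the POST and sets the Idempotency-Key header from its second argument.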
Forget content (POST /v1/memory/forget)
The Memory API is not append-only: agents can also unindex content over the wire. Forget by source path:
curl -s \
-H "Authorization: Bearer ${SB_SERVE_TOKEN}" \
-H "Content-Type: application/json" \
-d '{"source_path":"/vault/01_projects/old.md"}' \
http://localhost:8765/v1/memory/forget
Or by content-addressed chunk_hash (the same hash returned in every
Citation envelope):
curl -s \
-H "Authorization: Bearer ${SB_SERVE_TOKEN}" \
-H "Content-Type: application/json" \
-d '{"chunk_hash":"abc1234567890def"}' \
http://localhost:8765/v1/memory/forget
Returns:
{
"forgotten_at": "2026-05-08T11:02:14Z",
"forgotten_count": 7,
"per_dimension": {"source_path": 7},
"source_path": "/vault/01_projects/old.md",
"chunk_hash": null
}
Idempotent — forgetting content that doesn't exist returns
forgotten_count: 0 rather than an error. Both dimensions may be
supplied in one call.
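A small Python sketch of building the request body for either dimension, or both. The client-side guard against an empty body is an assumption (the page does not say how the server treats one), and both helper names are hypothetical.

```python
from typing import Optional


def forget_payload(
    source_path: Optional[str] = None,
    chunk_hash: Optional[str] = None,
) -> dict:
    """Build the POST /v1/memory/forget body from one or both dimensions."""
    payload = {}
    if source_path is not None:
        payload["source_path"] = source_path
    if chunk_hash is not None:
        payload["chunk_hash"] = chunk_hash
    if not payload:
        # ASSUMPTION: refuse locally rather than send an empty body.
        raise ValueError("supply source_path, chunk_hash, or both")
    return payload


def nothing_forgotten(response: dict) -> bool:
    """True when the call succeeded but matched no indexed content."""
    return response.get("forgotten_count", 0) == 0


print(forget_payload(source_path="/vault/01_projects/old.md"))
# prints {'source_path': '/vault/01_projects/old.md'}
```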
Streaming grounded answers (POST /v1/grounded/answer/stream)
Same input shape as the non-streaming route, but the response is Server-Sent Events that surface lifecycle events as the multi-step loop runs — useful when you want to show a user that retrieval is happening while the final answer is still being assembled.
curl -N \
-H "Authorization: Bearer ${SB_SERVE_TOKEN}" \
-H "Content-Type: application/json" \
-H "Accept: text/event-stream" \
-d '{"question":"when does the migration begin","top_k":3}' \
http://localhost:8765/v1/grounded/answer/stream
Sequence on the wire:
event: run_started
data: {"trajectory_id":"traj_abc","prompt":"…","ts":...}
event: step
data: {"step_index":1,"kind":"retrieve","retrieval":{...}}
event: step
data: {"step_index":2,"kind":"observe","content":"…"}
event: run_complete
data: {"trajectory_id":"traj_abc","final_answer":"…","citations":[…],
"citation_gate":{…}}
The final run_complete payload is equivalent to the body returned by
the non-streaming route. Use the non-streaming /v1/grounded/answer for
simpler integration; use the streaming variant when perceived latency or
audit visibility matters.
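Consuming the stream means turning `event:` / `data:` lines into (event name, payload) pairs. The sketch below follows the standard Server-Sent Events framing, where a blank line ends each event and consecutive `data:` lines are joined with newlines; it assumes every data payload is JSON, as in the sequence above. `iter_sse_events` is a hypothetical helper, and unlike a strict SSE client it also emits a final event that is not followed by a blank line.

```python
import json
from typing import Iterable, Iterator


def iter_sse_events(lines: Iterable[str]) -> Iterator[tuple[str, dict]]:
    """Yield (event_name, parsed_json_data) for each event in an SSE stream."""
    event, data_lines = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # Blank line: dispatch the buffered event, then reset.
            if data_lines:
                yield event, json.loads("\n".join(data_lines))
            event, data_lines = "message", []
        elif line.startswith(":"):
            continue  # comment / keep-alive line
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            value = line[len("data:"):]
            data_lines.append(value[1:] if value.startswith(" ") else value)
    if data_lines:
        # Lenient: emit a trailing event that lacked the closing blank line.
        yield event, json.loads("\n".join(data_lines))


# Hand-written sample stream, not captured server output.
stream = [
    "event: run_started\n",
    'data: {"trajectory_id":"traj_abc"}\n',
    "\n",
    "event: step\n",
    'data: {"step_index":1,"kind":"retrieve"}\n',
    "\n",
    "event: run_complete\n",
    'data: {"trajectory_id":"traj_abc",\n',
    'data: "final_answer":"ok"}\n',
    "\n",
]
events = list(iter_sse_events(stream))
print([name for name, _ in events])  # prints ['run_started', 'step', 'run_complete']
```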
Trajectory replay (GET /v1/audit/event_log?trajectory_id=)
curl -s \
-H "Authorization: Bearer ${SB_SERVE_TOKEN}" \
"http://localhost:8765/v1/audit/event_log?trajectory_id=traj_abc123"
Returns the full sequence of retrieval calls, tool invocations, and intermediate reasoning steps that produced the answer. This is the auditability promise made literal: every grounded answer can be replayed.
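When the trajectory_id comes out of a response rather than being typed by hand, it is safer to URL-encode it when building the replay URL. A minimal sketch; `event_log_url` is a hypothetical helper and the base URL is the local default used on this page.

```python
from urllib.parse import urlencode


def event_log_url(trajectory_id: str, base_url: str = "http://localhost:8765") -> str:
    """Build the GET /v1/audit/event_log URL for one trajectory."""
    query = urlencode({"trajectory_id": trajectory_id})
    return f"{base_url}/v1/audit/event_log?{query}"


print(event_log_url("traj_abc123"))
# prints http://localhost:8765/v1/audit/event_log?trajectory_id=traj_abc123
```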
4. The full surface
| HTTP | MCP tool | Purpose |
|---|---|---|
| POST /v1/memory/recall | secondbrain_recall | Hybrid retrieval over the workspace vault |
| POST /v1/memory/ingest | secondbrain_ingest | Add file or text to the vault and index it |
| POST /v1/memory/forget | secondbrain_forget | Unindex by source_path or chunk_hash (idempotent) |
| POST /v1/memory/pack | secondbrain_pack | Build a bounded ContextPack for an intent |
| POST /v1/memory/assimilation/shravan | secondbrain_shravan_add | Capture source-aware intake |
| POST /v1/memory/assimilation/manan | secondbrain_manan_reflect | Reflect on captured knowledge |
| POST /v1/memory/assimilation/nididhyasan | secondbrain_nididhyasan_implement | Convert reflection into practice or a memory proposal |
| GET /v1/memory/assimilation/status | secondbrain_knowledge_status | Show knowledge maturity status |
| GET /v1/memory/assimilation/review | secondbrain_knowledge_review | Review items needing reflection or practice |
| POST /v1/grounded/answer | secondbrain_grounded_answer | Closed-corpus QA with citation gate |
| POST /v1/grounded/eval | secondbrain_grounded_eval | Score an agent output against expected nuggets |
| GET /v1/decisions | secondbrain_decisions_list | List decision records, namespace + cursor |
| GET /v1/decisions/{ref} | secondbrain_decisions_get | Fetch a single decision |
| POST /v1/decisions | secondbrain_decisions_record | Record a new decision (idempotent) |
| POST /v1/decisions/extract | secondbrain_decision_extract | Parse decisions from markdown without persisting |
| POST /v1/meetings/extract | secondbrain_meeting_extract | Run the meeting copilot on a transcript |
| GET /v1/open_loops | secondbrain_open_loops | List unresolved TODO/OPENLOOP markers |
| GET /v1/audit/event_log | secondbrain_audit | Tail event log; trajectory replay via ?trajectory_id= |
Anything outside this list is internal-only and may change. Don't
build against /chat, /sessions, /work, etc. — they exist for the
reference UI, not for Memory API consumers.
5. Multi-tenancy (optional, for hosted / team use)
By default the Memory API is single-tenant — one SB_SERVE_TOKEN, one
vault, one state directory. For team or hosted deployments, set
SB_MULTI_TENANT=1 and use the workspace CLI:
SB_MULTI_TENANT=1 sb workspace create team-platform
SB_MULTI_TENANT=1 sb workspace token issue --workspace team-platform --actor alice
# → ws_abc123def456... (raw token shown ONCE; SHA-256 hash is stored)
Each workspace gets its own state dir and vault under
state/<workspace>/ and vault/<workspace>/. The token determines the
workspace; callers can't cross workspaces by header. Architecture detail:
docs/archive/planning/multi_tenancy_design.md.
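The "shown once, only the SHA-256 hash is stored" pattern from the CLI comment can be illustrated in a few lines. This is a generic sketch of that pattern, not the sb implementation: the `ws_` prefix is taken from the example output above, while the token length and both function names are assumptions.

```python
import hashlib
import hmac
import secrets


def issue_token(prefix: str = "ws_") -> tuple[str, str]:
    """Return (raw_token, sha256_hex); only the hash would be persisted."""
    raw = prefix + secrets.token_hex(24)  # ASSUMED length, for illustration
    stored_hash = hashlib.sha256(raw.encode("utf-8")).hexdigest()
    return raw, stored_hash


def token_matches(presented: str, stored_hash: str) -> bool:
    """Hash the presented token and compare in constant time."""
    candidate = hashlib.sha256(presented.encode("utf-8")).hexdigest()
    return hmac.compare_digest(candidate, stored_hash)


raw_token, stored = issue_token()
print(token_matches(raw_token, stored))  # prints True
```

Because only the hash is kept, a lost raw token cannot be recovered and has to be reissued.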
6. What you're committing to
Adopting the Memory API means:
- Citations on every memory result. The chunk_hash is content-addressed (16-hex SHA-256 of normalised chunk text). Two ingests of the same content produce the same hash. Use it to dedupe, cache, or verify provenance.
- Trajectory replay for grounded answers. Every answer keeps an audit log of retrieval calls + tool invocations. If a user asks "where did this come from", you can show them.
- The /v1/ contract is stable. Breaking changes ship as /v2/. Additive changes, including new optional fields and routes, ship under /v1/.
- Quality numbers are public. docs/QUALITY.md is reproducible via make eval-memory-api. If nDCG@10 drops below 0.85, we treat it as a bug.
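To make the content-addressing idea from the first commitment concrete, here is a sketch of how such a hash behaves. Two things are assumptions: the normalisation rule (collapse whitespace, lowercase) and reading "16-hex" as the first 16 hex characters of the SHA-256 digest. The real rules live in the Memory API, so hashes from this sketch should not be expected to match the server's.

```python
import hashlib
import re


def normalise(text: str) -> str:
    """ASSUMED normalisation: collapse whitespace runs, trim, lowercase."""
    return re.sub(r"\s+", " ", text).strip().lower()


def chunk_hash(text: str) -> str:
    """First 16 hex characters of SHA-256 over the normalised text (assumed)."""
    digest = hashlib.sha256(normalise(text).encode("utf-8")).hexdigest()
    return digest[:16]


first = chunk_hash("The migration\n  begins on May 12")
second = chunk_hash("the migration begins on may 12")
print(first == second)  # prints True
```

Equal hashes for equal content are what make the value usable as a dedupe or cache key.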
7. When something breaks
- 401 Unauthorized: missing or wrong bearer. Check ${SB_SERVE_TOKEN}.
- 400 Idempotency-Key header is required: write routes need the header. Use a fresh UUID per logical write.
- citation_gate.passed: false means the grounded answer didn't have enough evidence. Either ingest more relevant content or accept the insufficient-evidence marker.
- Empty results array: the vault has nothing matching this query. POST /v1/memory/ingest first.
- Container crash-loop: see docker compose logs secondbrain. The most common cause is a .dockerignore excluding brain/ paths; the shipped one is verified.
If you hit something else, open an issue with the trajectory_id from the
response — that's enough for us to replay your call from our side.
8. Going further
- docs/MEMORY_API.md: full Memory API v1 reference
- docs/QUALITY.md: published quality scorecard + methodology
- docs/archive/planning/multi_tenancy_design.md: Phase 3 architecture (planning draft, not yet implemented)
- contracts/memory_api_v1.yaml: OpenAPI 3.1 spec (in the repo root)
- tests/memory/test_memory_api_v1_e2e.py: single round-trip integration test exercising the core Memory API routes