Plugging into the SecondBrain Memory API

The SecondBrain Memory API is the durable memory + grounded-retrieval layer agents plug into. AI you can audit: every response carries a Citation envelope so callers can trace every claim back to its source chunk. This page is the adoption guide. Read it once; copy the curl + MCP examples; you're live.

Brand promise: every memory-bearing response carries a Citation envelope (chunk_hash, source_path, anchor, text_span, score, retrieved_at). If a Memory API response can't show its work, treat it as a contract violation.
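A caller can enforce that promise mechanically. The sketch below checks that each citation carries the six envelope fields named above; the helper names are illustrative and not part of the Memory API, and the check is presence-only (it does not validate types or timestamps).

```python
from typing import Any

# Field names copied from the Citation envelope described above.
REQUIRED_CITATION_FIELDS = (
    "chunk_hash", "source_path", "anchor", "text_span", "score", "retrieved_at",
)


def missing_citation_fields(citation: dict[str, Any]) -> list[str]:
    """Return the envelope fields that are absent from one citation dict."""
    return [name for name in REQUIRED_CITATION_FIELDS if name not in citation]


def assert_shows_its_work(citations: list[dict[str, Any]]) -> None:
    """Raise ValueError if there are no citations or any envelope is incomplete."""
    if not citations:
        raise ValueError("response carries no citations")
    for index, citation in enumerate(citations):
        missing = missing_citation_fields(citation)
        if missing:
            raise ValueError(f"citation {index} is missing fields: {missing}")
```

Calling `assert_shows_its_work(payload["citations"])` right after parsing a response turns a missing envelope into an immediate failure rather than a silent gap.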

There are two ways to plug in:

Surface	When to use	Latency profile
MCP (model context protocol)	Inside an LLM agent (Claude Code, Cursor, Codex, custom)	Fast — same process as the agent if local; round-trip if HTTP MCP
HTTP (`/v1/*`)	Anything else: scripts, services, custom apps, agents in other runtimes	One TCP round-trip per call

Both surfaces speak the same /v1/ Memory API contract; pick the one that fits your runtime. Detailed contract: contracts/memory_api_v1.yaml (in the repo root). Quality numbers: docs/QUALITY.md.

1. Get the Memory API running

The fastest path is Docker:

git clone https://github.com/contextosai/SecondBrain-collab.git
cd SecondBrain-collab
SB_SERVE_TOKEN=$(openssl rand -hex 24) make quickstart-docker

That builds the image (~2.7 GB, CPU-only torch), starts the daemon at http://localhost:8765, polls /health until green, and prints the bearer token plus a curl example. The image runs on macOS / Linux / Windows (WSL2). Image source: repo-root Dockerfile.
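If you start the daemon some other way, the same wait-for-green step can be sketched in Python. This assumes that an HTTP 200 from `/health` means "green" and that the endpoint needs no bearer token; neither is confirmed by this page, so adjust the probe to match your deployment.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_probe(url: str = "http://localhost:8765/health") -> bool:
    """Return True when the endpoint answers HTTP 200 (ASSUMED to mean healthy)."""
    try:
        with urllib.request.urlopen(url, timeout=2) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx status: not ready yet.
        return False


def wait_until_healthy(
    probe: Callable[[], bool], attempts: int = 30, delay: float = 1.0
) -> bool:
    """Call probe() up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

Typical use is `wait_until_healthy(http_probe)` before sending the first request; passing the probe as a callable keeps the loop testable without a running server.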

If you'd rather run from source:

pip install -e ".[reranker]"
export SB_SERVE_TOKEN=$(openssl rand -hex 24)
sb serve --host 0.0.0.0 --port 8765

2. Plug into Claude Code (MCP)

Claude Code reads MCP server config from .claude/settings.json in the project root (or globally from ~/.claude/settings.json). Add:

{
  "mcpServers": {
    "secondbrain": {
      "command": ".venv/bin/python",
      "args": ["-m", "brain.mcp.cc_server"]
    }
  }
}

Restart Claude Code. The Memory API's production MCP tools (secondbrain_recall, secondbrain_ask, secondbrain_ingest, secondbrain_pack, secondbrain_open_loops, secondbrain_shravan_add, secondbrain_manan_reflect, secondbrain_nididhyasan_implement, secondbrain_knowledge_status, secondbrain_knowledge_review, secondbrain_decision_extract, secondbrain_meeting_extract, secondbrain_grounded_answer) are now in Claude's tool list. Every memory-bearing response includes the Citation envelope as a structured JSON block at the bottom of the tool output:

{
  "query": "migration plan",
  "citations": [
    {
      "chunk_hash": "abc1234567890def",
      "source_path": "/vault/01_projects/launch.md",
      "anchor": "## Migration",
      "text_span": "the migration begins on may 12...",
      "score": 0.91,
      "retrieved_at": "2026-05-08T10:14:22Z"
    }
  ]
}

Smoke test from a shell:

printf '{"jsonrpc":"2.0","id":"1","method":"tools/list","params":{}}\n' \
  | python -m brain.mcp.cc_server

You should see every tool named above in the output.
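To check the smoke-test output programmatically rather than by eye, a small parser is enough. The sketch assumes the server writes one JSON-RPC response per line on stdout and uses the standard MCP `result.tools[].name` shape; the helper names are illustrative.

```python
import json


def tools_list_request(request_id: str = "1") -> str:
    """Build the newline-terminated tools/list request used in the smoke test."""
    message = {"jsonrpc": "2.0", "id": request_id, "method": "tools/list", "params": {}}
    return json.dumps(message) + "\n"


def tool_names(response_line: str) -> list[str]:
    """Extract tool names from one JSON-RPC tools/list response line."""
    message = json.loads(response_line)
    if "error" in message:
        raise RuntimeError(f"MCP error: {message['error']}")
    return [tool["name"] for tool in message["result"]["tools"]]


def missing_tools(response_line: str, expected: list[str]) -> list[str]:
    """Return the expected tool names that the server did not advertise."""
    found = set(tool_names(response_line))
    return sorted(name for name in expected if name not in found)
```

Piping the request into `python -m brain.mcp.cc_server` with `subprocess.run(..., input=tools_list_request(), text=True, capture_output=True)` and passing the first stdout line to `missing_tools` gives a pass/fail check suitable for CI.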

3. Plug in over HTTP (`/v1/*`)

Anything that speaks HTTP works. Headers:

Authorization: Bearer ${SB_SERVE_TOKEN}
Content-Type: application/json

Hybrid retrieval (`POST /v1/memory/recall`)

curl -s \
  -H "Authorization: Bearer ${SB_SERVE_TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{"query":"migration plan","top_k":5}' \
  http://localhost:8765/v1/memory/recall

Returns:

{
  "query": "migration plan",
  "results": [
    {
      "content": "...",
      "citation": {
        "chunk_hash": "abc1234567890def",
        "source_path": "/vault/01_projects/launch.md",
        "anchor": "## Migration",
        "text_span": "...",
        "score": 0.91,
        "retrieved_at": "2026-05-08T10:14:22Z"
      },
      "source_type": "hybrid"
    }
  ]
}
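The same call from Python, using only the standard library. This is a minimal sketch: the request body and response fields mirror the example above, while `BASE_URL`, the function names, and the 10-second timeout are illustrative choices.

```python
import json
import urllib.request
from typing import Any

BASE_URL = "http://localhost:8765"  # matches the quickstart; change for your host


def build_recall_request(token: str, query: str, top_k: int = 5) -> urllib.request.Request:
    """Build the POST /v1/memory/recall request without sending it."""
    body = json.dumps({"query": query, "top_k": top_k}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/v1/memory/recall",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def cited_results(payload: dict[str, Any]) -> list[tuple[str, str, float]]:
    """Return (source_path, anchor, score) per result, highest score first."""
    rows = [
        (r["citation"]["source_path"], r["citation"]["anchor"], r["citation"]["score"])
        for r in payload["results"]
    ]
    return sorted(rows, key=lambda row: row[2], reverse=True)


def recall(token: str, query: str, top_k: int = 5) -> dict[str, Any]:
    """Send the request and decode the JSON response."""
    request = build_recall_request(token, query, top_k)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)
```

`cited_results(recall(token, "migration plan"))` yields a compact provenance list that can be logged next to whatever the agent does with the content.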

Closed-corpus QA (`POST /v1/grounded/answer`)

curl -s \
  -H "Authorization: Bearer ${SB_SERVE_TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{"question":"when does the migration begin","top_k":3}' \
  http://localhost:8765/v1/grounded/answer

Returns the answer + citations[] (≥ 1 required by the citation density gate) + trajectory_id for replay + a citation_gate field that reports whether the gate passed:

{
  "answer": "The migration begins on may 12 [chunk1].",
  "citations": [...],
  "trajectory_id": "traj_abc123",
  "citation_gate": {"passed": true, "reason": "ok", "min_citations": 1, "actual_citations": 3, "strict": true},
  "termination_reason": "completed"
}

If citation_gate.passed == false, the answer is rewritten with a [citation_gate:insufficient_evidence] marker. Trust this marker: the Memory API will not claim evidence it does not have.
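On the caller side, the gate is worth checking before an answer reaches a user. The sketch below treats a response as usable only when `citation_gate.passed` is true and the marker string is absent; the function name is hypothetical, and returning `None` on failure is one possible policy, not something the API prescribes.

```python
from typing import Any, Optional

# Marker text as quoted in this guide.
GATE_MARKER = "[citation_gate:insufficient_evidence]"


def trusted_answer(payload: dict[str, Any]) -> Optional[str]:
    """Return the answer only when the citation gate passed; otherwise None."""
    gate = payload.get("citation_gate") or {}
    if gate.get("passed") is not True:
        return None
    answer = payload.get("answer", "")
    if GATE_MARKER in answer:
        # Defensive: the marker should not appear when the gate passed.
        return None
    return answer
```

A missing `citation_gate` field is treated the same as a failed gate, which errs on the side of not showing unsupported text.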

Recording a decision (`POST /v1/decisions`)

curl -s \
  -H "Authorization: Bearer ${SB_SERVE_TOKEN}" \
  -H "Content-Type: application/json" \
  -H "Idempotency-Key: $(uuidgen)" \
  -d '{
    "key":"memory_api.adoption.demo",
    "title":"Adopt the SecondBrain Memory API",
    "rationale":"Auditable retrieval is non-negotiable for our use case.",
    "alternatives":["build in-house"],
    "impacted_entities":["memory_api","engineering"],
    "project":"adoption"
  }' \
  http://localhost:8765/v1/decisions

The Idempotency-Key header is required on writes — POST the same key twice and you get the same record back, not a duplicate.
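The practical consequence is that a retry must reuse the key from the first attempt. A minimal sketch of that pattern follows; `send` is a hypothetical transport callable supplied by the caller (for example, a wrapper around an HTTP client), and retrying only on `ConnectionError` is an assumption about which failures are safe to repeat.

```python
import uuid
from typing import Any, Callable

Sender = Callable[[dict[str, str], dict[str, Any]], Any]


def post_with_retries(send: Sender, payload: dict[str, Any], attempts: int = 3) -> Any:
    """Send one logical write, reusing a single Idempotency-Key across retries."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    # Generated once per logical write, NOT once per HTTP attempt.
    headers = {"Idempotency-Key": str(uuid.uuid4())}
    last_error: Exception = ConnectionError("no attempt made")
    for _ in range(attempts):
        try:
            return send(headers, payload)
        except ConnectionError as error:
            last_error = error
    raise last_error
```

Because the server returns the same record for a repeated key, a retry after a dropped response cannot create a duplicate decision.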

Forget content (`POST /v1/memory/forget`)

The Memory API is not write-only — agents can unindex over the wire. Forget by source path:

curl -s \
  -H "Authorization: Bearer ${SB_SERVE_TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{"source_path":"/vault/01_projects/old.md"}' \
  http://localhost:8765/v1/memory/forget

Or by content-addressed chunk_hash (the same hash returned in every Citation envelope):

curl -s \
  -H "Authorization: Bearer ${SB_SERVE_TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{"chunk_hash":"abc1234567890def"}' \
  http://localhost:8765/v1/memory/forget

Returns:

{
  "forgotten_at": "2026-05-08T11:02:14Z",
  "forgotten_count": 7,
  "per_dimension": {"source_path": 7},
  "source_path": "/vault/01_projects/old.md",
  "chunk_hash": null
}

Idempotent — forgetting content that doesn't exist returns forgotten_count: 0 rather than an error. Both dimensions may be supplied in one call.
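A small helper can build the request body for either dimension or both. This is a sketch: the client-side rejection of an empty request is an assumption (this page does not say how the server treats a body with neither field), and both function names are illustrative.

```python
from typing import Any, Optional


def forget_payload(
    source_path: Optional[str] = None, chunk_hash: Optional[str] = None
) -> dict[str, str]:
    """Build a /v1/memory/forget body from one or both dimensions."""
    payload: dict[str, str] = {}
    if source_path is not None:
        payload["source_path"] = source_path
    if chunk_hash is not None:
        payload["chunk_hash"] = chunk_hash
    if not payload:
        # ASSUMPTION: an empty request is a caller bug, so fail before sending.
        raise ValueError("supply source_path, chunk_hash, or both")
    return payload


def was_noop(response: dict[str, Any]) -> bool:
    """True when the call succeeded but nothing was indexed under that key."""
    return response.get("forgotten_count", 0) == 0
```

Checking `was_noop` is useful in cleanup jobs, where a count of zero usually means the content was already removed by an earlier run.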

Streaming grounded answers (`POST /v1/grounded/answer/stream`)

Same input shape as the non-streaming route, but the response is Server-Sent Events that surface lifecycle events as the multi-step loop runs — useful when you want to show a user that retrieval is happening while the final answer is still being assembled.

curl -N \
  -H "Authorization: Bearer ${SB_SERVE_TOKEN}" \
  -H "Content-Type: application/json" \
  -H "Accept: text/event-stream" \
  -d '{"question":"when does the migration begin","top_k":3}' \
  http://localhost:8765/v1/grounded/answer/stream

Sequence on the wire:

event: run_started
data: {"trajectory_id":"traj_abc","prompt":"…","ts":...}

event: step
data: {"step_index":1,"kind":"retrieve","retrieval":{...}}

event: step
data: {"step_index":2,"kind":"observe","content":"…"}

event: run_complete
data: {"trajectory_id":"traj_abc","final_answer":"…","citations":[…],
       "citation_gate":{…}}

The final run_complete payload is equivalent to the body returned by the non-streaming route. Use the non-streaming /v1/grounded/answer for simpler integration; use the streaming variant when perceived latency or audit visibility matters.
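Consuming the stream means grouping `event:` and `data:` lines until a blank line ends each event. The parser below is a minimal sketch of standard Server-Sent Events framing; it assumes every `data` payload is JSON, as in the sequence shown above, and it also flushes a trailing event that lacks a closing blank line.

```python
import json
from typing import Any, Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[tuple[str, Any]]:
    """Yield (event_name, decoded_json) for each event in an SSE line stream."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data:
                yield event, json.loads("\n".join(data))
            event, data = "message", []
        elif line.startswith(":"):
            continue  # SSE comment / keep-alive line
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # Per SSE framing, one leading space after the colon is dropped.
            data.append(value[1:] if value.startswith(" ") else value)
    if data:
        yield event, json.loads("\n".join(data))
```

Feeding it the decoded lines of the HTTP response body lets a UI react to each `step` event and keep the `run_complete` payload as the final answer.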

Trajectory replay (`GET /v1/audit/event_log?trajectory_id=`)

curl -s \
  -H "Authorization: Bearer ${SB_SERVE_TOKEN}" \
  "http://localhost:8765/v1/audit/event_log?trajectory_id=traj_abc123"

Returns the full sequence of retrieval calls, tool invocations, and intermediate reasoning steps that produced the answer. This is the auditability promise in practice: every grounded answer can be replayed.
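When the trajectory ID comes from another response rather than being typed by hand, it is safer to URL-encode it. A short sketch, with an illustrative function name and the quickstart host as the default:

```python
import urllib.parse
import urllib.request


def replay_request(
    token: str, trajectory_id: str, base_url: str = "http://localhost:8765"
) -> urllib.request.Request:
    """Build the GET /v1/audit/event_log request for one trajectory."""
    query = urllib.parse.urlencode({"trajectory_id": trajectory_id})
    return urllib.request.Request(
        f"{base_url}/v1/audit/event_log?{query}",
        headers={"Authorization": f"Bearer {token}"},
    )
```

Passing the result to `urllib.request.urlopen` performs the call; the response schema is not described on this page, so the sketch stops at building the request.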

4. The full surface

HTTP	MCP tool	Purpose
`POST /v1/memory/recall`	`secondbrain_recall`	Hybrid retrieval over the workspace vault
`POST /v1/memory/ingest`	`secondbrain_ingest`	Add file or text to the vault and index it
`POST /v1/memory/forget`	`secondbrain_forget`	Unindex by `source_path` or `chunk_hash` (idempotent)
`POST /v1/memory/pack`	`secondbrain_pack`	Build a bounded ContextPack for an intent
`POST /v1/memory/assimilation/shravan`	`secondbrain_shravan_add`	Capture source-aware intake
`POST /v1/memory/assimilation/manan`	`secondbrain_manan_reflect`	Reflect on captured knowledge
`POST /v1/memory/assimilation/nididhyasan`	`secondbrain_nididhyasan_implement`	Convert reflection into practice or a memory proposal
`GET /v1/memory/assimilation/status`	`secondbrain_knowledge_status`	Show knowledge maturity status
`GET /v1/memory/assimilation/review`	`secondbrain_knowledge_review`	Review items needing reflection or practice
`POST /v1/grounded/answer`	`secondbrain_grounded_answer`	Closed-corpus QA with citation gate
`POST /v1/grounded/eval`	`secondbrain_grounded_eval`	Score an agent output against expected nuggets
`GET /v1/decisions`	`secondbrain_decisions_list`	List decision records, namespace + cursor
`GET /v1/decisions/{ref}`	`secondbrain_decisions_get`	Fetch a single decision
`POST /v1/decisions`	`secondbrain_decisions_record`	Record a new decision (idempotent)
`POST /v1/decisions/extract`	`secondbrain_decision_extract`	Parse decisions from markdown without persisting
`POST /v1/meetings/extract`	`secondbrain_meeting_extract`	Run the meeting copilot on a transcript
`GET /v1/open_loops`	`secondbrain_open_loops`	List unresolved TODO/OPENLOOP markers
`GET /v1/audit/event_log`	`secondbrain_audit`	Tail event log; trajectory replay via `?trajectory_id=`

Anything outside this list is internal-only and may change. Don't build against /chat, /sessions, /work, etc. — they exist for the reference UI, not for Memory API consumers.

5. Multi-tenancy (optional, for hosted / team use)

By default the Memory API is single-tenant — one SB_SERVE_TOKEN, one vault, one state directory. For team or hosted deployments, set SB_MULTI_TENANT=1 and use the workspace CLI:

SB_MULTI_TENANT=1 sb workspace create team-platform
SB_MULTI_TENANT=1 sb workspace token issue --workspace team-platform --actor alice
# → ws_abc123def456...   (raw token shown ONCE; SHA-256 hash is stored)

Each workspace gets its own state dir and vault under state/<workspace>/ and vault/<workspace>/. The token determines the workspace; callers can't cross workspaces by header. Architecture detail: docs/archive/planning/multi_tenancy_design.md.
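The "shown once, hash stored" pattern from the CLI comment can be illustrated as follows. This is not the project's implementation: the token length, the UTF-8 encoding, and the hex digest format are all assumptions made for the sketch; only the `ws_` prefix and the use of SHA-256 come from the example above.

```python
import hashlib
import hmac
import secrets


def issue_token(prefix: str = "ws_") -> tuple[str, str]:
    """Return (raw_token, stored_hash); only the hash would be persisted."""
    raw = prefix + secrets.token_hex(24)  # ASSUMED length, for illustration
    stored = hashlib.sha256(raw.encode("utf-8")).hexdigest()
    return raw, stored


def token_matches(presented: str, stored_hash: str) -> bool:
    """Hash the presented token and compare in constant time."""
    presented_hash = hashlib.sha256(presented.encode("utf-8")).hexdigest()
    return hmac.compare_digest(presented_hash, stored_hash)
```

Since only the hash is kept, a lost raw token cannot be recovered and has to be reissued, which is why the CLI prints it exactly once.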

6. What you're committing to

Adopting the Memory API means:

Citations on every memory result. The chunk_hash is content-addressed: 16 hex characters derived from a SHA-256 digest of the normalised chunk text. Two ingests of the same content produce the same hash. Use it to dedupe, cache, or verify provenance.
Trajectory replay for grounded answers. Every answer keeps an audit log of retrieval calls + tool invocations. If a user asks "where did this come from", you can show them.
The /v1/ contract is stable. Breaking changes ship as /v2/. Additive changes, including new optional fields and routes, ship under /v1/.
Quality numbers are public. docs/QUALITY.md is reproducible via make eval-memory-api. If nDCG@10 drops below 0.85, we treat it as a bug.
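To make the content-addressing commitment concrete, here is a sketch of how a 16-hex chunk_hash could be derived and used for deduplication. The normalisation rule (lowercasing and collapsing whitespace) and taking the first 16 hex characters of the digest are assumptions; the actual rules live in the Memory API, so hashes computed this way are not guaranteed to match the server's.

```python
import hashlib


def normalise(text: str) -> str:
    """ASSUMED normalisation: lowercase and collapse runs of whitespace."""
    return " ".join(text.lower().split())


def chunk_hash(text: str) -> str:
    """Return 16 hex characters of the SHA-256 digest of the normalised text."""
    digest = hashlib.sha256(normalise(text).encode("utf-8")).hexdigest()
    return digest[:16]  # ASSUMED: the leading 16 characters are kept


def dedupe(chunks: list[str]) -> dict[str, str]:
    """Map each distinct hash to the first chunk text that produced it."""
    unique: dict[str, str] = {}
    for chunk in chunks:
        unique.setdefault(chunk_hash(chunk), chunk)
    return unique
```

The property that matters for callers is determinism: identical content always maps to the same key, so the hash from a Citation envelope can serve directly as a cache or dedupe key.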

7. When something breaks

401 Unauthorized — missing or wrong bearer. Check ${SB_SERVE_TOKEN}.
400 Idempotency-Key header is required — write routes need the header. Use a fresh UUID per logical write.
citation_gate.passed: false — grounded answer didn't have enough evidence. Either ingest more relevant content or accept the insufficient-evidence marker.
Empty results array — vault is empty for this query. POST /v1/memory/ingest first.
Container crash-loop — see docker compose logs secondbrain. The most common cause is a .dockerignore excluding brain/ paths; the shipped one is verified.

If you hit something else, open an issue with the trajectory_id from the response — that's enough for us to replay your call from our side.
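The response-level checks in this section can be folded into one triage helper for client logs. This is a sketch with a hypothetical function name; it matches the 400 case by searching the serialised body for the header name, because the exact error body shape is not specified on this page.

```python
import json
from typing import Any


def diagnose(status: int, body: dict[str, Any]) -> str:
    """Map a Memory API response to the troubleshooting hint listed above."""
    if status == 401:
        return "missing or wrong bearer: check SB_SERVE_TOKEN"
    if status == 400 and "Idempotency-Key" in json.dumps(body):
        return "write route needs an Idempotency-Key header"
    gate = body.get("citation_gate")
    if isinstance(gate, dict) and gate.get("passed") is False:
        return "not enough evidence: ingest more content or accept the marker"
    if body.get("results") == []:
        return "no results: ingest content via POST /v1/memory/ingest first"
    trajectory = body.get("trajectory_id", "unknown")
    return f"unrecognised: open an issue and include trajectory_id={trajectory}"
```

Logging the returned hint alongside the raw status keeps the first round of debugging on the caller's side.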

8. Going further

docs/MEMORY_API.md — full Memory API v1 reference
docs/QUALITY.md — published quality scorecard + methodology
docs/archive/planning/multi_tenancy_design.md — Phase 3 architecture (planning draft, not yet implemented)
contracts/memory_api_v1.yaml — OpenAPI 3.1 spec (in the repo root)
tests/memory/test_memory_api_v1_e2e.py — single round-trip integration test exercising the core Memory API routes
