Agent Operability

SecondBrain is designed to be worked on by coding agents under human direction. Keep the repo legible enough that an agent can discover context, make a scoped change, validate it, and leave evidence for review without relying on private chat history.

Operating Principles

| Principle | Repo behavior |
| --- | --- |
| Humans steer; agents execute. | Put intent, constraints, acceptance criteria, and verification in the task or plan before widening the implementation. |
| AGENTS.md routes; docs explain. | Keep AGENTS.md short and point agents to docs/INDEX.md, contributor docs, and testing guidance. |
| Failures become harness pressure. | Promote repeated or important failures into replay cases, eval cases, improvement records, docs, or mechanical checks. |
| Runtime evidence beats log hunting. | Prefer trace IDs, session IDs, quality run IDs, replay case IDs, and verification commands in handoffs. |
| Repeated review feedback becomes tooling. | Convert stable feedback into tests, layer rules, harness review roles, task templates, or docs. |
| Autonomy grows by policy. | Write-capable and merge-capable agent work needs explicit approval gates and replayable evidence. |

Task Templates

Use sb tasks classify "<goal>" --json to classify a goal and sb tasks templates --json to inspect the catalog. Coding-agent work should normally land in one of these templates:

| Template | Use for | Required evidence |
| --- | --- | --- |
| `tpl_code_feature` | Feature implementation or capability additions. | Acceptance criteria, focused tests, verification commands, docs/help sync when user-facing. |
| `tpl_code_bugfix` | Bugs, failing tests, regressions, or broken runtime behavior. | Reproduction trace or failing test, focused regression coverage, verification commands. |
| `tpl_code_refactor` | Scoped refactors and module splits. | Ownership boundary, unchanged-behavior checks, verification commands. |
| `tpl_eval_replay` | Turning real failures into replay/eval coverage. | Failure signature, replay or eval case, gate protected by the case. |
| `tpl_docs_update` | Docs, help text, schema references, and agent-readable repo knowledge. | Source of truth link, refreshed docs index, docs build. |

These templates extend the general local task catalog; they do not replace domain workflows such as research, summarization, drafting, decision review, or maintenance.
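
A script can wrap the classifier so that the chosen template is recorded alongside the task. The following is a minimal sketch, assuming `sb` is on the PATH and that `--json` prints a single JSON document to stdout. The output schema is not described here, so the sketch returns the parsed document without interpreting any fields. `build_classify_command` and `classify_goal` are hypothetical helper names.

```python
import json
import subprocess


def build_classify_command(goal: str) -> list[str]:
    """Return the argv for: sb tasks classify "<goal>" --json"""
    return ["sb", "tasks", "classify", goal, "--json"]


def classify_goal(goal: str, run=subprocess.run):
    """Run the classifier and return its parsed JSON output.

    Assumption: the command writes one JSON document to stdout.
    The `run` parameter exists only so tests can inject a fake runner.
    """
    result = run(
        build_classify_command(goal),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit
    )
    return json.loads(result.stdout)
```

Passing the goal as a single argv element avoids shell quoting problems when the goal text contains spaces or quotes.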

Autonomy Ladder

Use the ladder to decide how far an agent may go without further human approval.

| Level | Capability | Guardrail |
| --- | --- | --- |
| 0 | Explain and plan only. | No file writes. |
| 1 | Implement narrow local changes. | Focused tests and diff summary. |
| 2 | Validate with runtime evidence. | Trace/session/quality links or replayable evidence. |
| 3 | Open a PR or improvement record. | Docs, tests, and quality evidence attached. |
| 4 | Handle review and CI feedback. | Bounded retries and explicit escalation on repeated failure. |
| 5 | Merge or promote automatically. | Human-approved policy, rollback path, and audit trail. |

SecondBrain defaults to levels 1-3 during local development. Levels 4-5 are policy-controlled and should stay opt-in.
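
One way to make the ladder mechanical is a small gate function. The sketch below is illustrative only: `AutonomyLevel`, `DEFAULT_MAX_LEVEL`, and `is_allowed` are hypothetical names, not part of SecondBrain, and it assumes level 0 is always permitted because it writes nothing.

```python
from enum import IntEnum


class AutonomyLevel(IntEnum):
    EXPLAIN_AND_PLAN = 0
    IMPLEMENT_LOCAL = 1
    VALIDATE_RUNTIME = 2
    OPEN_PR = 3
    HANDLE_REVIEW = 4
    MERGE_OR_PROMOTE = 5


# Levels up to 3 are the local-development default; 4 and 5 are opt-in.
DEFAULT_MAX_LEVEL = AutonomyLevel.OPEN_PR


def is_allowed(level, opted_in=frozenset()):
    """Return True if an agent may act at `level` without further approval.

    `opted_in` holds the policy-controlled levels a human has enabled.
    """
    level = AutonomyLevel(level)  # raises ValueError for unknown levels
    if level <= DEFAULT_MAX_LEVEL:
        return True
    return level in opted_in
```

For example, `is_allowed(4)` is false until a policy adds `AutonomyLevel.HANDLE_REVIEW` to `opted_in`.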

Failure Promotion

When an agent run fails, decide which artifact should carry the learning:

| Failure shape | Promotion target |
| --- | --- |
| Reproducible runtime failure | `sb quality replay-cases` and `sb quality promote-replay <case-id>` |
| Repeated regression signature | `sb quality harness gc --last 7d --json` proposal |
| Missing or stale docs | docs update plus `sb codebase docs-index --write` |
| Architecture drift | layer rule, harness review role, or focused test |
| UI contract drift | stream-contract test and serve-ui reducer/rendering update |
| Ambiguous task shape | task-template or plan-lifecycle update |

Do not leave important lessons only in chat. Move them into docs, replay/eval artifacts, task templates, tests, or harness checks.
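
The promotion mapping can also be kept as data so that a handoff script can suggest a target. This is a sketch under assumptions: the dictionary keys and the `promotion_target` helper are hypothetical, and only the target descriptions come from the mapping above.

```python
# Hypothetical keys; the values mirror the promotion targets listed above.
PROMOTION_TARGETS = {
    "reproducible_runtime_failure": "sb quality replay-cases and sb quality promote-replay <case-id>",
    "repeated_regression_signature": "sb quality harness gc --last 7d --json proposal",
    "missing_or_stale_docs": "docs update plus sb codebase docs-index --write",
    "architecture_drift": "layer rule, harness review role, or focused test",
    "ui_contract_drift": "stream-contract test and serve-ui reducer/rendering update",
    "ambiguous_task_shape": "task-template or plan-lifecycle update",
}


def promotion_target(failure_shape: str) -> str:
    """Return the artifact that should carry the learning for a failure shape."""
    try:
        return PROMOTION_TARGETS[failure_shape]
    except KeyError:
        known = ", ".join(sorted(PROMOTION_TARGETS))
        raise ValueError(
            f"unknown failure shape {failure_shape!r}; expected one of: {known}"
        ) from None
```

Failing loudly on an unknown shape is a deliberate choice in this sketch: a failure that fits no category should prompt a human decision instead of being silently dropped.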

Mechanical Checks

Use the operating-harness checks before handing off agent-generated work:

sb quality harness audit --changed-only --json
sb quality harness review --role runtime-contract --changed-only --json
sb quality harness review --role docs-help --changed-only --json
sb quality harness review --role test-strategy --changed-only --json
sb quality harness gc --last 7d --json

The audit command checks for coverage gaps, oversized agent-hostile files, missing remediation hints, skill validation metadata, docs hygiene, and pending replay coverage. Docs hygiene covers AGENTS.md routing, docs-index freshness, active plan lifecycle shape, MkDocs nav coverage, and component-test entrypoint coverage.
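
The commands above can be run as one pre-handoff step. The following is a minimal sketch, assuming `sb` is on the PATH and that a non-zero exit code signals a failed check (that exit-code behavior is an assumption, not something documented here). `HARNESS_CHECKS`, `run_checks`, and `failed` are hypothetical names.

```python
import subprocess

# The five harness commands listed above, as argv lists.
HARNESS_CHECKS = [
    ["sb", "quality", "harness", "audit", "--changed-only", "--json"],
    ["sb", "quality", "harness", "review", "--role", "runtime-contract", "--changed-only", "--json"],
    ["sb", "quality", "harness", "review", "--role", "docs-help", "--changed-only", "--json"],
    ["sb", "quality", "harness", "review", "--role", "test-strategy", "--changed-only", "--json"],
    ["sb", "quality", "harness", "gc", "--last", "7d", "--json"],
]


def run_checks(checks=HARNESS_CHECKS, run=subprocess.run):
    """Run every check and return (command string, exit code) pairs.

    All checks run even if an earlier one fails, so the handoff
    can report the full picture in one pass.
    """
    results = []
    for argv in checks:
        completed = run(argv, capture_output=True, text=True)
        results.append((" ".join(argv), completed.returncode))
    return results


def failed(results):
    """Return the commands whose exit code was non-zero (assumed to mean failure)."""
    return [command for command, code in results if code != 0]
```

The command strings returned by `failed` can be pasted into a handoff as exact verification commands to re-run.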

Stream Contract

When adding a new chat stream event type, keep the server, contract endpoint, frontend reducer, and UI rendering in the same diff:

  1. brain/serve/chat_runtime.py: add on_<event> to QueueCallbacks and add the event type string to STREAM_EVENT_TYPES.
  2. brain/serve/routers/core.py: GET /stream-events serves STREAM_EVENT_TYPES; code changes are usually unnecessary unless the endpoint contract changes.
  3. serve-ui/src/lib/chat.ts: handle the event in reduceStreamEvent and add the type to HANDLED_EVENT_TYPES.
  4. serve-ui/src/pages/ChatPage.tsx: render the new state in the thread UI.

Startup drift check:

  • The web UI fetches /stream-events on load.
  • If the server emits an unhandled type, the browser logs console.warn.
  • After adding an event, check the browser DevTools console for drift warnings.
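
The drift check amounts to a set difference between the types the server advertises and the types the reducer handles. The real check runs in the web UI; the sketch below only illustrates the comparison in Python, and both the `unhandled_event_types` helper and the event names in the example are hypothetical.

```python
def unhandled_event_types(server_types, handled_types):
    """Return server event types the UI does not handle.

    The result is sorted so warnings are stable across runs.
    """
    return sorted(set(server_types) - set(handled_types))


# Hypothetical event names, for illustration only.
server = ["token", "done", "citation"]
handled = {"token", "done"}
print(unhandled_event_types(server, handled))  # prints ['citation']
```

An empty result means the server and UI agree; any entry corresponds to a type that would trigger the console warning.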

Decision Logs

Use a short decision log when a plan or guide records a choice that future agents must preserve. Keep entries factual:

## Decision Log

| Date | Decision | Reason | Follow-up |
| --- | --- | --- | --- |
| 2026-04-27 | Keep AGENTS.md as a routing table. | Large instruction blobs drift and waste context. | Enforce with harness docs hygiene. |

Prefer one durable decision entry over repeating the same rule in several files.
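
If decision entries are generated by tooling, a small formatter keeps rows consistent with the table shown above. This is a sketch only; `decision_row` is a hypothetical helper, not an existing SecondBrain function.

```python
from datetime import date


def decision_row(day: date, decision: str, reason: str, follow_up: str) -> str:
    """Format one decision-log table row in the Date/Decision/Reason/Follow-up layout."""
    cells = [day.isoformat(), decision, reason, follow_up]
    # Escape pipes so a cell cannot break the table layout.
    escaped = [cell.strip().replace("|", "\\|") for cell in cells]
    return "| " + " | ".join(escaped) + " |"
```

Calling it with `date(2026, 4, 27)` and the three texts from the example entry reproduces that row exactly.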

Implementation Plan: Agent-Operable Workflow

These workstreams turn reusable workflow lessons into SecondBrain product work. They should land as small, reviewable changes with CLI JSON contracts first and UI/docs follow-up only where the contract is stable.

| Workstream | Implementation target | Verification |
| --- | --- | --- |
| Skills reference | Generate an active docs/reference/skills.md page from `skills/*/SKILL.md` frontmatter and include name, description, version, trust level, safety class, examples, and tests. | `sb skills docs-write`; `pytest -q tests/skills`; `make docs` |
| Project lifecycle | Extend `sb project` with prototype-first init, additive enhance / upgrade, and `--dry-run` flows that preview file changes before writing. | `pytest -q tests/projects/test_project_mode.py`; `sb project --help` |
| Behavior scaffold | Seed project-mode scaffolds with a tiny replay or behavior-eval skeleton so agent behavior expectations do not get encoded as brittle pytest assertions. | `pytest -q tests/quality tests/projects/test_project_mode.py`; `sb quality replay-cases --json` |
| Skill validation | Make `sb skills validate --strict` warn when a skill lacks examples, tests, trigger guidance, or clear phase-specific routing language. | `pytest -q tests/skills/test_skills_validator.py tests/skills/test_skills_cli.py` |
| Evidence-first handoff | Teach project finalization to include trace IDs, session IDs, quality run IDs, replay IDs, and exact verification commands when available. | `pytest -q tests/projects/test_project_mode.py tests/quality`; `sb project finalize --help` |
| Approval posture | Keep network, deployment, daemon, publish, and destructive workflow actions behind explicit operator controls and approval ledger records. | `pytest -q tests/infra -k "approval or policy"`; `sb approvals ledger --json` |

Suggested sequencing:

  1. Add strict skill metadata warnings and the generated skills reference page.
  2. Add dry-run lifecycle commands for project scaffolds.
  3. Seed behavior-eval or replay skeletons from project initialization.
  4. Wire finalization and handoff commands to quality and runtime evidence.
  5. Expand approval-ledger coverage for any workflow step that can leave the local machine, mutate infrastructure, or publish state.

Done means the workflow can be followed from a clean checkout without private chat context: initialize, implement, validate, promote replay/eval coverage, hand off with evidence, and inspect any approval-sensitive action after the fact.