# Agent Operability
SecondBrain is designed to be worked on by coding agents under human direction. Keep the repo legible enough that an agent can discover context, make a scoped change, validate it, and leave evidence for review without relying on private chat history.
## Operating Principles
| Principle | Repo behavior |
|---|---|
| Humans steer; agents execute. | Put intent, constraints, acceptance criteria, and verification in the task or plan before widening the implementation. |
| `AGENTS.md` routes; docs explain. | Keep `AGENTS.md` short and point agents to `docs/INDEX.md`, contributor docs, and testing guidance. |
| Failures become harness pressure. | Promote repeated or important failures into replay cases, eval cases, improvement records, docs, or mechanical checks. |
| Runtime evidence beats log hunting. | Prefer trace IDs, session IDs, quality run IDs, replay case IDs, and verification commands in handoffs. |
| Repeated review feedback becomes tooling. | Convert stable feedback into tests, layer rules, harness review roles, task templates, or docs. |
| Autonomy grows by policy. | Write-capable and merge-capable agent work needs explicit approval gates and replayable evidence. |
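
To make "runtime evidence beats log hunting" concrete, the sketch below models a handoff note that carries IDs and rerunnable commands rather than pointers into logs. The `Handoff` class, its field names, and the rendered layout are hypothetical illustrations, not a SecondBrain schema; the IDs in the usage lines are placeholders.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Handoff:
    """Hypothetical handoff note; field names are illustrative, not a SecondBrain schema."""

    summary: str
    trace_ids: List[str] = field(default_factory=list)
    session_ids: List[str] = field(default_factory=list)
    quality_run_ids: List[str] = field(default_factory=list)
    replay_case_ids: List[str] = field(default_factory=list)
    verification_commands: List[str] = field(default_factory=list)

    def has_evidence(self) -> bool:
        # Reviewable means: at least one runtime ID plus at least one rerunnable command.
        ids = self.trace_ids + self.session_ids + self.quality_run_ids + self.replay_case_ids
        return bool(ids) and bool(self.verification_commands)

    def to_markdown(self) -> str:
        lines = [f"Handoff: {self.summary}"]
        for title, values in (
            ("Trace IDs", self.trace_ids),
            ("Session IDs", self.session_ids),
            ("Quality run IDs", self.quality_run_ids),
            ("Replay case IDs", self.replay_case_ids),
        ):
            if values:
                lines.append(f"- {title}: {', '.join(values)}")
        for command in self.verification_commands:
            lines.append(f"- Verify: `{command}`")
        return "\n".join(lines)


note = Handoff(
    summary="Fix stream reducer drift",
    trace_ids=["trace-example-1"],  # placeholder ID, not real data
    verification_commands=["sb quality harness audit --changed-only --json"],
)
print(note.to_markdown())
```

The point of `has_evidence` is that an ID without a command, or a command without an ID, still leaves the reviewer hunting.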
## Task Templates
Use `sb tasks classify "<goal>" --json` to classify a goal and `sb tasks templates --json` to inspect the catalog. Coding-agent work should normally land in one of these templates:
| Template | Use for | Required evidence |
|---|---|---|
| `tpl_code_feature` | Feature implementation or capability additions. | Acceptance criteria, focused tests, verification commands, docs/help sync when user-facing. |
| `tpl_code_bugfix` | Bugs, failing tests, regressions, or broken runtime behavior. | Reproduction trace or failing test, focused regression coverage, verification commands. |
| `tpl_code_refactor` | Scoped refactors and module splits. | Ownership boundary, unchanged-behavior checks, verification commands. |
| `tpl_eval_replay` | Turning real failures into replay/eval coverage. | Failure signature, replay or eval case, gate protected by the case. |
| `tpl_docs_update` | Docs, help text, schema references, and agent-readable repo knowledge. | Source-of-truth link, refreshed docs index, docs build. |
These templates extend the general local task catalog; they do not replace domain workflows such as research, summarization, drafting, decision review, or maintenance.
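The "Required evidence" column can double as a handoff checklist. A minimal sketch, assuming evidence is tracked as a set of short labels; the label strings and the `missing_evidence` helper are hypothetical and are not part of the `sb tasks` CLI.

```python
# Hypothetical evidence labels, one per item in the "Required evidence" column.
REQUIRED_EVIDENCE: dict[str, set[str]] = {
    "tpl_code_feature": {"acceptance_criteria", "focused_tests", "verification_commands"},
    "tpl_code_bugfix": {"reproduction", "regression_test", "verification_commands"},
    "tpl_code_refactor": {"ownership_boundary", "unchanged_behavior_checks", "verification_commands"},
    "tpl_eval_replay": {"failure_signature", "replay_or_eval_case", "protected_gate"},
    "tpl_docs_update": {"source_of_truth_link", "docs_index_refreshed", "docs_build"},
}


def missing_evidence(template_id: str, provided: set[str], user_facing: bool = False) -> set[str]:
    """Return the required evidence labels a handoff has not supplied yet."""
    if template_id not in REQUIRED_EVIDENCE:
        raise KeyError(f"unknown template: {template_id}")
    required = set(REQUIRED_EVIDENCE[template_id])  # copy so the catalog is not mutated
    if template_id == "tpl_code_feature" and user_facing:
        # Feature work also needs docs/help sync when the change is user-facing.
        required.add("docs_help_sync")
    return required - provided


print(missing_evidence("tpl_code_bugfix", {"reproduction", "verification_commands"}))
# prints {'regression_test'}
```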
## Autonomy Ladder
Use the ladder to decide how far an agent may go without further human approval.
| Level | Capability | Guardrail |
|---|---|---|
| 0 | Explain and plan only. | No file writes. |
| 1 | Implement narrow local changes. | Focused tests and diff summary. |
| 2 | Validate with runtime evidence. | Trace/session/quality links or replayable evidence. |
| 3 | Open a PR or improvement record. | Docs, tests, and quality evidence attached. |
| 4 | Handle review and CI feedback. | Bounded retries and explicit escalation on repeated failure. |
| 5 | Merge or promote automatically. | Human-approved policy, rollback path, and audit trail. |
SecondBrain defaults to levels 1-3 during local development. Levels 4-5 are policy-controlled and should stay opt-in.
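The ladder reduces to an ordered comparison against a ceiling. A minimal sketch, assuming the ceiling defaults to level 3 and only a human-approved policy raises it; the enum member names and `may_proceed` are hypothetical, not a SecondBrain API.

```python
from enum import IntEnum
from typing import Optional


class AutonomyLevel(IntEnum):
    """The six rungs of the ladder; member names are illustrative."""

    EXPLAIN_AND_PLAN = 0
    NARROW_LOCAL_CHANGES = 1
    RUNTIME_VALIDATION = 2
    OPEN_PR_OR_RECORD = 3
    HANDLE_REVIEW_AND_CI = 4
    MERGE_OR_PROMOTE = 5


# Local development defaults to levels 1-3; levels 4-5 require an opt-in policy.
DEFAULT_CEILING = AutonomyLevel.OPEN_PR_OR_RECORD


def may_proceed(requested: AutonomyLevel, policy_ceiling: Optional[AutonomyLevel] = None) -> bool:
    """True if an agent may act at `requested` without asking a human first.

    `policy_ceiling` stands in for a human-approved policy; raising it to 5 should
    also imply the rollback path and audit trail the table requires.
    """
    ceiling = DEFAULT_CEILING if policy_ceiling is None else policy_ceiling
    return requested <= ceiling
```

Using `IntEnum` keeps the levels comparable with `<=` while still printing readable names in logs.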
## Failure Promotion
When an agent run fails, decide which artifact should carry the learning:
| Failure shape | Promotion target |
|---|---|
| Reproducible runtime failure | `sb quality replay-cases` and `sb quality promote-replay <case-id>` |
| Repeated regression signature | `sb quality harness gc --last 7d --json` proposal |
| Missing or stale docs | docs update plus `sb codebase docs-index --write` |
| Architecture drift | layer rule, harness review role, or focused test |
| UI contract drift | stream-contract test and serve-ui reducer/rendering update |
| Ambiguous task shape | task-template or plan-lifecycle update |
Do not leave important lessons only in chat. Move them into docs, replay/eval artifacts, task templates, tests, or harness checks.
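For the "Repeated regression signature" row, the sketch below shows one way repeated failures might be grouped before proposing promotion. It is not how `sb quality harness gc` is implemented; the `Failure` record, the signature format, and the two-occurrence threshold are assumptions chosen to mirror the `--last 7d` window.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Failure:
    signature: str  # assumed: a normalized error type plus the failing test or step
    seen_at: datetime


def repeated_signatures(
    failures: list[Failure],
    now: datetime,
    window: timedelta = timedelta(days=7),
    min_count: int = 2,
) -> list[str]:
    """Return signatures seen at least `min_count` times inside the window."""
    cutoff = now - window
    counts = Counter(f.signature for f in failures if f.seen_at >= cutoff)
    # Most frequent first, so the noisiest regressions are proposed first.
    return [sig for sig, n in counts.most_common() if n >= min_count]
```

Each returned signature is a candidate for a replay case, a test, or a harness check rather than a note in chat.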
## Mechanical Checks
Use the operating-harness checks before handing off agent-generated work:
```bash
sb quality harness audit --changed-only --json
sb quality harness review --role runtime-contract --changed-only --json
sb quality harness review --role docs-help --changed-only --json
sb quality harness review --role test-strategy --changed-only --json
sb quality harness gc --last 7d --json
```
`audit` checks coverage gaps, oversized agent-hostile files, missing remediation hints, skill validation metadata, docs hygiene, and pending replay coverage. Docs hygiene includes `AGENTS.md` routing, docs-index freshness, active plan lifecycle shape, MkDocs nav coverage, and component-test entrypoint coverage.
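A small wrapper can run these five checks and keep their results together for a handoff. A sketch under stated assumptions: each command prints its JSON report to stdout, and its exit code is worth recording; neither the report schema nor the exit-code semantics are documented here, so the parsed JSON is stored as-is. `collect_harness_results` is a hypothetical helper, and `runner` is injectable so the loop can be exercised without `sb` installed.

```python
import json
import shlex
import subprocess
from typing import Callable

# Commands copied from the list above; the runner around them is hypothetical.
HARNESS_COMMANDS = [
    "sb quality harness audit --changed-only --json",
    "sb quality harness review --role runtime-contract --changed-only --json",
    "sb quality harness review --role docs-help --changed-only --json",
    "sb quality harness review --role test-strategy --changed-only --json",
    "sb quality harness gc --last 7d --json",
]


def run_cli(command: str) -> subprocess.CompletedProcess:
    """Run one command without a shell and capture its text output."""
    return subprocess.run(shlex.split(command), capture_output=True, text=True, check=False)


def collect_harness_results(
    runner: Callable[[str], subprocess.CompletedProcess] = run_cli,
) -> dict:
    """Run each check and keep its exit code plus parsed JSON (or raw stdout)."""
    results = {}
    for command in HARNESS_COMMANDS:
        proc = runner(command)
        try:
            payload = json.loads(proc.stdout)
        except json.JSONDecodeError:
            # Keep unparseable output verbatim instead of dropping the evidence.
            payload = {"raw_stdout": proc.stdout}
        results[command] = {"returncode": proc.returncode, "output": payload}
    return results
```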
## Stream Contract
When adding a new chat stream event type, keep the server, contract endpoint, frontend reducer, and UI rendering in the same diff:
| File | Change |
|---|---|
| `brain/serve/chat_runtime.py` | Add `on_<event>` to `QueueCallbacks` and add the event type string to `STREAM_EVENT_TYPES`. |
| `brain/serve/routers/core.py` | `GET /stream-events` serves `STREAM_EVENT_TYPES`; code changes are usually unnecessary unless the endpoint contract changes. |
| `serve-ui/src/lib/chat.ts` | Handle the event in `reduceStreamEvent` and add the type to `HANDLED_EVENT_TYPES`. |
| `serve-ui/src/pages/ChatPage.tsx` | Render the new state in the thread UI. |
Startup drift check:
- The web UI fetches `/stream-events` on load.
- If the server emits an unhandled type, the browser logs a `console.warn`.
- After adding an event, check the browser DevTools console for drift warnings.
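The same drift check can also run in CI instead of depending on someone opening DevTools. A minimal Python sketch of the set comparison, assuming the server's `STREAM_EVENT_TYPES` and the client's `HANDLED_EVENT_TYPES` can be loaded as plain string collections (how to extract them from `chat_runtime.py` and `chat.ts` is left open); `check_stream_drift` is a hypothetical helper and the event names in the usage line are placeholders.

```python
import logging
from collections.abc import Iterable

logger = logging.getLogger("stream_contract")


def check_stream_drift(server_types: Iterable[str], handled_types: Iterable[str]) -> list[str]:
    """Warn about event types the server can emit but the client does not handle."""
    unhandled = sorted(set(server_types) - set(handled_types))
    for event_type in unhandled:
        # Python counterpart of the browser's console.warn on startup.
        logger.warning("unhandled stream event type: %s", event_type)
    return unhandled


print(check_stream_drift(["token", "tool_call", "done"], ["token", "done"]))
# prints ['tool_call']
```

Failing a test when the returned list is non-empty keeps the server, reducer, and rendering changes in the same diff.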
## Decision Logs
Use a short decision log when a plan or guide records a choice that future agents must preserve. Keep entries factual:
```markdown
## Decision Log

| Date | Decision | Reason | Follow-up |
| --- | --- | --- | --- |
| 2026-04-27 | Keep AGENTS.md as a routing table. | Large instruction blobs drift and waste context. | Enforce with harness docs hygiene. |
```
Prefer one durable decision entry over repeating the same rule in several files.
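Because a decision log is a plain Markdown table, a small script can find rules recorded in more than one file. A sketch, assuming the four-column layout above and no `|` characters inside cells; `parse_decision_log` and `duplicated_decisions` are hypothetical helpers, and matching on lowercased decision text is deliberately naive.

```python
import re
from collections import defaultdict
from dataclasses import dataclass

DATE = re.compile(r"^\d{4}-\d{2}-\d{2}$")


@dataclass(frozen=True)
class Decision:
    date: str
    decision: str
    reason: str
    follow_up: str


def parse_decision_log(markdown: str) -> list[Decision]:
    """Parse dated rows of a `| Date | Decision | Reason | Follow-up |` table."""
    decisions = []
    for line in markdown.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue
        cells = [c.strip() for c in line.strip("|").split("|")]
        # Header and separator rows have no ISO date in the first cell, so they are skipped.
        if len(cells) != 4 or not DATE.match(cells[0]):
            continue
        decisions.append(Decision(*cells))
    return decisions


def duplicated_decisions(logs: dict[str, str]) -> dict[str, list[str]]:
    """Map each decision recorded in more than one file to the files that repeat it."""
    seen: dict[str, list[str]] = defaultdict(list)
    for path, text in logs.items():
        for entry in parse_decision_log(text):
            key = entry.decision.lower()
            if path not in seen[key]:
                seen[key].append(path)
    return {k: v for k, v in seen.items() if len(v) > 1}
```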
## Implementation Plan: Agent-Operable Workflow
These workstreams turn reusable workflow lessons into SecondBrain product work. They should land as small, reviewable changes with CLI JSON contracts first and UI/docs follow-up only where the contract is stable.
| Workstream | Implementation target | Verification |
|---|---|---|
| Skills reference | Generate an active `docs/reference/skills.md` page from `skills/*/SKILL.md` frontmatter and include name, description, version, trust level, safety class, examples, and tests. | `sb skills docs-write`; `pytest -q tests/skills`; `make docs` |
| Project lifecycle | Extend `sb project` with prototype-first `init`, additive `enhance` / `upgrade`, and `--dry-run` flows that preview file changes before writing. | `pytest -q tests/projects/test_project_mode.py`; `sb project --help` |
| Behavior scaffold | Seed project-mode scaffolds with a tiny replay or behavior-eval skeleton so agent behavior expectations do not get encoded as brittle pytest assertions. | `pytest -q tests/quality tests/projects/test_project_mode.py`; `sb quality replay-cases --json` |
| Skill validation | Make `sb skills validate --strict` warn when a skill lacks examples, tests, trigger guidance, or clear phase-specific routing language. | `pytest -q tests/skills/test_skills_validator.py tests/skills/test_skills_cli.py` |
| Evidence-first handoff | Teach project finalization to include trace IDs, session IDs, quality run IDs, replay IDs, and exact verification commands when available. | `pytest -q tests/projects/test_project_mode.py tests/quality`; `sb project finalize --help` |
| Approval posture | Keep network, deployment, daemon, publish, and destructive workflow actions behind explicit operator controls and approval ledger records. | `pytest -q tests/infra -k "approval or policy"`; `sb approvals ledger --json` |
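
The skills-reference workstream boils down to "read frontmatter, emit a table". A dependency-free sketch, assuming each `SKILL.md` starts with a `---`-delimited block of flat `key: value` pairs and simple `- item` lists, and assuming the keys are spelled `name`, `description`, `version`, `trust_level`, `safety_class`, `examples`, and `tests`. The real key names and the output of `sb skills docs-write` may differ, and a production version would more likely use a YAML parser.

```python
from pathlib import Path

# Assumed frontmatter keys; the real SKILL.md schema may spell these differently.
SCALAR_KEYS = ["name", "description", "version", "trust_level", "safety_class"]
LIST_KEYS = ["examples", "tests"]


def parse_frontmatter(text: str) -> dict:
    """Parse a flat '---' frontmatter block: `key: value` scalars and `- item` lists."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    data: dict = {}
    list_key = None
    for line in lines[1:]:
        stripped = line.strip()
        if stripped == "---":
            break
        if stripped.startswith("- ") and list_key:
            data[list_key].append(stripped[2:].strip())
            continue
        key, sep, value = stripped.partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip().strip("'\"")
        if value:
            data[key] = value
            list_key = None
        else:
            # An empty value opens a list that following `- item` lines fill.
            data[key] = []
            list_key = key
    return data


def _cell(value) -> str:
    # Lists are summarized as counts; pipes are escaped so they cannot break the table.
    text = str(len(value)) if isinstance(value, list) else str(value)
    return text.replace("|", "\\|")


def render_skills_table(skill_files: dict[str, str]) -> str:
    """Render a Markdown table from {path: SKILL.md text}, sorted by path."""
    rows = [
        "| Skill | Description | Version | Trust level | Safety class | Examples | Tests |",
        "|---|---|---|---|---|---|---|",
    ]
    for path in sorted(skill_files):
        meta = parse_frontmatter(skill_files[path])
        cells = [_cell(meta.get(k, "")) for k in SCALAR_KEYS]
        cells += [_cell(meta.get(k, [])) for k in LIST_KEYS]
        rows.append("| " + " | ".join(cells) + " |")
    return "\n".join(rows) + "\n"


def render_skills_reference(skills_dir: Path) -> str:
    """Read every */SKILL.md under `skills_dir` and render the table."""
    files = {str(p): p.read_text(encoding="utf-8") for p in skills_dir.glob("*/SKILL.md")}
    return render_skills_table(files)
```

Keeping parsing and rendering separate from file reading makes the table easy to snapshot-test without touching the real `skills/` directory.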
Suggested sequencing:
1. Add strict skill metadata warnings and the generated skills reference page.
2. Add dry-run lifecycle commands for project scaffolds.
3. Seed behavior-eval or replay skeletons from project initialization.
4. Wire finalization and handoff commands to quality and runtime evidence.
5. Expand approval-ledger coverage for any workflow step that can leave the local machine, mutate infrastructure, or publish state.
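For the first step, a strict-mode warning pass could start as small as the sketch below. It is hypothetical: the `triggers` key and both regular expressions are placeholder heuristics, not the rules the planned `sb skills validate --strict` check will apply. It returns warnings rather than raising, matching the "warn" wording in the plan.

```python
import re

# Placeholder heuristics; the real validator's rules are not specified in this plan.
TRIGGER_PATTERN = re.compile(r"\buse (this skill )?when\b", re.IGNORECASE)
PHASE_PATTERN = re.compile(r"\b(plan|implement|validate|verify|handoff|review)\b", re.IGNORECASE)


def strict_skill_warnings(meta: dict, body: str) -> list[str]:
    """Return human-readable warnings for a skill's frontmatter and body text."""
    warnings = []
    if not meta.get("examples"):
        warnings.append("no examples")
    if not meta.get("tests"):
        warnings.append("no tests")
    text = f"{meta.get('description', '')}\n{body}"
    if not meta.get("triggers") and not TRIGGER_PATTERN.search(text):
        warnings.append("no trigger guidance (e.g. 'Use when ...')")
    if not PHASE_PATTERN.search(body):
        warnings.append("no phase-specific routing language")
    return warnings
```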
Done means the workflow can be followed from a clean checkout without private chat context: initialize, implement, validate, promote replay/eval coverage, hand off with evidence, and inspect any approval-sensitive action after the fact.