Agent Operability

SecondBrain is designed to be worked on by coding agents under human direction. Keep the repo legible enough that an agent can discover context, make a scoped change, validate it, and leave evidence for review without relying on private chat history.

Operating Principles

| Principle | Repo behavior |
| --- | --- |
| Humans steer; agents execute. | Put intent, constraints, acceptance criteria, and verification in the task or plan before widening the implementation. |
| `AGENTS.md` routes; docs explain. | Keep `AGENTS.md` short and point agents to `docs/INDEX.md`, contributor docs, and testing guidance. |
| Failures become harness pressure. | Promote repeated or important failures into replay cases, eval cases, improvement records, docs, or mechanical checks. |
| Runtime evidence beats log hunting. | Prefer trace IDs, session IDs, quality run IDs, replay case IDs, and verification commands in handoffs. |
| Repeated review feedback becomes tooling. | Convert stable feedback into tests, layer rules, harness review roles, task templates, or docs. |
| Autonomy grows by policy. | Write-capable and merge-capable agent work needs explicit approval gates and replayable evidence. |

Task Templates

Use `sb tasks classify "<goal>" --json` to classify a goal and `sb tasks templates --json` to inspect the catalog. Coding-agent work should normally land in one of these templates:
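A minimal sketch of calling the classifier from a script, assuming `sb` is on `PATH` and that `--json` prints a single JSON document to stdout. The helper names `build_classify_command` and `classify_goal` are hypothetical, and the shape of the returned JSON is not documented here, so the sketch returns the parsed value without interpreting it.

```python
import json
import subprocess


def build_classify_command(goal: str) -> list[str]:
    """Return the argv for `sb tasks classify "<goal>" --json`."""
    return ["sb", "tasks", "classify", goal, "--json"]


def classify_goal(goal: str) -> object:
    """Run the classifier and return its parsed JSON output.

    Assumption: `sb` is installed and writes one JSON document to stdout.
    """
    result = subprocess.run(
        build_classify_command(goal),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return json.loads(result.stdout)
```

Passing the goal as a single argv element avoids shell quoting problems when the goal text contains spaces or quotes.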

| Template | Use for | Required evidence |
| --- | --- | --- |
| `tpl_code_feature` | Feature implementation or capability additions. | Acceptance criteria, focused tests, verification commands, docs/help sync when user-facing. |
| `tpl_code_bugfix` | Bugs, failing tests, regressions, or broken runtime behavior. | Reproduction trace or failing test, focused regression coverage, verification commands. |
| `tpl_code_refactor` | Scoped refactors and module splits. | Ownership boundary, unchanged-behavior checks, verification commands. |
| `tpl_eval_replay` | Turning real failures into replay/eval coverage. | Failure signature, replay or eval case, gate protected by the case. |
| `tpl_docs_update` | Docs, help text, schema references, and agent-readable repo knowledge. | Source of truth link, refreshed docs index, docs build. |

These templates extend the general local task catalog; they do not replace domain workflows such as research, summarization, drafting, decision review, or maintenance.

Autonomy Ladder

Use the ladder to decide how far an agent may go without further human approval.

| Level | Capability | Guardrail |
| --- | --- | --- |
| 0 | Explain and plan only. | No file writes. |
| 1 | Implement narrow local changes. | Focused tests and diff summary. |
| 2 | Validate with runtime evidence. | Trace/session/quality links or replayable evidence. |
| 3 | Open a PR or improvement record. | Docs, tests, and quality evidence attached. |
| 4 | Handle review and CI feedback. | Bounded retries and explicit escalation on repeated failure. |
| 5 | Merge or promote automatically. | Human-approved policy, rollback path, and audit trail. |

SecondBrain defaults to levels 1-3 during local development. Levels 4-5 are policy-controlled and should stay opt-in.
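The ladder can be sketched as a small policy check. This is an illustration, not SecondBrain's implementation: the enum member names and `is_permitted` are hypothetical, and the sketch assumes level 0 is always allowed because it writes nothing.

```python
from enum import IntEnum


class AutonomyLevel(IntEnum):
    """The six ladder levels; member names are hypothetical labels."""

    PLAN_ONLY = 0
    LOCAL_CHANGE = 1
    RUNTIME_VALIDATION = 2
    OPEN_PR = 3
    HANDLE_FEEDBACK = 4
    AUTO_MERGE = 5


# Levels 4-5 are policy-controlled and opt-in; everything below is default.
OPT_IN_THRESHOLD = AutonomyLevel.HANDLE_FEEDBACK


def is_permitted(level: AutonomyLevel, opted_in: bool = False) -> bool:
    """Return True if the level needs no opt-in, or the operator opted in."""
    return level < OPT_IN_THRESHOLD or opted_in
```

With these assumptions, `is_permitted(AutonomyLevel.OPEN_PR)` is true, while `is_permitted(AutonomyLevel.AUTO_MERGE)` stays false until `opted_in=True` is passed.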

Failure Promotion

When an agent run fails, decide which artifact should carry the learning:

| Failure shape | Promotion target |
| --- | --- |
| Reproducible runtime failure | `sb quality replay-cases` and `sb quality promote-replay <case-id>` |
| Repeated regression signature | `sb quality harness gc --last 7d --json` proposal |
| Missing or stale docs | docs update plus `sb codebase docs-index --write` |
| Architecture drift | layer rule, harness review role, or focused test |
| UI contract drift | stream-contract test and `serve-ui` reducer/rendering update |
| Ambiguous task shape | task-template or plan-lifecycle update |

Do not leave important lessons only in chat. Move them into docs, replay/eval artifacts, task templates, tests, or harness checks.
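The promotion table above can be treated as a lookup, which makes the routing decision explicit in a handoff script. The dictionary keys and the `promotion_target` helper are hypothetical names chosen for this sketch; the targets are copied from the table.

```python
# Keys are hypothetical identifiers for the failure shapes in the table.
PROMOTION_TARGETS = {
    "reproducible_runtime_failure": (
        "`sb quality replay-cases` and `sb quality promote-replay <case-id>`"
    ),
    "repeated_regression_signature": "`sb quality harness gc --last 7d --json` proposal",
    "missing_or_stale_docs": "docs update plus `sb codebase docs-index --write`",
    "architecture_drift": "layer rule, harness review role, or focused test",
    "ui_contract_drift": "stream-contract test and `serve-ui` reducer/rendering update",
    "ambiguous_task_shape": "task-template or plan-lifecycle update",
}


def promotion_target(failure_shape: str) -> str:
    """Return the artifact that should carry the learning for a failure shape."""
    try:
        return PROMOTION_TARGETS[failure_shape]
    except KeyError:
        known = ", ".join(sorted(PROMOTION_TARGETS))
        raise ValueError(
            f"unknown failure shape {failure_shape!r}; expected one of: {known}"
        ) from None
```

Raising on an unknown shape, rather than returning a default, keeps an unclassified failure from being silently dropped.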

Mechanical Checks

Use the operating-harness checks before handing off agent-generated work:

sb quality harness audit --changed-only --json
sb quality harness review --role runtime-contract --changed-only --json
sb quality harness review --role docs-help --changed-only --json
sb quality harness review --role test-strategy --changed-only --json
sb quality harness gc --last 7d --json

`audit` checks coverage gaps, oversized agent-hostile files, missing remediation hints, skill validation metadata, docs hygiene, and pending replay coverage. Docs hygiene includes `AGENTS.md` routing, docs-index freshness, active plan lifecycle shape, MkDocs nav coverage, and component-test entrypoint coverage.
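A minimal sketch of running the five checks in order from a script, assuming `sb` is on `PATH`. `run_harness_checks` is a hypothetical helper. The sketch only records each exit status, because how `sb` signals a failed check (exit status or JSON payload) is not specified here.

```python
import subprocess

# The five handoff checks listed above, one argv per command.
HARNESS_COMMANDS = [
    ["sb", "quality", "harness", "audit", "--changed-only", "--json"],
    ["sb", "quality", "harness", "review", "--role", "runtime-contract", "--changed-only", "--json"],
    ["sb", "quality", "harness", "review", "--role", "docs-help", "--changed-only", "--json"],
    ["sb", "quality", "harness", "review", "--role", "test-strategy", "--changed-only", "--json"],
    ["sb", "quality", "harness", "gc", "--last", "7d", "--json"],
]


def run_harness_checks(runner=subprocess.run) -> list[tuple[str, int]]:
    """Run every check and return (command line, exit status) pairs.

    `runner` is injectable so the loop can be exercised without `sb` installed.
    """
    results = []
    for argv in HARNESS_COMMANDS:
        completed = runner(argv, capture_output=True, text=True)
        results.append((" ".join(argv), completed.returncode))
    return results
```

Running all five commands instead of stopping at the first non-zero status gives the reviewer the full picture in one pass.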

Stream Contract

When adding a new chat stream event type, keep the server, contract endpoint, frontend reducer, and UI rendering in the same diff:

- `brain/serve/chat_runtime.py`: Add `on_<event>` to `QueueCallbacks` and add the event type string to `STREAM_EVENT_TYPES`.
- `brain/serve/routers/core.py`: `GET /stream-events` serves `STREAM_EVENT_TYPES`; code changes are usually unnecessary unless the endpoint contract changes.
- `serve-ui/src/lib/chat.ts`: Handle the event in `reduceStreamEvent` and add the type to `HANDLED_EVENT_TYPES`.
- `serve-ui/src/pages/ChatPage.tsx`: Render the new state in the thread UI.

Startup drift check:

- The web UI fetches `/stream-events` on load.
- If the server emits an unhandled type, the browser logs a warning via `console.warn`.
- After adding an event, check the browser DevTools console for drift warnings.
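The drift check described above amounts to a set difference between the types the server emits and the types the frontend handles. The sketch below shows that comparison in Python for illustration only; the real check runs in the browser, and `unhandled_event_types` and the sample event names are hypothetical.

```python
from collections.abc import Iterable


def unhandled_event_types(
    server_types: Iterable[str], handled_types: Iterable[str]
) -> list[str]:
    """Return server event types that the frontend does not handle, sorted."""
    return sorted(set(server_types) - set(handled_types))


# Hypothetical event names: the server gained "tool_call", the UI has not.
drift = unhandled_event_types(
    server_types=["token", "done", "tool_call"],
    handled_types=["token", "done"],
)
for event_type in drift:
    print(f"warning: unhandled stream event type: {event_type}")
```

An empty result means the two lists agree, which is the state a new event should reach before the diff is handed off.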

Decision Logs

Use a short decision log when a plan or guide records a choice that future agents must preserve. Keep entries factual:

## Decision Log

| Date | Decision | Reason | Follow-up |
| --- | --- | --- | --- |
| 2026-04-27 | Keep AGENTS.md as a routing table. | Large instruction blobs drift and waste context. | Enforce with harness docs hygiene. |

Prefer one durable decision entry over repeating the same rule in several files.
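If decision entries are appended by a script, a small formatter keeps rows consistent with the table layout shown above. `decision_row` is a hypothetical helper, not an existing SecondBrain command.

```python
from datetime import date


def decision_row(when: date, decision: str, reason: str, follow_up: str) -> str:
    """Format one decision-log entry as a Markdown table row."""
    cells = [when.isoformat(), decision, reason, follow_up]
    # Escape pipes so free text in a cell cannot break the table layout.
    escaped = [cell.strip().replace("|", "\\|") for cell in cells]
    return "| " + " | ".join(escaped) + " |"


print(
    decision_row(
        date(2026, 4, 27),
        "Keep AGENTS.md as a routing table.",
        "Large instruction blobs drift and waste context.",
        "Enforce with harness docs hygiene.",
    )
)
```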

Implementation Plan: Agent-Operable Workflow

These workstreams turn reusable workflow lessons into SecondBrain product work. They should land as small, reviewable changes with CLI JSON contracts first and UI/docs follow-up only where the contract is stable.

| Workstream | Implementation target | Verification |
| --- | --- | --- |
| Skills reference | Generate an active `docs/reference/skills.md` page from `skills/*/SKILL.md` frontmatter and include name, description, version, trust level, safety class, examples, and tests. | `sb skills docs-write`; `pytest -q tests/skills`; `make docs` |
| Project lifecycle | Extend `sb project` with prototype-first `init`, additive `enhance` / `upgrade`, and `--dry-run` flows that preview file changes before writing. | `pytest -q tests/projects/test_project_mode.py`; `sb project --help` |
| Behavior scaffold | Seed project-mode scaffolds with a tiny replay or behavior-eval skeleton so agent behavior expectations do not get encoded as brittle pytest assertions. | `pytest -q tests/quality tests/projects/test_project_mode.py`; `sb quality replay-cases --json` |
| Skill validation | Make `sb skills validate --strict` warn when a skill lacks examples, tests, trigger guidance, or clear phase-specific routing language. | `pytest -q tests/skills/test_skills_validator.py tests/skills/test_skills_cli.py` |
| Evidence-first handoff | Teach project finalization to include trace IDs, session IDs, quality run IDs, replay IDs, and exact verification commands when available. | `pytest -q tests/projects/test_project_mode.py tests/quality`; `sb project finalize --help` |
| Approval posture | Keep network, deployment, daemon, publish, and destructive workflow actions behind explicit operator controls and approval ledger records. | `pytest -q tests/infra -k "approval or policy"`; `sb approvals ledger --json` |

Suggested sequencing:

1. Add strict skill metadata warnings and the generated skills reference page.
2. Add dry-run lifecycle commands for project scaffolds.
3. Seed behavior-eval or replay skeletons from project initialization.
4. Wire finalization and handoff commands to quality and runtime evidence.
5. Expand approval-ledger coverage for any workflow step that can leave the local machine, mutate infrastructure, or publish state.

Done means the workflow can be followed from a clean checkout without private chat context: initialize, implement, validate, promote replay/eval coverage, hand off with evidence, and inspect any approval-sensitive action after the fact.
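One way to picture an evidence-first handoff is a small record that collects the identifiers named in the workstream table and renders only the ones that exist. This is a sketch under assumptions: `HandoffEvidence`, its field names, and the sample IDs are hypothetical, and the actual finalization output format is not defined here.

```python
from dataclasses import dataclass, field


@dataclass
class HandoffEvidence:
    """Evidence attached to a handoff; every field is optional."""

    trace_ids: list[str] = field(default_factory=list)
    session_ids: list[str] = field(default_factory=list)
    quality_run_ids: list[str] = field(default_factory=list)
    replay_case_ids: list[str] = field(default_factory=list)
    verification_commands: list[str] = field(default_factory=list)

    def render(self) -> str:
        """Return one line per non-empty evidence category."""
        sections = [
            ("Trace IDs", self.trace_ids),
            ("Session IDs", self.session_ids),
            ("Quality run IDs", self.quality_run_ids),
            ("Replay case IDs", self.replay_case_ids),
            ("Verification commands", self.verification_commands),
        ]
        # Skip empty categories so the note lists only available evidence.
        return "\n".join(
            f"{title}: {', '.join(values)}" for title, values in sections if values
        )


# Hypothetical identifiers, for illustration only.
evidence = HandoffEvidence(
    trace_ids=["trace-001"],
    verification_commands=["pytest -q tests/quality", "make docs"],
)
print(evidence.render())
```

Omitting empty categories matches the "when available" wording in the table: a handoff reports the evidence it has without padding the note with blanks.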