Agent Harness Claude Code Learnings

Current State

Before this rollout, SecondBrain already had several pieces of a governed local agent runtime:

brain/agent/, brain/agent_runtime/, and brain/tasks/ provide planning, tool execution, policies, traces, and task graphs.
brain/context_packs/ and brain/retrieval/ provide context compilation and promoted-knowledge-first retrieval.
brain/policies/, brain/orchestrator/, and approval stores provide permission decisions and approval gates.
brain/background_sessions/ provides durable session state, checkpoints, bridge sessions, and resumable runs.
brain/skills/ and contracts/registry/agent_profiles.yaml provide local skill discovery and agent profiles.

Gaps Closed In This Rollout

Added canonical agent-run contracts under brain/agent_harness/ for task runs, context budgets, permission decisions, checkpoints, verification plans/results, run traces, and skill manifest exports.
Exported matching runtime schemas under contracts/runtime/.
Added deterministic verification plans with command, grep, file existence, JSON schema, and richer artifact validation checks for rendered UI/HTML, screenshots/images, JSON artifacts, documents, decks, and generic artifacts.
Added durable file checkpoint create/list/diff/rewind support backed by JSON manifests and copied blobs, independent of Git.
Added default deterministic kernel hooks for secret scanning, destructive-command blocking, and provenance checks on memory writes.
Added Claude-Code-style permission aliases: plan_only, auto_read, auto_edit_local, auto_safe, and dangerous_requires_approval.
Added top-level operator commands: sb verify, sb checkpoint, sb rewind, sb checkpoints list, sb checkpoints diff, and sb act.
Extended sb diff <checkpoint_id> to diff current files against a durable checkpoint while preserving the existing Git diff behavior when no checkpoint is supplied.
Added sb task start "..." --worktree / sb tasks start to create a task graph and optionally prepare an isolated workspace.
Added sb skills run <skill> "..." to produce a skill-scoped agent prompt package.
Added sb context explain <run_id> and expanded sb context doctor with estimated token usage, top token sources, duplicate context, low-value context, stale memories, and pruning recommendations.
Added sb context doctor --write-rank-hints so pruning diagnostics can feed retrieval rank hints instead of only reporting budget pressure.
Added sb context doctor --cleanup-rank-hints as a dry-run-first cleanup path for resolved managed retrieval hints. When the cleanup is applied, removed hints are archived and source content is not deleted.
Added bundled skills for repo maintenance, context refining, Codex prompting, and artifact generation.
Added specialist profiles for repo-scanner, security-reviewer, and memory-curator.
Added automatic durable pre-mutation checkpoints for chat code writes, gateway write_file, and vault markdown writes.
Added file-backed AgentRunTrace persistence for sb verify, sb checkpoint, sb rewind, sb plan, sb task start, sb tasks run, sb act, and sb tasks resume operator actions.
Added stable AgentRunTrace receipts for background-session lifecycle checkpoints, including queued, started, approval, retry, step, completed, failed, paused, cancelled, and operator-expired states.
Added stable AgentRunTrace receipts for interactive serve-chat turns, including non-stream, streaming, slash-command, approval-resume, failure, and cancellation paths.
Added terminal REPL AgentRunTrace receipts through ChatTransport, preserving the turn journal's token, fallback, tool, and terminal-status details.
Added opt-in direct AgentHarness.run_turn AgentRunTrace receipts for non-transport surfaces such as sb chat --print, chat subagents, local agent-builder runs, spawn-tool subagents, and simulations.
Added opt-in AgentRunTrace receipts for prompt autotune/eval helper runs when a trace state directory is supplied.
Added sb traces list and sb traces show <trace_id> so durable agent-run receipts can be inspected directly from the CLI.
Added normalized subagent.result.v1 envelopes with compressed findings, deduped sources, confidence, verification status, and raw-output truncation markers while preserving legacy subagent output.
Added sb tasks workspace status [task_graph_id] to audit workspace health, git cleanliness, changed files, blockers, and merge readiness across task worktrees.
Added sb tasks workspace promote <task_graph_id> as a dry-run-first fast-forward promotion path for clean candidate task worktrees, guarded by --apply --yes, with candidate commit metadata and optional reviewer gating.
Added narrow non-file state checkpoints for explicit SQLite rows and keyed SQLite tables, including diff, rewind, and trace receipts.
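As an illustration of the durable file checkpoints listed above (create, diff, and rewind backed by JSON manifests and copied blobs, independent of Git), here is a minimal sketch. It is not the brain/agent_harness/ implementation: the function names, the manifest.json layout, and the blobs/ directory are assumptions made for this example.

```python
import hashlib
import json
import shutil
from pathlib import Path


def _sha256(path: Path) -> str:
    """Content hash used as the blob name (hypothetical naming scheme)."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def create_checkpoint(root: Path, store: Path, checkpoint_id: str, files: list) -> Path:
    """Copy each tracked file into a blob directory and write a JSON manifest."""
    cp_dir = store / checkpoint_id
    blob_dir = cp_dir / "blobs"
    blob_dir.mkdir(parents=True, exist_ok=False)  # refuse to overwrite a checkpoint
    entries = {}
    for rel in files:
        src = root / rel
        if src.exists():
            digest = _sha256(src)
            shutil.copyfile(src, blob_dir / digest)
            entries[rel] = digest
        else:
            entries[rel] = None  # remember that the file did not exist yet
    manifest = cp_dir / "manifest.json"
    manifest.write_text(json.dumps({"id": checkpoint_id, "files": entries}, indent=2))
    return manifest


def diff_checkpoint(root: Path, store: Path, checkpoint_id: str) -> dict:
    """Compare current files against the manifest; return {path: change kind}."""
    manifest = json.loads((store / checkpoint_id / "manifest.json").read_text())
    changes = {}
    for rel, digest in manifest["files"].items():
        current = root / rel
        if digest is None:
            if current.exists():
                changes[rel] = "added"
        elif not current.exists():
            changes[rel] = "deleted"
        elif _sha256(current) != digest:
            changes[rel] = "modified"
    return changes


def rewind_checkpoint(root: Path, store: Path, checkpoint_id: str) -> None:
    """Restore tracked files from blobs; remove files that were absent at checkpoint time."""
    cp_dir = store / checkpoint_id
    manifest = json.loads((cp_dir / "manifest.json").read_text())
    for rel, digest in manifest["files"].items():
        target = root / rel
        if digest is None:
            target.unlink(missing_ok=True)
        else:
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copyfile(cp_dir / "blobs" / digest, target)
```

Recording absent files as null in the manifest lets a rewind remove files that were created after the checkpoint, which a plain copy-back would miss.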

Minimal Implementation Shape

The implementation is additive. Existing sb chat, task planning, background sessions, and policy flows remain compatible. New contracts and commands sit beside the existing runtime so the system can migrate gradually toward the standard lifecycle:

Understand -> Gather Context -> Plan -> Act -> Verify -> Summarize -> Persist Memory

Deterministic task graphs now synthesize artifact validation checks when completed step outputs or declared output metadata are available. sb verify still treats truly empty plans as unverified instead of claiming success.
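The rule that an empty plan counts as unverified, not as success, can be sketched as follows. This is a hypothetical model and not the sb verify code: the Check and VerificationPlan classes, the two check kinds shown, and the status strings are assumptions chosen for illustration.

```python
import re
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class Check:
    kind: str          # "file_exists" or "grep" (a subset of the check types named above)
    path: str          # path relative to the workspace root
    pattern: str = ""  # regular expression, used only by "grep" checks


@dataclass
class VerificationPlan:
    checks: list = field(default_factory=list)


def run_check(check: Check, root: Path) -> bool:
    """Evaluate one deterministic check against files under root."""
    target = root / check.path
    if check.kind == "file_exists":
        return target.is_file()
    if check.kind == "grep":
        return target.is_file() and re.search(check.pattern, target.read_text()) is not None
    raise ValueError(f"unknown check kind: {check.kind}")


def verify(plan: VerificationPlan, root: Path) -> str:
    """Return "unverified" for an empty plan, otherwise "passed" or "failed"."""
    if not plan.checks:
        # all() over an empty list is True, so the empty case is handled
        # explicitly to avoid reporting success when nothing was checked.
        return "unverified"
    return "passed" if all(run_check(c, root) for c in plan.checks) else "failed"
```

The explicit empty-plan branch matters because a naive all(...) over zero checks would evaluate to True and silently report a pass.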

Rollout Plan

Keep the current task graph and background-session stores as the source of operational truth.
Use the new agent-run contracts as portable read/write envelopes at runtime boundaries.
Attach richer verification plans to additional domain-specific task templates as those surfaces expose artifact metadata.
Continue expanding checkpoint hooks from known file-mutation tools into broader task execution loops where state restore semantics are explicit.
Keep promoting subagent outputs through compressed findings only, with explicit source paths and verification status.
Use context-budget rank hints as the first retrieval-feedback path, then broaden into memory-curation workflows.
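To make the subagent promotion rule above concrete, the following sketch builds a normalized result envelope with compressed findings, deduplicated source paths, and a verification status. Only the subagent.result.v1 name comes from this document. The field names, the build_envelope helper, and the truncation limit are assumptions for illustration and may not match the real schema.

```python
MAX_RAW_CHARS = 200  # hypothetical truncation limit, not taken from the real contract


def build_envelope(findings, sources, confidence, verification_status, raw_output,
                   max_raw_chars=MAX_RAW_CHARS):
    """Normalize subagent output into a compact, promotable envelope (field names assumed)."""
    # Drop blank findings and surrounding whitespace so only compressed text is promoted.
    compressed = [f.strip() for f in findings if f.strip()]
    # dict.fromkeys removes duplicate source paths while keeping first-seen order.
    deduped_sources = list(dict.fromkeys(sources))
    truncated = len(raw_output) > max_raw_chars
    return {
        "schema": "subagent.result.v1",
        "findings": compressed,
        "sources": deduped_sources,
        "confidence": confidence,
        "verification_status": verification_status,
        "raw_output": raw_output[:max_raw_chars],
        "raw_output_truncated": truncated,  # marker so consumers know output was cut
    }
```

A caller would promote only the findings, sources, and verification status from such an envelope, and treat the raw output as diagnostic detail.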

Remaining Follow-Up

No open implementation gap is tracked for this rollout. Future expansion should stay conservative: add new checkpoint scopes, artifact validators, or context-pruning actions only when the affected state has explicit restore or audit semantics.