# Agent Harness Frontend Contract

Added 2026-04-21. This document is the implementation-aligned source of truth for how agent runtime state reaches frontend clients.

## Purpose

- define the canonical event contract from harness to frontend
- separate stable operator-facing signals from debug/detail payloads
- inventory current live-chat and background-session exposure
- identify bridge-engine fidelity limits

## Runtime Boundary

```mermaid
flowchart LR
  AH["AgentHarness / AgenticRuntime"] --> CB["Callbacks"]
  CB --> SSE["Serve SSE envelopes"]
  CB --> DB["Session events + checkpoints"]
  SSE --> UI["serve-ui reducer + timeline"]
  DB --> UI
  BG["BackgroundSession runtime"] --> DB
  BG --> SSE
```

Primary paths:

- live chat: `brain/agent/harness.py` -> `brain/serve/chat_runtime.py` -> `serve-ui/src/lib/chat.ts`
- background/agentic: `brain/background_sessions/runtime.py` -> session events/checkpoints -> `/sessions/{id}/events/stream`
- health snapshots: background session row + session events + tool observations + checkpoints/artifacts/memory proposals -> `/sessions/{id}/health/stream` -> cockpit/agent-run UI
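The live path ends in SSE. Below is a minimal sketch of framing and parsing. It assumes each canonical event travels as one `{"type": ..., "payload": ...}` JSON object in an SSE `data:` field, matching the shape of the payload examples in this document. The actual framing in `brain/serve/chat_runtime.py` may differ, and `encode_sse_envelope` and `parse_sse_stream` are hypothetical helpers:

```python
import json
from typing import Any, Iterable, Iterator


def encode_sse_envelope(event_type: str, payload: dict[str, Any]) -> str:
    """Serialize one canonical event as an SSE frame (hypothetical framing)."""
    body = json.dumps({"type": event_type, "payload": payload}, separators=(",", ":"))
    # json.dumps never emits raw newlines, so one data: line is enough;
    # the trailing blank line terminates the SSE event.
    return f"data: {body}\n\n"


def parse_sse_stream(lines: Iterable[str]) -> Iterator[dict[str, Any]]:
    """Reassemble envelopes from SSE lines, joining multi-line data fields."""
    data_lines: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # Blank line: dispatch the buffered event, if any.
            if data_lines:
                yield json.loads("\n".join(data_lines))
                data_lines = []
        elif line.startswith("data:"):
            value = line[5:]
            # SSE strips exactly one leading space after the colon.
            data_lines.append(value[1:] if value.startswith(" ") else value)
        # Other fields (event:, id:, retry:) and ":" comments are ignored here.
    # An unterminated trailing event is discarded, as SSE clients do.
```

Parsing by line rather than by chunk keeps the sketch independent of how the transport splits bytes.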

## Exposure Tiers

Default operator UI:

- response text
- thinking/reflection summaries
- tool timeline and outcomes
- approvals
- agent health score, risk flags, budget pressure, memory provenance, audit rows, and recovery hint
- citations and context-pack summary
- Antahkarana / guardrail summary
- background checkpoints and step progress

Debug UI:

- raw event payloads
- raw tool args/results/outcomes
- context warning / compaction payloads
- tool-selection and validation payloads

The frontend must not invent hidden chain-of-thought beyond what the existing `thinking` stream already carries.

## Canonical Event Inventory

| Event | Source | Persisted | Frontend role | Tier |
|---|---|---|---|---|
| `session` | serve chat bootstrap | no | assign session id | default |
| `session_state` | serve chat + background runtime | yes | runtime/status card | default |
| `session_checkpoint` | background runtime | yes | checkpoint panel | default |
| `policy_decision` | serve chat + background runtime | yes | governance timeline and diagnostics | debug |
| `hook_event` | harness callbacks | yes | hook lifecycle trace | debug |
| `sandbox_notice` | serve chat + background runtime | yes | execution-isolation notice | debug |
| `transport_diagnostic` | serve chat finalizer | yes | provider fallback / degraded-runtime diagnostics | debug |
| `subagent_event` | subagent registry + tool callbacks | yes | delegated-work lifecycle | debug |
| `turn_start`, `turn_start_tools` | harness callbacks | yes | turn framing | debug |
| `token`, `response` | provider streaming + harness | yes | assistant text | default |
| `thinking`, `reflection` | harness/reflection engine | yes | reasoning surface | default |
| `tool_call` | harness callback | yes | live tool row start | default |
| `tool_result` | post-hook success path | yes | rendered tool result | default |
| `tool_outcome` | typed `ToolOutcome` path | yes | terminal tool status | default |
| `tool_batch` | harness scheduler | yes | wave diagnostics | debug |
| `tool_selected` | semantic tool selection | yes | selection diagnostics | debug |
| `tool_validation_error` | arg validation / repair loop | yes | diagnostics | debug |
| `context_warning` | context warning tracker | yes | warning banner/timeline | default |
| `context_compacted` | compactor | yes | compaction marker | default |
| `step_progress` | native agentic background runtime | yes | background step timeline | default |
| `usage` | turn finalizer | yes | token metrics | debug |
| `retrieval` | serve chat completion | yes | citations/context panel | default |
| `antahkarana` | serve chat cognition summary | yes | cognition/guardrail panel | default |
| `approval_required`, `approval_applied` | serve chat + background runtime | yes | approval UX | default |
| `turn_end`, `completed`, `error` | serve chat + background runtime | yes | turn final state | default |
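The Tier column maps directly onto the default and debug surfaces. Below is a minimal sketch of that routing, with the sets copied from the table above. `tier_of` and `visible_events` are hypothetical helpers, and dropping unknown event types (rather than raising) is an assumption chosen to keep clients tolerant of new events:

```python
from typing import Any, Iterable

# Tier assignments copied from the canonical event inventory table.
DEFAULT_EVENTS = frozenset({
    "session", "session_state", "session_checkpoint", "token", "response",
    "thinking", "reflection", "tool_call", "tool_result", "tool_outcome",
    "context_warning", "context_compacted", "step_progress", "retrieval",
    "antahkarana", "approval_required", "approval_applied",
    "turn_end", "completed", "error",
})
DEBUG_EVENTS = frozenset({
    "policy_decision", "hook_event", "sandbox_notice", "transport_diagnostic",
    "subagent_event", "turn_start", "turn_start_tools", "tool_batch",
    "tool_selected", "tool_validation_error", "usage",
})


def tier_of(event_type: str) -> str:
    """Return "default", "debug", or "unknown" for an event type."""
    if event_type in DEFAULT_EVENTS:
        return "default"
    if event_type in DEBUG_EVENTS:
        return "debug"
    return "unknown"


def visible_events(events: Iterable[dict[str, Any]], debug: bool) -> list[dict[str, Any]]:
    """Default UI shows default-tier events; debug UI shows both tiers."""
    allowed = {"default", "debug"} if debug else {"default"}
    # Unknown types are skipped rather than rejected.
    return [event for event in events if tier_of(event["type"]) in allowed]
```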

## Session State Contract

Common fields:

- `session_id`
- `status`
- `provider`
- `model`
- `agent_profile`
- `permission_mode`
- `approval_mode`
- `think_level`
- `policy_path`
- `sandbox_mode`
- `sandbox_isolation`

Optional background fields:

- `engine`
- `runner_kind`
- `approval_request_id`
- `workspace_id`
- `task_graph_id`
- `process_id`
- `attempt_count`
- `max_retries`
- `expires_at`
- `parent_session_id`
- `branch_source_event_id`
- `branch_resume_cursor`
- `last_heartbeat_at`
- `last_error`
- `updated_at`
- `ended_at`
- `resume_context`

`session_state` remains additive: new fields must not break existing clients.
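Additivity only works if clients parse `session_state` tolerantly. Below is a minimal sketch. `SessionState` and `parse_session_state` are hypothetical, and so are the field types. The sketch assumes that only `session_id` and `status` are always present and shows just a subset of the optional background fields. Unknown keys are kept in `extra` instead of being rejected:

```python
from dataclasses import dataclass, field, fields
from typing import Any, Optional


@dataclass
class SessionState:
    """Client-side view of session_state; only session_id and status are required."""
    session_id: str
    status: str
    provider: Optional[str] = None
    model: Optional[str] = None
    agent_profile: Optional[str] = None
    permission_mode: Optional[str] = None
    approval_mode: Optional[str] = None
    think_level: Optional[str] = None
    policy_path: Optional[str] = None
    sandbox_mode: Optional[str] = None
    sandbox_isolation: Optional[str] = None
    # Subset of the optional background fields; absent on live chat.
    engine: Optional[str] = None
    attempt_count: Optional[int] = None
    max_retries: Optional[int] = None
    parent_session_id: Optional[str] = None
    # Fields this client does not know yet are preserved, not rejected.
    extra: dict[str, Any] = field(default_factory=dict)


def parse_session_state(payload: dict[str, Any]) -> SessionState:
    """Split a payload into known fields and passthrough extras."""
    known = {f.name for f in fields(SessionState)} - {"extra"}
    kwargs = {k: v for k, v in payload.items() if k in known}
    extra = {k: v for k, v in payload.items() if k not in known}
    return SessionState(**kwargs, extra=extra)
```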

## Tool Lifecycle Contract

Every tool-related event must carry the `tool_call_id` of the call it belongs to:

- `tool_call`
- `tool_result`, when a post-hook result exists
- `tool_outcome`, for the typed terminal outcome

`tool_outcome.status` values currently in use:

- `ok`
- `timeout`
- `error`
- `denied`
- `artifact`

`tool_result` is not a substitute for `tool_outcome`; it is the rendered result surface for successful post-hook tool payloads.
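Keying rows by `tool_call_id` is what lets two concurrent calls to the same tool resolve independently. Below is a minimal reducer sketch. `ToolRow` and `reduce_tool_event` are hypothetical names. The `result` key on `tool_result` is an assumption, since this document does not show that payload. Creating a row when the `tool_call` was missed (for example after a reconnect) is a design choice, not part of the contract:

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ToolRow:
    tool_call_id: str
    name: str
    status: str = "running"  # replaced by the tool_outcome status when terminal
    rendered_result: Optional[Any] = None
    elapsed_ms: Optional[int] = None


def reduce_tool_event(rows: dict[str, ToolRow], event: dict[str, Any]) -> None:
    """Apply a tool_call / tool_result / tool_outcome envelope, keyed by tool_call_id."""
    kind, payload = event["type"], event["payload"]
    call_id = payload["tool_call_id"]
    if kind == "tool_call":
        rows[call_id] = ToolRow(tool_call_id=call_id, name=payload["name"])
        return
    # Tolerate a result/outcome whose tool_call was not observed.
    row = rows.setdefault(
        call_id, ToolRow(tool_call_id=call_id, name=payload.get("tool_name", "unknown"))
    )
    if kind == "tool_result":
        # Rendered payload only; the row stays non-terminal until tool_outcome.
        row.rendered_result = payload.get("result")
    elif kind == "tool_outcome":
        row.status = payload["status"]  # ok | timeout | error | denied | artifact
        row.elapsed_ms = payload.get("elapsed_ms")
```

With the `local_search` example below, two calls named `local_search` remain two rows because only `tool_call_id` is used as the key.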

## Payload Examples

### Live chat `local_search`

```json
{
  "type": "tool_call",
  "payload": {
    "name": "local_search",
    "arguments": { "query": "roadmap", "_tool_call_id": "call-1" },
    "tool_call_id": "call-1"
  }
}
```

```json
{
  "type": "tool_outcome",
  "payload": {
    "tool_call_id": "call-1",
    "tool_name": "local_search",
    "outcome_type": "ToolExecutionResult",
    "status": "ok",
    "elapsed_ms": 84,
    "result": { "chunks": [] }
  }
}
```

### Approval pause / resume

```json
{
  "type": "approval_required",
  "payload": {
    "request_id": "apr_123",
    "tool_name": "write_file",
    "reason": "Web chat turn requested a write-capable tool."
  }
}
```

```json
{
  "type": "approval_applied",
  "payload": {
    "request_id": "apr_123",
    "tool_name": "write_file"
  }
}
```
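Both approval events share `request_id`, so a client can derive the set of pending approvals by pairing them. Below is a minimal sketch. `pending_approvals` is a hypothetical helper, and it assumes events are replayed in order:

```python
from typing import Any, Iterable


def pending_approvals(events: Iterable[dict[str, Any]]) -> dict[str, dict[str, Any]]:
    """Return approval_required payloads that have no matching approval_applied."""
    pending: dict[str, dict[str, Any]] = {}
    for event in events:
        payload = event.get("payload", {})
        if event["type"] == "approval_required":
            pending[payload["request_id"]] = payload
        elif event["type"] == "approval_applied":
            # An applied approval closes its request; unknown ids are ignored.
            pending.pop(payload["request_id"], None)
    return pending
```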

### Context compaction

```json
{
  "type": "context_compacted",
  "payload": {
    "original_count": 24,
    "compacted_count": 10,
    "tokens_saved": 4200,
    "summary": "Kept the recent working set and collapsed older tool chatter.",
    "preserved_messages": 6,
    "resume_cursor": "compact:serve_demo:14"
  }
}
```

### Background checkpoint

```json
{
  "type": "session_checkpoint",
  "payload": {
    "session_id": "bg_123",
    "sequence": 4,
    "checkpoint_type": "step",
    "status": "completed",
    "summary_text": "step-1 completed: Inspect files",
    "resume_cursor": null
  }
}
```

### Policy / sandbox / transport / subagent diagnostics

```json
{
  "type": "policy_decision",
  "payload": {
    "subject": "turn_runtime",
    "outcome": "configured",
    "reason": "Resolved runtime execution mode for this chat turn.",
    "source": "serve_chat_runtime",
    "policy_path": "execution_mode -> approval_policy -> runtime_callbacks",
    "permission_mode": "normal",
    "approval_mode": "on_request"
  }
}
```

```json
{
  "type": "sandbox_notice",
  "payload": {
    "requested_backend": "background_session",
    "resolved_backend": "task_workspace",
    "isolation": "workspace",
    "reason": "Prepared a task workspace for write-capable background work."
  }
}
```

```json
{
  "type": "transport_diagnostic",
  "payload": {
    "status": "degraded",
    "kind": "provider_fallback",
    "provider_requested": "openai",
    "provider_actual": "anthropic",
    "reason": "Primary provider failed health checks.",
    "failure_count": 1
  }
}
```

```json
{
  "type": "subagent_event",
  "payload": {
    "agent_id": "subagent_1",
    "status": "completed",
    "parent_session_id": "serve_chat_123"
  }
}
```

## Session Detail And Resume Contract

`GET /chat/sessions/{session_id}` now exposes additional operator-facing data:

- `resume_points`: ordered resume/branch candidates derived from checkpoints, approvals, and completed turns
- `subagents`: persisted delegated-work snapshots tied to the session
- `background_artifacts`: typed artifacts claimed by a background session, such as task graphs, isolated workspaces, agentic run records, and final outputs
- `channel`: a deterministic projection of the session event log into stable participants and stamped envelopes. The raw event log remains authoritative; the projection is for bounded timeline rendering and operator audit.

`GET /sessions/{session_id}` exposes the durable background row plus:

- `checkpoints`: latest persisted background checkpoints
- `artifacts`: typed background artifacts, ordered newest first
- `health`: current operator-facing health snapshot
- `channel`: the same participant/envelope projection exposed on chat session detail, rooted at the background session id

`GET /sessions/{session_id}/health/stream` emits `health_snapshot` SSE envelopes whenever the derived snapshot changes. Start, resume, pause, cancel, list, and detail responses also include health data, so clients can render the current control-plane state without waiting for a separate refetch.
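"Whenever the derived snapshot changes" implies a change check on the server side. Below is a minimal sketch of one way to do it. The envelope shape and the `HealthSnapshotEmitter` class are assumptions. The sketch compares a canonical JSON fingerprint, so that a different key order alone never triggers an emission:

```python
import json
from typing import Any, Callable, Optional


class HealthSnapshotEmitter:
    """Emit a health_snapshot envelope only when the derived snapshot changes."""

    def __init__(self, send: Callable[[dict[str, Any]], None]) -> None:
        self._send = send
        self._last_fingerprint: Optional[str] = None

    def observe(self, snapshot: dict[str, Any]) -> bool:
        """Send the snapshot if it differs from the last one; return whether it was sent."""
        # sort_keys gives a stable fingerprint; default=str covers datetimes.
        fingerprint = json.dumps(snapshot, sort_keys=True, default=str)
        if fingerprint == self._last_fingerprint:
            return False
        self._last_fingerprint = fingerprint
        self._send({"type": "health_snapshot", "payload": snapshot})
        return True
```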

Health snapshots include `policy.health` and `policy.enforcement`. The policy is stored locally in background-session metadata and currently supports these operator budgets: minimum score, runtime minutes, total tokens, cost USD, context usage percentage, failed tool calls, loop warnings, repeated plan events, tool latency, and events without evidence. When a `running` or `recovering` session crosses a configured threshold, the runtime/supervisor does one of two things, depending on `policy.health.action`: it creates a normal local approval request and moves the session to `awaiting_approval`, or it pauses the session directly.
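Below is a minimal sketch of the enforcement check. All metric and policy key names (`total_tokens`, `min_score`, `action`, and so on) are hypothetical. Only the budget kinds, the eligible statuses, and the two outcomes come from the contract above. The sketch treats a threshold as crossed when a metric strictly exceeds its budget or the score drops below the minimum. As a further assumption, it falls back to an approval request when no action is configured:

```python
from dataclasses import dataclass
from typing import Any, Optional

# Hypothetical key names; only the budget kinds come from the contract.
MAX_BUDGETS = (
    "runtime_minutes", "total_tokens", "cost_usd", "context_usage_pct",
    "failed_tool_calls", "loop_warnings", "repeated_plan_events",
    "tool_latency_ms", "events_without_evidence",
)


@dataclass
class Enforcement:
    action: str        # "approval" or "pause" (hypothetical values)
    next_status: str   # "awaiting_approval" or "paused"
    breaches: list[str]


def enforce_health_policy(
    status: str, metrics: dict[str, float], policy: dict[str, Any]
) -> Optional[Enforcement]:
    """Return an enforcement decision when a running/recovering session breaches a budget."""
    if status not in ("running", "recovering"):
        return None
    breaches = [
        key for key in MAX_BUDGETS
        if key in policy and key in metrics and metrics[key] > policy[key]
    ]
    min_score = policy.get("min_score")
    if min_score is not None and metrics.get("score", min_score) < min_score:
        breaches.append("min_score")
    if not breaches:
        return None
    if policy.get("action") == "pause":
        return Enforcement("pause", "paused", breaches)
    # Assumed default: route through a normal local approval request.
    return Enforcement("approval", "awaiting_approval", breaches)
```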

Resume flows:

- `POST /chat/sessions/{session_id}/branch` branches a new chat session from `from_event_id` or `resume_cursor`
- `POST /sessions/{session_id}/resume` either resumes the same background session or, when `from_event_id` or `resume_cursor` is supplied, branches a new one
- `POST /sessions/{session_id}/pause` pauses a background session without marking it terminal
- branch sessions are seeded with a `resume_seed` checkpoint so the frontend can render continuity context without replaying the full source history
- branched background sessions carry `parent_session_id`, `branch_source_event_id`, and `branch_resume_cursor` on the durable row
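The resume endpoint's choice between resuming in place and branching can be summarized as follows. This minimal sketch is hypothetical: `plan_resume`, the `bg_` id scheme, and the `seed_checkpoint` shape are all assumptions. It only mirrors the rules listed above. Without a cursor or event id, the same session is requeued. With either one, a new row is created that points back to its parent and carries a `resume_seed` checkpoint:

```python
import uuid
from typing import Any, Optional


def plan_resume(
    row: dict[str, Any],
    from_event_id: Optional[str] = None,
    resume_cursor: Optional[str] = None,
) -> dict[str, Any]:
    """Decide between resuming in place and branching a new background session."""
    if from_event_id is None and resume_cursor is None:
        # Same session: requeue it without creating a new row.
        return {"mode": "resume", "session_id": row["session_id"]}
    return {
        "mode": "branch",
        "session_id": f"bg_{uuid.uuid4().hex[:12]}",  # hypothetical id scheme
        "parent_session_id": row["session_id"],
        "branch_source_event_id": from_event_id,
        "branch_resume_cursor": resume_cursor,
        # Seed checkpoint so the UI can show continuity without a full replay.
        "seed_checkpoint": {"checkpoint_type": "resume_seed", "resume_cursor": resume_cursor},
    }
```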

Session channel projection is exposed on `GET /chat/sessions/{session_id}` and `GET /sessions/{session_id}`:

- `channel.channel_id` is `session:{session_id}`.
- `participants` includes the runtime, the session agent, and any observed local operator or tool participants.
- `envelopes` mirrors recent `session_events` rows with `sender_id`, `recipient_ids`, `event_type`, `summary`, and the original `payload`.
- Approval events route through the local operator participant instead of using an implicit side path.
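Below is a minimal sketch of a deterministic channel projection that follows the rules listed above. The participant ids (`runtime`, `agent:{id}`, `operator:local`, `tool:{name}`), the sender/recipient choices for tool events, and the `limit` bound are all hypothetical. Sorting participants and preserving row order are what make the output deterministic:

```python
from typing import Any, Iterable

RUNTIME_ID = "runtime"          # hypothetical participant ids
OPERATOR_ID = "operator:local"
TOOL_EVENTS = ("tool_call", "tool_result", "tool_outcome")


def project_channel(
    session_id: str, rows: Iterable[dict[str, Any]], limit: int = 200
) -> dict[str, Any]:
    """Project ordered session_events rows into participants and envelopes."""
    agent_id = f"agent:{session_id}"
    participants = {RUNTIME_ID, agent_id}
    envelopes = []
    for row in rows:
        event_type, payload = row["event_type"], row.get("payload", {})
        if event_type == "approval_required":
            # Approvals route through the local operator, not a side path.
            sender, recipients = agent_id, [OPERATOR_ID]
        elif event_type == "approval_applied":
            sender, recipients = OPERATOR_ID, [agent_id]
        elif event_type in TOOL_EVENTS:
            tool_id = f"tool:{payload.get('tool_name') or payload.get('name') or 'unknown'}"
            if event_type == "tool_call":
                sender, recipients = agent_id, [tool_id]
            else:
                sender, recipients = tool_id, [agent_id]
        else:
            sender, recipients = agent_id, [RUNTIME_ID]
        participants.update([sender, *recipients])
        envelopes.append({
            "sender_id": sender,
            "recipient_ids": recipients,
            "event_type": event_type,
            "summary": row.get("summary") or event_type,
            "payload": payload,  # original payload is kept verbatim
        })
    return {
        "channel_id": f"session:{session_id}",
        "participants": sorted(participants),
        "envelopes": envelopes[-limit:],  # bounded timeline
    }
```

The raw event log stays authoritative. A projection like this can always be rebuilt from it.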

## Background Session Kernel Contract

Background sessions are the durable agent kernel for long-running work. The runtime persists enough state to recover or branch without depending on process memory:

- attempts: every run increments `attempt_count`; transient failures move to `recovering` while `attempt_count <= max_retries`
- expiry: sessions with a past `expires_at` fail with an expired checkpoint before execution or supervisor recovery
- pause/resume: paused sessions keep their row, checkpoints, and artifacts; resume requeues the same session unless a branch cursor is supplied
- work isolation: write-capable sessions prepare a task graph and task workspace, then persist both as typed artifacts
- approvals: approval waits remain in `awaiting_approval` with a durable `approval_request_id`
- artifacts: final outputs, task graphs, workspaces, and native agentic run ids are persisted in `background_agent_artifacts`
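The attempt and expiry rules reduce to two small transitions. In this minimal sketch, the function names and the `running` / `failed` status strings are assumptions, and `expires_at` is assumed to be a timezone-aware `datetime`. With `max_retries = 1`, the first transient failure recovers (1 <= 1) and the second fails (2 > 1):

```python
from datetime import datetime, timezone
from typing import Any, Optional


def start_attempt(row: dict[str, Any], now: Optional[datetime] = None) -> str:
    """Fail expired sessions before execution; otherwise count a new attempt."""
    now = now or datetime.now(timezone.utc)
    expires_at: Optional[datetime] = row.get("expires_at")
    if expires_at is not None and expires_at < now:
        # The contract also records an expired checkpoint; not modeled here.
        return "failed"
    row["attempt_count"] = row.get("attempt_count", 0) + 1
    return "running"


def status_after_failure(row: dict[str, Any], transient: bool) -> str:
    """Transient failures go to recovering while attempt_count <= max_retries."""
    if transient and row.get("attempt_count", 0) <= row.get("max_retries", 0):
        return "recovering"
    return "failed"
```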

## Gap Matrix

| Gap class | Current state |
|---|---|
| already exposed | `thinking`, `tool_call`, `tool_result`, `retrieval`, `antahkarana`, approvals, checkpoints |
| now standardized in the main stream | `tool_outcome`, `step_progress`, `policy_decision`, `hook_event`, `sandbox_notice`, `transport_diagnostic`, `subagent_event` |
| persisted and rendered in diagnostics | `tool_batch`, `tool_selected`, `tool_validation_error`, `context_warning`, `context_compacted` |
| session detail continuity surfaces | `resume_points`, `resume_seed` checkpoints, persisted `subagents`, typed background artifacts |
| background durable kernel | retries, expiry, pause/resume, branch-from-state, approval waits, workspace/task graph artifacts |
| agent health control plane | health score, stuck/waiting/degraded/budget flags, memory provenance, audit trail, recovery hints, health SSE, and enforced local health policy |
| background-only fidelity limits | bridge engines expose state, checkpoints, approvals, final output, and typed artifacts but not native per-tool callbacks |

## Bridge Engine Limits

`claude` and `codex` background sessions are best-effort bridge surfaces. They can reliably expose:

- `session_state`
- `session_checkpoint`
- `policy_decision`
- `sandbox_notice`
- `approval_required` / `approval_applied`
- `completed` / `error`

They do not claim native harness parity for:

- per-tool callbacks
- typed `tool_outcome`
- context warning / compaction internals
- native `step_progress`
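A client can use the session's `engine` field to decide which events to wait for, instead of rendering empty tool timelines for bridge sessions. In this minimal sketch, mapping "per-tool callbacks" to `tool_call` / `tool_result` is an assumption. Treating every engine other than `claude` and `codex` as native is also an assumption, and `expected_events` is a hypothetical helper:

```python
from typing import Optional

BRIDGE_ENGINES = frozenset({"claude", "codex"})
# Events bridge engines can reliably expose, per the list above.
BRIDGE_EVENTS = frozenset({
    "session_state", "session_checkpoint", "policy_decision", "sandbox_notice",
    "approval_required", "approval_applied", "completed", "error",
})
# Events that only the native harness is expected to emit (assumed mapping).
NATIVE_ONLY_EVENTS = frozenset({
    "tool_call", "tool_result", "tool_outcome",
    "context_warning", "context_compacted", "step_progress",
})


def expected_events(engine: Optional[str]) -> frozenset[str]:
    """Events the UI should expect; bridge sessions omit native-only detail."""
    if engine in BRIDGE_ENGINES:
        return BRIDGE_EVENTS
    return BRIDGE_EVENTS | NATIVE_ONLY_EVENTS
```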

## Rollout Order

1. keep `STREAM_EVENT_TYPES` and `HANDLED_EVENT_TYPES` in lock-step
2. emit canonical policy, hook, sandbox, transport, and subagent diagnostics from `QueueCallbacks`
3. persist continuity checkpoints and branchable resume metadata in session storage
4. standardize background `session_state` and persist `step_progress`
5. render unified UI timeline and diagnostic panels in `serve-ui`
6. keep docs/tests aligned with the shipped contract
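The lock-step rule can be checked mechanically in tests. This minimal sketch assumes `STREAM_EVENT_TYPES` and `HANDLED_EVENT_TYPES` can both be loaded into Python as sets of strings. How the frontend set is extracted from `serve-ui` is not shown, and `check_lock_step` is a hypothetical helper. It reports drift in both directions, so an emitted-but-unhandled event and a handled-but-dead event both fail the check:

```python
def check_lock_step(stream_types: set[str], handled_types: set[str]) -> list[str]:
    """List every event type that appears on only one side of the contract."""
    problems = [f"emitted but not handled: {t}" for t in sorted(stream_types - handled_types)]
    problems += [f"handled but never emitted: {t}" for t in sorted(handled_types - stream_types)]
    return problems
```

A test would then assert `check_lock_step(STREAM_EVENT_TYPES, HANDLED_EVENT_TYPES) == []`.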

## Acceptance Criteria

- `/stream-events` matches the frontend handled-event set
- duplicate tool names resolve correctly by `tool_call_id`
- chat turns emit `tool_outcome` for success, timeout, denial, artifact, and failure paths
- context warnings and compactions are both persisted and visible in the UI timeline
- policy, hook, sandbox, transport, and subagent diagnostics are persisted and reducer-handled
- session detail exposes resume points and persisted delegated-work state
- chat and background sessions can branch from an event id or resume cursor
- native background agentic runs emit `session_state`, `session_checkpoint`, and `step_progress`
- bridge sessions document reduced fidelity instead of pretending full parity