Agent Harness Frontend Contract

Added 2026-04-21. This document is the implementation-aligned source of truth for how agent runtime state reaches frontend clients.

Purpose

  • define the canonical event contract from harness to frontend
  • separate stable operator-facing signals from debug/detail payloads
  • inventory current live-chat and background-session exposure
  • identify bridge-engine fidelity limits

Runtime Boundary

flowchart LR
  AH["AgentHarness / AgenticRuntime"] --> CB["Callbacks"]
  CB --> SSE["Serve SSE envelopes"]
  CB --> DB["Session events + checkpoints"]
  SSE --> UI["serve-ui reducer + timeline"]
  DB --> UI
  BG["BackgroundSession runtime"] --> DB
  BG --> SSE

Primary paths:

  • live chat: brain/agent/harness.py -> brain/serve/chat_runtime.py -> serve-ui/src/lib/chat.ts
  • background/agentic: brain/background_sessions/runtime.py -> session events/checkpoints -> /sessions/{id}/events/stream
  • health snapshots: background session row + session events + tool observations + checkpoints/artifacts/memory proposals -> /sessions/{id}/health/stream -> cockpit/agent-run UI

Exposure Tiers

Default operator UI:

  • response text
  • thinking/reflection summaries
  • tool timeline and outcomes
  • approvals
  • agent health score, risk flags, budget pressure, memory provenance, audit rows, and recovery hint
  • citations and context-pack summary
  • Antahkarana / guardrail summary
  • background checkpoints and step progress

Debug UI:

  • raw event payloads
  • raw tool args/results/outcomes
  • context warning / compaction payloads
  • tool-selection and validation payloads

The frontend must not invent hidden chain-of-thought beyond the existing thinking stream content.

Canonical Event Inventory

| Event | Source | Persisted | Frontend role | Tier |
| --- | --- | --- | --- | --- |
| session | serve chat bootstrap | no | assign session id | default |
| session_state | serve chat + background runtime | yes | runtime/status card | default |
| session_checkpoint | background runtime | yes | checkpoint panel | default |
| policy_decision | serve chat + background runtime | yes | governance timeline and diagnostics | debug |
| hook_event | harness callbacks | yes | hook lifecycle trace | debug |
| sandbox_notice | serve chat + background runtime | yes | execution-isolation notice | debug |
| transport_diagnostic | serve chat finalizer | yes | provider fallback / degraded-runtime diagnostics | debug |
| subagent_event | subagent registry + tool callbacks | yes | delegated-work lifecycle | debug |
| turn_start, turn_start_tools | harness callbacks | yes | turn framing | debug |
| token, response | provider streaming + harness | yes | assistant text | default |
| thinking, reflection | harness/reflection engine | yes | reasoning surface | default |
| tool_call | harness callback | yes | live tool row start | default |
| tool_result | post-hook success path | yes | rendered tool result | default |
| tool_outcome | typed ToolOutcome path | yes | terminal tool status | default |
| tool_batch | harness scheduler | yes | wave diagnostics | debug |
| tool_selected | semantic tool selection | yes | selection diagnostics | debug |
| tool_validation_error | arg validation / repair loop | yes | diagnostics | debug |
| context_warning | context warning tracker | yes | warning banner/timeline | default |
| context_compacted | compactor | yes | compaction marker | default |
| step_progress | native agentic background runtime | yes | background step timeline | default |
| usage | turn finalizer | yes | token metrics | debug |
| retrieval | serve chat completion | yes | citations/context panel | default |
| antahkarana | serve chat cognition summary | yes | cognition/guardrail panel | default |
| approval_required, approval_applied | serve chat + background runtime | yes | approval UX | default |
| turn_end, completed, error | serve chat + background runtime | yes | turn final state | default |
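
A minimal sketch of how a client could apply the tier column from the inventory above when deciding what to render. The function names are hypothetical, and treating unknown event types as debug-only is an assumption, not something the contract states.

```python
# Event types per tier, copied from the inventory above.
DEFAULT_EVENTS = {
    "session", "session_state", "session_checkpoint", "token", "response",
    "thinking", "reflection", "tool_call", "tool_result", "tool_outcome",
    "context_warning", "context_compacted", "step_progress", "retrieval",
    "antahkarana", "approval_required", "approval_applied",
    "turn_end", "completed", "error",
}
DEBUG_EVENTS = {
    "policy_decision", "hook_event", "sandbox_notice", "transport_diagnostic",
    "subagent_event", "turn_start", "turn_start_tools", "tool_batch",
    "tool_selected", "tool_validation_error", "usage",
}


def tier_of(event_type: str) -> str:
    """Return "default" or "debug". Unknown types fall back to "debug" (assumption)."""
    return "default" if event_type in DEFAULT_EVENTS else "debug"


def visible_events(events: list[dict], debug: bool = False) -> list[dict]:
    """Keep default-tier events; include debug-tier ones only when debug is on."""
    return [e for e in events if debug or tier_of(e.get("type", "")) == "default"]
```

With this shape, the operator view calls visible_events(events) and the debug view calls visible_events(events, debug=True), so a newly added event type never leaks into the default UI by accident.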

Session State Contract

Common fields:

  • session_id
  • status
  • provider
  • model
  • agent_profile
  • permission_mode
  • approval_mode
  • think_level
  • policy_path
  • sandbox_mode
  • sandbox_isolation

Optional background fields:

  • engine
  • runner_kind
  • approval_request_id
  • workspace_id
  • task_graph_id
  • process_id
  • attempt_count
  • max_retries
  • expires_at
  • parent_session_id
  • branch_source_event_id
  • branch_resume_cursor
  • last_heartbeat_at
  • last_error
  • updated_at
  • ended_at
  • resume_context

session_state remains additive. New fields must not break existing clients.
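
One way a client can honor the additive rule is to read the common fields it knows and carry everything else through untouched. The sketch below assumes string-valued common fields and uses a hypothetical SessionState container; neither is defined by the contract.

```python
from dataclasses import dataclass, field, fields
from typing import Any, Optional


@dataclass
class SessionState:
    # Common fields from the contract; all optional so partial payloads still parse.
    # The str types are an assumption made for this sketch.
    session_id: Optional[str] = None
    status: Optional[str] = None
    provider: Optional[str] = None
    model: Optional[str] = None
    agent_profile: Optional[str] = None
    permission_mode: Optional[str] = None
    approval_mode: Optional[str] = None
    think_level: Optional[str] = None
    policy_path: Optional[str] = None
    sandbox_mode: Optional[str] = None
    sandbox_isolation: Optional[str] = None
    # Optional background fields and any field added later land here untouched.
    extra: dict[str, Any] = field(default_factory=dict)


_COMMON = {f.name for f in fields(SessionState)} - {"extra"}


def parse_session_state(payload: dict[str, Any]) -> SessionState:
    """Split a session_state payload into known common fields and everything else."""
    common = {k: v for k, v in payload.items() if k in _COMMON}
    extra = {k: v for k, v in payload.items() if k not in _COMMON}
    return SessionState(**common, extra=extra)
```

Because unknown keys are kept in extra rather than rejected, a server that starts sending a new field does not break this parser.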

Tool Lifecycle Contract

Every tool-related event must carry the tool_call_id of the call it belongs to. The lifecycle events, in order:

  1. tool_call
  2. tool_result when a post-hook result exists
  3. tool_outcome for the typed terminal outcome

tool_outcome.status values currently used:

  • ok
  • timeout
  • error
  • denied
  • artifact

tool_result is not a substitute for tool_outcome; it is the rendered result surface for successful post-hook tool payloads.
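
The lifecycle above can be sketched as a small reducer that keys rows on tool_call_id rather than tool name, so two concurrent calls to the same tool stay separate. The row layout, the "running" placeholder status, and storing the whole tool_result payload under "rendered" are assumptions for illustration; the contract does not define a tool_result payload shape here.

```python
TOOL_EVENTS = ("tool_call", "tool_result", "tool_outcome")


def reduce_tool_events(events: list[dict]) -> dict[str, dict]:
    """Fold tool lifecycle events into one row per tool_call_id."""
    rows: dict[str, dict] = {}
    for event in events:
        kind = event.get("type")
        if kind not in TOOL_EVENTS:
            continue
        payload = event.get("payload", {})
        call_id = payload.get("tool_call_id")
        if call_id is None:
            continue  # contract violation; this sketch simply skips the event
        row = rows.setdefault(
            call_id, {"name": None, "status": "running", "rendered": None}
        )
        if kind == "tool_call":
            row["name"] = payload.get("name")
        elif kind == "tool_result":
            row["rendered"] = payload  # rendered surface only, never the final status
        else:  # tool_outcome carries the typed terminal status
            row["name"] = row["name"] or payload.get("tool_name")
            row["status"] = payload.get("status", row["status"])
    return rows
```

Only tool_outcome changes status in this sketch, which mirrors the rule that tool_result is not a substitute for tool_outcome.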

Payload Examples

{
  "type": "tool_call",
  "payload": {
    "name": "local_search",
    "arguments": { "query": "roadmap", "_tool_call_id": "call-1" },
    "tool_call_id": "call-1"
  }
}
{
  "type": "tool_outcome",
  "payload": {
    "tool_call_id": "call-1",
    "tool_name": "local_search",
    "outcome_type": "ToolExecutionResult",
    "status": "ok",
    "elapsed_ms": 84,
    "result": { "chunks": [] }
  }
}

Approval pause / resume

{
  "type": "approval_required",
  "payload": {
    "request_id": "apr_123",
    "tool_name": "write_file",
    "reason": "Web chat turn requested a write-capable tool."
  }
}
{
  "type": "approval_applied",
  "payload": {
    "request_id": "apr_123",
    "tool_name": "write_file"
  }
}
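
A hedged sketch of how a client might track open approvals from the two events above, pairing them by request_id. It assumes approval_applied means the request is resolved and should leave the pending set; the function name is hypothetical.

```python
def pending_approvals(events: list[dict]) -> dict[str, dict]:
    """Return approval_required payloads that have no matching approval_applied yet."""
    pending: dict[str, dict] = {}
    for event in events:
        payload = event.get("payload", {})
        request_id = payload.get("request_id")
        if request_id is None:
            continue
        if event.get("type") == "approval_required":
            pending[request_id] = payload
        elif event.get("type") == "approval_applied":
            pending.pop(request_id, None)  # resolved; tolerate unknown ids
    return pending
```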

Context compaction

{
  "type": "context_compacted",
  "payload": {
    "original_count": 24,
    "compacted_count": 10,
    "tokens_saved": 4200,
    "summary": "Kept the recent working set and collapsed older tool chatter.",
    "preserved_messages": 6,
    "resume_cursor": "compact:serve_demo:14"
  }
}

Background checkpoint

{
  "type": "session_checkpoint",
  "payload": {
    "session_id": "bg_123",
    "sequence": 4,
    "checkpoint_type": "step",
    "status": "completed",
    "summary_text": "step-1 completed: Inspect files",
    "resume_cursor": null
  }
}

Policy / sandbox / transport / subagent diagnostics

{
  "type": "policy_decision",
  "payload": {
    "subject": "turn_runtime",
    "outcome": "configured",
    "reason": "Resolved runtime execution mode for this chat turn.",
    "source": "serve_chat_runtime",
    "policy_path": "execution_mode -> approval_policy -> runtime_callbacks",
    "permission_mode": "normal",
    "approval_mode": "on_request"
  }
}
{
  "type": "sandbox_notice",
  "payload": {
    "requested_backend": "background_session",
    "resolved_backend": "task_workspace",
    "isolation": "workspace",
    "reason": "Prepared a task workspace for write-capable background work."
  }
}
{
  "type": "transport_diagnostic",
  "payload": {
    "status": "degraded",
    "kind": "provider_fallback",
    "provider_requested": "openai",
    "provider_actual": "anthropic",
    "reason": "Primary provider failed health checks.",
    "failure_count": 1
  }
}
{
  "type": "subagent_event",
  "payload": {
    "agent_id": "subagent_1",
    "status": "completed",
    "parent_session_id": "serve_chat_123"
  }
}

Session Detail And Resume Contract

GET /chat/sessions/{session_id} now exposes additional operator-facing data:

  • resume_points: ordered resume/branch candidates derived from checkpoints, approvals, and completed turns
  • subagents: persisted delegated-work snapshots tied to the session
  • background_artifacts: typed artifacts claimed by a background session, such as task graphs, isolated workspaces, agentic run records, and final outputs
  • channel: a deterministic projection of the session event log into stable participants and stamped envelopes. The raw event log remains authoritative; the projection is for bounded timeline rendering and operator audit.

GET /sessions/{session_id} exposes the durable background row plus:

  • checkpoints: latest persisted background checkpoints
  • artifacts: typed background artifacts, ordered newest first
  • health: current operator-facing health snapshot
  • channel: the same participant/envelope projection exposed on chat session detail, rooted at the background session id

GET /sessions/{session_id}/health/stream emits health_snapshot SSE envelopes whenever the derived snapshot changes. Start, resume, pause, cancel, list, and detail responses also include health data so clients do not need to wait for a separate refetch before rendering the current control-plane state.
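
The emit-on-change behavior can be sketched as a generator that compares each derived snapshot with the last one sent. The envelope shape follows the type/payload pattern of the payload examples in this document; the snapshot fields and the JSON-based comparison are assumptions.

```python
import json
from typing import Iterable, Iterator


def health_envelopes(snapshots: Iterable[dict]) -> Iterator[dict]:
    """Yield a health_snapshot envelope only when the snapshot differs from the last one."""
    last_key = None
    for snapshot in snapshots:
        # Canonical JSON makes the comparison independent of dict key order.
        key = json.dumps(snapshot, sort_keys=True)
        if key == last_key:
            continue
        last_key = key
        yield {"type": "health_snapshot", "payload": snapshot}
```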

Health snapshots include policy.health and policy.enforcement. The policy is stored locally in background-session metadata and currently supports these operator budgets: minimum score, runtime minutes, total tokens, cost USD, context usage percentage, failed tool calls, loop warnings, repeated plan events, tool latency, and events without evidence. When a running/recovering session crosses a configured threshold, the runtime/supervisor either creates a normal local approval request and moves the session to awaiting_approval, or pauses the session directly, depending on policy.health.action.
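
The enforcement rule can be sketched as a pure function over the session status, current metrics, and the configured budgets. The key names (min_score, max_runtime_minutes, max_total_tokens, max_cost_usd, action), the "pause" action value, and the "paused" status string are all hypothetical; only awaiting_approval and the running/recovering precondition come from the text above, and only four of the listed budgets are covered.

```python
ACTIVE_STATUSES = {"running", "recovering"}


def enforce_health_policy(status: str, metrics: dict, policy: dict) -> tuple[str, list[str]]:
    """Return (next_status, breached_budgets) for one evaluation pass."""
    if status not in ACTIVE_STATUSES:
        return status, []  # only running/recovering sessions are enforced

    breached: list[str] = []
    min_score = policy.get("min_score")
    score = metrics.get("score")
    if min_score is not None and score is not None and score < min_score:
        breached.append("min_score")
    for name in ("runtime_minutes", "total_tokens", "cost_usd"):
        limit = policy.get("max_" + name)
        if limit is not None and metrics.get(name, 0) > limit:
            breached.append("max_" + name)

    if not breached:
        return status, []
    if policy.get("action") == "pause":  # hypothetical action value
        return "paused", breached
    return "awaiting_approval", breached  # default: ask the operator
```

Keeping the check free of side effects makes it easy to run the same logic in the runtime and in the supervisor; creating the approval request itself would happen in the caller.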

Resume flows:

  • POST /chat/sessions/{session_id}/branch branches a new chat session from from_event_id or resume_cursor
  • POST /sessions/{session_id}/resume can either resume the same background session or branch a new one when from_event_id or resume_cursor is supplied
  • POST /sessions/{session_id}/pause pauses a background session without marking it terminal
  • branch sessions are seeded with a resume_seed checkpoint so the frontend can render continuity context without replaying the full source history
  • branched background sessions carry parent_session_id, branch_source_event_id, and branch_resume_cursor on the durable row

Session channel projection is exposed on GET /chat/sessions/{session_id} and GET /sessions/{session_id}:

  • channel.channel_id is session:{session_id}.
  • participants includes the runtime, the session agent, and any observed local operator or tool participants.
  • envelopes mirrors recent session_events rows with sender_id, recipient_ids, event_type, summary, and the original payload.
  • Approval events route through the local operator participant instead of using an implicit side path.
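
A rough sketch of the projection, to make the envelope fields concrete. The participant ids ("runtime", "agent", "operator"), the routing of non-approval events, the summary fallback, and the limit parameter are assumptions; the channel_id format and the envelope field names come from the list above.

```python
def project_channel(session_id: str, events: list[dict], limit: int = 50) -> dict:
    """Project recent session events into a channel with participants and envelopes."""
    participants = {"runtime", "agent"}
    envelopes = []
    for event in events[-limit:]:
        event_type = event["type"]
        payload = event.get("payload", {})
        # Approvals go through the local operator instead of an implicit side path.
        if event_type == "approval_required":
            sender, recipients = "runtime", ["operator"]
        elif event_type == "approval_applied":
            sender, recipients = "operator", ["runtime"]
        else:
            sender, recipients = "runtime", ["agent"]
        participants.update([sender, *recipients])
        envelopes.append({
            "sender_id": sender,
            "recipient_ids": recipients,
            "event_type": event_type,
            "summary": payload.get("reason") or event_type,
            "payload": payload,  # original payload, unmodified
        })
    return {
        "channel_id": f"session:{session_id}",
        "participants": sorted(participants),  # sorted so the output is deterministic
        "envelopes": envelopes,
    }
```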

Background Session Kernel Contract

Background sessions are the durable agent kernel for long-running work. The runtime persists enough state to recover or branch without depending on process memory:

  • attempts: every run increments attempt_count; transient failures move to recovering while attempt_count <= max_retries
  • expiry: sessions with a past expires_at fail with an expired checkpoint before execution or supervisor recovery
  • pause/resume: paused sessions keep their row, checkpoints, and artifacts; resume requeues the same session unless a branch cursor is supplied
  • work isolation: write-capable sessions prepare a task graph and task workspace, then persist both as typed artifacts
  • approvals: approval waits remain awaiting_approval with a durable approval_request_id
  • artifacts: final outputs, task graphs, workspaces, and native agentic run ids are persisted in background_agent_artifacts
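
The attempt and expiry rules above can be sketched as two small checks. The function names and the "failed" status string are assumptions; the recovering status and the attempt_count <= max_retries condition are taken from the list.

```python
from datetime import datetime, timezone
from typing import Optional


def status_before_run(expires_at: Optional[datetime], now: datetime) -> Optional[str]:
    """Return "failed" when expires_at is in the past, else None (safe to run)."""
    if expires_at is not None and expires_at < now:
        return "failed"  # the runtime would also write an expired checkpoint here
    return None


def status_after_transient_failure(attempt_count: int, max_retries: int) -> str:
    """Keep recovering while attempt_count <= max_retries, otherwise give up."""
    return "recovering" if attempt_count <= max_retries else "failed"


# Usage: check expiry first, then decide what a transient failure means.
now = datetime.now(timezone.utc)
if status_before_run(None, now) is None:
    next_status = status_after_transient_failure(attempt_count=1, max_retries=2)
```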

Gap Matrix

| Gap class | Current state |
| --- | --- |
| already exposed | thinking, tool_call, tool_result, retrieval, antahkarana, approvals, checkpoints |
| now standardized in the main stream | tool_outcome, step_progress, policy_decision, hook_event, sandbox_notice, transport_diagnostic, subagent_event |
| persisted and rendered in diagnostics | tool_batch, tool_selected, tool_validation_error, context_warning, context_compacted |
| session detail continuity surfaces | resume_points, resume_seed checkpoints, persisted subagents, typed background artifacts |
| background durable kernel | retries, expiry, pause/resume, branch-from-state, approval waits, workspace/task graph artifacts |
| agent health control plane | health score, stuck/waiting/degraded/budget flags, memory provenance, audit trail, recovery hints, health SSE, and enforced local health policy |
| background-only fidelity limits | bridge engines expose state, checkpoints, approvals, final output, and typed artifacts but not native per-tool callbacks |

Bridge Engine Limits

claude and codex background sessions are best-effort bridge surfaces. They can reliably expose:

  • session_state
  • session_checkpoint
  • policy_decision
  • sandbox_notice
  • approval_required / approval_applied
  • completed / error

They do not claim native harness parity for:

  • per-tool callbacks
  • typed tool_outcome
  • context warning / compaction internals
  • native step_progress
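
A client can use the reliable set above to decide which panels to show for a bridge session. This is a minimal sketch: the function name is hypothetical, and assuming that any non-bridge engine can emit the full inventory is a simplification.

```python
BRIDGE_ENGINES = {"claude", "codex"}

# Events the bridge engines can reliably expose, per the list above.
BRIDGE_RELIABLE_EVENTS = {
    "session_state", "session_checkpoint", "policy_decision", "sandbox_notice",
    "approval_required", "approval_applied", "completed", "error",
}


def can_expect(engine: str, event_type: str) -> bool:
    """Return True if a session on this engine can be expected to emit event_type."""
    if engine not in BRIDGE_ENGINES:
        return True  # assumption: native runs may emit the full inventory
    return event_type in BRIDGE_RELIABLE_EVENTS
```

For example, a UI could hide the per-tool timeline when can_expect(engine, "tool_outcome") is false instead of rendering an empty panel.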

Rollout Order

  1. keep STREAM_EVENT_TYPES and HANDLED_EVENT_TYPES in lock-step
  2. emit canonical policy, hook, sandbox, transport, and subagent diagnostics from QueueCallbacks
  3. persist continuity checkpoints and branchable resume metadata in session storage
  4. standardize background session_state and persist step_progress
  5. render unified UI timeline and diagnostic panels in serve-ui
  6. keep docs/tests aligned with the shipped contract

Acceptance Criteria

  • /stream-events matches the frontend handled-event set
  • duplicate tool names resolve correctly by tool_call_id
  • chat turns emit tool_outcome for success, timeout, denial, artifact, and failure paths
  • context warnings and compactions are both persisted and visible in the UI timeline
  • policy, hook, sandbox, transport, and subagent diagnostics are persisted and reducer-handled
  • session detail exposes resume points and persisted delegated-work state
  • chat and background sessions can branch from an event id or resume cursor
  • native background agentic runs emit session_state, session_checkpoint, and step_progress
  • bridge sessions document reduced fidelity instead of pretending full parity
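
The first criterion can be checked mechanically by comparing the two event sets. The sketch below takes both sets as plain iterables; how they are obtained (for example from /stream-events and from the frontend's HANDLED_EVENT_TYPES) is left to the caller, and the result keys are hypothetical.

```python
from typing import Iterable


def contract_drift(stream_types: Iterable[str], handled_types: Iterable[str]) -> dict[str, list[str]]:
    """Report event types that only one side of the contract knows about."""
    stream, handled = set(stream_types), set(handled_types)
    return {
        "unhandled_by_frontend": sorted(stream - handled),
        "unknown_to_backend": sorted(handled - stream),
    }


def in_lockstep(stream_types: Iterable[str], handled_types: Iterable[str]) -> bool:
    """True when both sides list exactly the same event types."""
    drift = contract_drift(stream_types, handled_types)
    return not drift["unhandled_by_frontend"] and not drift["unknown_to_backend"]
```

Running a check like this in a test keeps the two sets in lock-step and names the offending event types when they drift.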