Agent Harness — Overview & Architecture

Design Philosophy

AgentHarness is a thin façade: it holds no mutable per-turn state after __init__. All per-turn data lives in a TurnRuntimeState object created at the start of each turn and discarded at the end. Ten single-responsibility collaborators share that object via explicit parameters.


Collaborator Map

AgentHarness.run_turn()
 ├─ ThinkingManager.on_turn_start()         ← turn counter + thinking state
 ├─ [parallel] _load_memory_context()       ← retrieve episodic memory
 ├─ [parallel] tools.run("get_context_pack") ← vault context retrieval
 ├─ TurnBudget.create()                     ← explicit resource envelope
 │   └─ DeadlineToken.from_budget()         ← per-tool cooperative cancellation
 ├─ TurnRuntimeState(budget)                ← all mutable turn state
 │   ├─ token counts
 │   ├─ CitationAccumulator
 │   ├─ tool trace (thread-safe)
 │   └─ late-write gate
 ├─ TurnPreparer.prepare()                  ← message assembly, per-turn render hash
 ├─ _execute_turn_loop_v2()                 ← bounded tool-calling loop
 │   │
 │   ├─ SemanticToolSelector (optional)     ← reduce tool list for focused queries
 │   │
 │   ├─ ReplayControl.should_skip()         ← idempotency-aware dedup
 │   │   ├─ idempotent + prior success   → skip
 │   │   ├─ idempotent + prior failure   → allow retry
 │   │   └─ non-idempotent               → always allow
 │   │
 │   ├─ ToolScheduler.schedule()            ← safe wave partitioning
 │   │   ├─ READ_ONLY, non-overlapping keys → parallel wave
 │   │   └─ LOCAL_WRITE / NETWORK / DESTRUCTIVE → sequential wave
 │   │
 │   ├─ BoundedToolExecutor.execute()       ← deadline-safe execution
 │   │   ├─ DeadlineToken passed into RunContext
 │   │   ├─ complete_trace() → False        → late-write suppressed
 │   │   └─ ArtifactStore                  ← oversized outputs (TTL-aware)
 │   │
 │   └─ ReflectionEngine.reflect()          ← budget-aware reflection step
 ├─ state.close_writes()                    ← seal the write gate
 └─ TurnFinalizer.finalize()                ← citations, memory, decisions, judges

Key Design Decisions

1. TurnRuntimeState — All Per-Turn State in One Place

TurnRuntimeState is created fresh each turn and discarded at the end. The harness instance is effectively immutable after __init__, so concurrent or sequential turns cannot accidentally share state.

2. Late-Write Blocking

A timed-out tool thread may complete in the background after the turn loop has already moved on. TurnRuntimeState has a write gate (_writes_open). The harness calls state.close_writes() after the loop exits. The background thread calls state.complete_trace() when it finishes; if it finds a timed_out entry in the trace, complete_trace() returns False and the thread suppresses all side effects.
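The gate described above can be sketched roughly as follows. This is a minimal illustration, not the harness's actual class: start_trace(), mark_timed_out(), and the dict-based trace layout are hypothetical, and treating a closed gate as a reason to return False is an assumption layered on top of the timed_out check the text describes.

```python
import threading


class TurnRuntimeState:
    """Simplified stand-in for the per-turn state object (illustrative only)."""

    def __init__(self) -> None:
        self._trace_lock = threading.Lock()
        self._writes_lock = threading.Lock()
        self._writes_open = True
        # call_id -> status; this layout is an assumption made for the sketch.
        self._trace = {}

    def start_trace(self, call_id: str) -> None:  # hypothetical helper
        with self._trace_lock:
            self._trace[call_id] = "running"

    def mark_timed_out(self, call_id: str) -> None:  # hypothetical helper
        with self._trace_lock:
            self._trace[call_id] = "timed_out"

    def close_writes(self) -> None:
        """Seal the gate; called by the harness once the loop has exited."""
        with self._writes_lock:
            self._writes_open = False

    def complete_trace(self, call_id: str) -> bool:
        """Return True only if the calling thread may still apply side effects."""
        # Lock order is always writes -> trace, so this cannot deadlock
        # against close_writes(), which takes only the writes lock.
        with self._writes_lock, self._trace_lock:
            if not self._writes_open:  # assumption: closed gate also suppresses
                return False
            if self._trace.get(call_id) == "timed_out":
                return False
            self._trace[call_id] = "done"
            return True
```

A background tool thread would call complete_trace() as its last step and discard its result whenever the call returns False.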

3. ToolScheduler — LOCAL_WRITE Always Sequential

ToolScheduler.schedule() enforces:

- READ_ONLY with non-overlapping resource_keys → parallel wave
- LOCAL_WRITE / NETWORK / DESTRUCTIVE → always sequential (single-call wave)

This prevents two vault-writing or database-writing tools from interleaving and corrupting state.
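One way to picture the partitioning rule is the sketch below. It assumes calls are scheduled in their original order and that a read which overlaps the current wave's keys starts a new wave; SideEffect, PlannedCall, and the free function schedule() are hypothetical names for illustration, not the real ToolScheduler API.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet, List, Set


class SideEffect(Enum):  # hypothetical enum; member names come from the text
    READ_ONLY = "read_only"
    LOCAL_WRITE = "local_write"
    NETWORK = "network"
    DESTRUCTIVE = "destructive"


@dataclass(frozen=True)
class PlannedCall:  # hypothetical container for one pending tool call
    name: str
    side_effect: SideEffect
    resource_keys: FrozenSet[str] = frozenset()


def schedule(calls: List[PlannedCall]) -> List[List[PlannedCall]]:
    """Split calls into waves; calls inside one wave may run in parallel."""
    waves: List[List[PlannedCall]] = []
    current: List[PlannedCall] = []
    held_keys: Set[str] = set()

    def flush() -> None:
        nonlocal current, held_keys
        if current:
            waves.append(current)
        current, held_keys = [], set()

    for call in calls:
        if call.side_effect is not SideEffect.READ_ONLY:
            # Writes, network calls, and destructive calls run alone.
            flush()
            waves.append([call])
        else:
            # A read joins the current wave only if its keys do not overlap.
            if held_keys & call.resource_keys:
                flush()
            current.append(call)
            held_keys |= call.resource_keys
    flush()
    return waves
```

With this rule, two reads on different keys share a wave, while a LOCAL_WRITE between reads forces three separate waves.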

4. Per-Tool Deadline Propagation

Each tool call receives per_tool_s = budget.per_tool_remaining_s(cap_s) as its deadline, not the full turn timeout. This prevents a single long-running tool from claiming the entire remaining budget. A DeadlineToken is also passed so cooperative tools can exit early.
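The deadline arithmetic can be illustrated with a small sketch. It assumes that per_tool_remaining_s(cap_s) is the smaller of the cap and the time left in the turn, which the text implies but does not spell out. The fields, the injectable clock, the from_budget() signature, and expired() are all hypothetical.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class TurnBudget:
    turn_timeout_s: float
    clock: Callable[[], float] = time.monotonic  # injectable so tests can fake time
    started_at: float = 0.0

    def __post_init__(self) -> None:
        self.started_at = self.clock()

    def remaining_s(self) -> float:
        elapsed = self.clock() - self.started_at
        return max(0.0, self.turn_timeout_s - elapsed)

    def per_tool_remaining_s(self, cap_s: float) -> float:
        # A tool gets at most cap_s, and never more than the turn has left.
        return min(cap_s, self.remaining_s())


@dataclass(frozen=True)
class DeadlineToken:
    expires_at: float
    clock: Callable[[], float] = time.monotonic

    @classmethod
    def from_budget(cls, budget: TurnBudget, cap_s: float) -> "DeadlineToken":
        # Signature is an assumption; the source only names the method.
        deadline = budget.clock() + budget.per_tool_remaining_s(cap_s)
        return cls(deadline, budget.clock)

    def expired(self) -> bool:
        """Cooperative tools can poll this and exit early."""
        return self.clock() >= self.expires_at
```

For example, with a 30-second turn and a 10-second cap, a tool started 25 seconds into the turn would receive a 5-second deadline rather than the full cap.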

5. Typed ToolOutcomes

All tool outcomes are a closed union of frozen dataclasses:

| Type | When used |
|---|---|
| ToolExecutionResult | Successful execution |
| ToolTimeout | Exceeded deadline |
| ToolFailure | Exception or error response |
| ToolDenied | Pre-execution skip (duplicate, blocked, validation, deadline, write denied) |
| ToolArtifactReference | Oversized output stored in ArtifactStore |
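A closed union of frozen dataclasses might look like the sketch below. The five class names come from the table; every field, along with the to_tool_message() helper, is a hypothetical choice made for illustration.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class ToolExecutionResult:
    output: str  # hypothetical field


@dataclass(frozen=True)
class ToolTimeout:
    deadline_s: float  # hypothetical field


@dataclass(frozen=True)
class ToolFailure:
    error: str  # hypothetical field


@dataclass(frozen=True)
class ToolDenied:
    reason: str  # hypothetical field


@dataclass(frozen=True)
class ToolArtifactReference:
    artifact_id: str  # hypothetical field


ToolOutcome = Union[
    ToolExecutionResult, ToolTimeout, ToolFailure, ToolDenied, ToolArtifactReference
]


def to_tool_message(outcome: ToolOutcome) -> str:
    """Render any outcome as the text of a tool-role message (illustrative)."""
    if isinstance(outcome, ToolExecutionResult):
        return outcome.output
    if isinstance(outcome, ToolTimeout):
        return f"timed out after {outcome.deadline_s:g}s"
    if isinstance(outcome, ToolFailure):
        return f"failed: {outcome.error}"
    if isinstance(outcome, ToolDenied):
        return f"denied: {outcome.reason}"
    if isinstance(outcome, ToolArtifactReference):
        return f"output stored as artifact {outcome.artifact_id}"
    raise TypeError(f"unknown outcome type: {type(outcome).__name__}")
```

Because the union is closed, a consumer that handles all five types covers every case, and the frozen dataclasses prevent an outcome from being altered after it is recorded.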

6. ArtifactStore — Session-Scoped, TTL-Aware Output Storage

When a tool output exceeds 12,000 characters, it is stored in ArtifactStore (under /tmp/sb_artifacts/<session_id>/) instead of being injected raw into the LLM context. The model receives a compact reference and retrieves the full content via read_file. Artifacts expire after 1 hour by default.
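The spill-to-disk behaviour can be sketched as follows. The 12,000-character threshold and the 1-hour TTL come from the text; maybe_store(), path_for(), the file naming, and the lazy expiry-on-read policy are assumptions, and the root directory is passed in rather than hard-coded.

```python
import threading
import time
import uuid
from pathlib import Path
from typing import Dict, Optional, Tuple

MAX_INLINE_CHARS = 12_000  # threshold stated in the text
DEFAULT_TTL_S = 3600.0     # "1 hour by default"


class ArtifactStore:
    """Illustrative session-scoped store; not the real implementation."""

    def __init__(self, root: Path, ttl_s: float = DEFAULT_TTL_S) -> None:
        self._root = root
        self._ttl_s = ttl_s
        self._lock = threading.Lock()
        # artifact id -> (file path, monotonic expiry time)
        self._index: Dict[str, Tuple[Path, float]] = {}

    def maybe_store(self, output: str) -> Optional[str]:
        """Return an artifact id if the output is too large to inline, else None."""
        if len(output) <= MAX_INLINE_CHARS:
            return None
        artifact_id = uuid.uuid4().hex
        path = self._root / f"{artifact_id}.txt"
        self._root.mkdir(parents=True, exist_ok=True)
        path.write_text(output, encoding="utf-8")
        with self._lock:
            self._index[artifact_id] = (path, time.monotonic() + self._ttl_s)
        return artifact_id

    def path_for(self, artifact_id: str) -> Optional[Path]:
        """Return the file path, or None if the artifact is unknown or expired."""
        with self._lock:
            entry = self._index.get(artifact_id)
            if entry is None:
                return None
            path, expires_at = entry
            if time.monotonic() >= expires_at:
                # Assumption: expired artifacts are dropped lazily on access.
                del self._index[artifact_id]
                path.unlink(missing_ok=True)
                return None
            return path
```

An output of exactly 12,000 characters stays inline in this sketch, since the text says the store is used only when the output exceeds that size.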

7. ReplayControl — Idempotency-Aware Dedup

ReplayControl.should_skip() checks ToolSpec.idempotent and the status of the prior call before suppressing a duplicate:

- Non-idempotent → always allow (e.g. send_email may legitimately be called twice)
- Idempotent + prior SUCCESS → skip
- Idempotent + prior FAILURE / TIMEOUT → allow (retry)
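The three rules above reduce to a short lookup, sketched here. CallStatus, record(), and the args_key parameter (a canonical string for the call's arguments) are hypothetical; only should_skip(), ToolSpec.idempotent, and _history are named in the source.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Tuple


class CallStatus(Enum):  # hypothetical enum for prior-call status
    SUCCESS = "success"
    FAILURE = "failure"
    TIMEOUT = "timeout"


@dataclass(frozen=True)
class ToolSpec:
    name: str
    idempotent: bool


@dataclass
class ReplayControl:
    # (tool name, canonical args) -> status of the most recent call
    _history: Dict[Tuple[str, str], CallStatus] = field(default_factory=dict)

    def record(self, spec: ToolSpec, args_key: str, status: CallStatus) -> None:
        self._history[(spec.name, args_key)] = status

    def should_skip(self, spec: ToolSpec, args_key: str) -> bool:
        if not spec.idempotent:
            return False  # non-idempotent calls are never deduplicated
        # Skip only when an identical call has already succeeded; a prior
        # failure, a prior timeout, or no prior call all allow execution.
        return self._history.get((spec.name, args_key)) is CallStatus.SUCCESS
```

Because the instance is created per turn and touched only by the main thread, the history dict needs no lock in this sketch.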

8. Per-Turn prompt_render_hash

TurnPreparer.prepare() computes the render hash from system_prompt_override on every call. If the system prompt changes mid-session, the hash correctly reflects the new content. TurnPreparation is a frozen dataclass so consumers cannot accidentally mutate assembled turn state.
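A per-call hash over the effective system prompt could be computed as in this sketch. SHA-256, the fallback to a default prompt, the messages field, and the free function prepare() are assumptions; the source states only that the hash is derived from system_prompt_override on every call and that TurnPreparation is frozen.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class TurnPreparation:
    messages: Tuple[Tuple[str, str], ...]  # (role, content) pairs; hypothetical
    prompt_render_hash: str


def prepare(
    user_message: str,
    system_prompt_override: Optional[str],
    default_system_prompt: str = "You are a helpful assistant.",  # placeholder
) -> TurnPreparation:
    """Assemble the turn and hash the system prompt that is actually used."""
    if system_prompt_override is not None:
        system_prompt = system_prompt_override
    else:
        system_prompt = default_system_prompt
    # Recomputed on every call, so a mid-session prompt change shows up here.
    render_hash = hashlib.sha256(system_prompt.encode("utf-8")).hexdigest()
    messages = (("system", system_prompt), ("user", user_message))
    return TurnPreparation(messages=messages, prompt_render_hash=render_hash)
```

Two turns with the same override produce the same hash regardless of the user message, while changing the override changes the hash.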


Thread Safety Model

| Object | Mutating threads | Safety mechanism |
|---|---|---|
| TurnBudget | Main thread + parallel tool threads | Internal threading.Lock on all counters |
| TurnRuntimeState.tool_trace | Main thread + tool threads | _trace_lock |
| TurnRuntimeState._writes_open | Main thread writes; tool threads read | _writes_lock |
| TurnRuntimeState.tool_results | Main thread only (after complete_trace() gate) | Write gate |
| ArtifactStore._index | Main thread + tool threads | Internal threading.Lock |
| ReplayControl._history | Main thread only | No lock needed (per-turn instance) |

Invariants

These hold for every turn regardless of timeouts, tool failures, or exceptions:

  1. state.close_writes() is always called before finalize().
  2. Every ToolCall in the LLM response receives exactly one tool-role message appended to messages (the exception handler in the loop guarantees this).
  3. TurnFinalizer.finalize() never raises — all sub-steps are exception-wrapped.
  4. ReflectionEngine.reflect() always returns a ReflectionResult (stub on budget exhaustion, deadline expiry, or LLM failure).
  5. BoundedToolExecutor.execute() never raises — it returns a typed ToolOutcome.
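Invariants 1 and 3 suggest a control-flow shape along the lines of the sketch below. It is a skeleton under stated assumptions, not the harness's code: run_turn_skeleton() and its callback parameters are hypothetical, and recording a loop exception rather than re-raising it is a guess at how the gate and finalizer stay reachable.

```python
from typing import Callable, List


def run_turn_skeleton(
    execute_loop: Callable[[], None],
    close_writes: Callable[[], None],
    finalize_steps: List[Callable[[], None]],
) -> List[str]:
    """Run the loop, seal the write gate, then finalize; return collected errors."""
    errors: List[str] = []
    try:
        execute_loop()
    except Exception as exc:  # assumption: loop errors are recorded, not re-raised
        errors.append(f"loop: {exc}")
    finally:
        # Invariant 1: the gate is sealed before any finalizer step runs.
        close_writes()
    for step in finalize_steps:
        try:
            step()
        except Exception as exc:
            # Invariant 3: one failing sub-step must not abort the others.
            errors.append(f"{step.__name__}: {exc}")
    return errors
```

Wrapping each finalizer sub-step separately means a failure in, say, citation handling does not prevent memory persistence from running.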