Agent Harness — Overview & Architecture¶

Design Philosophy¶

AgentHarness is a thin façade: it holds no mutable per-turn state after __init__. All per-turn data lives in a TurnRuntimeState object created at the start of each turn and discarded at the end. Ten single-responsibility collaborators share that object via explicit parameters.

Collaborator Map¶

AgentHarness.run_turn()
 │
 ├─ ThinkingManager.on_turn_start()         ← turn counter + thinking state
 │
 ├─ [parallel] _load_memory_context()       ← retrieve episodic memory
 ├─ [parallel] tools.run("get_context_pack") ← vault context retrieval
 │
 ├─ TurnBudget.create()                     ← explicit resource envelope
 │   └─ DeadlineToken.from_budget()         ← per-tool cooperative cancellation
 │
 ├─ TurnRuntimeState(budget)                ← all mutable turn state
 │   ├─ token counts
 │   ├─ CitationAccumulator
 │   ├─ tool trace (thread-safe)
 │   └─ late-write gate
 │
 ├─ TurnPreparer.prepare()                  ← message assembly, per-turn render hash
 │
 ├─ _execute_turn_loop_v2()                 ← bounded tool-calling loop
 │   │
 │   ├─ SemanticToolSelector (optional)     ← reduce tool list for focused queries
 │   │
 │   ├─ ReplayControl.should_skip()         ← idempotency-aware dedup
 │   │   ├─ idempotent + prior success   → skip
 │   │   ├─ idempotent + prior failure   → allow retry
 │   │   └─ non-idempotent               → always allow
 │   │
 │   ├─ ToolScheduler.schedule()            ← safe wave partitioning
 │   │   ├─ READ_ONLY, non-overlapping keys → parallel wave
 │   │   └─ LOCAL_WRITE / NETWORK / DESTRUCTIVE → sequential wave
 │   │
 │   ├─ BoundedToolExecutor.execute()       ← deadline-safe execution
 │   │   ├─ DeadlineToken passed into RunContext
 │   │   ├─ complete_trace() → False        → late-write suppressed
 │   │   └─ ArtifactStore                  ← oversized outputs (TTL-aware)
 │   │
 │   └─ ReflectionEngine.reflect()          ← budget-aware reflection step
 │
 ├─ state.close_writes()                    ← seal the write gate
 │
 └─ TurnFinalizer.finalize()                ← citations, memory, decisions, judges

Key Design Decisions¶

1. TurnRuntimeState — All Per-Turn State in One Place¶

TurnRuntimeState is created fresh each turn and discarded at the end. The harness instance is effectively immutable after __init__, so concurrent or sequential turns cannot share accidental state.

2. Late-Write Blocking¶

A timed-out tool thread may complete in the background after the turn loop has already moved on. TurnRuntimeState has a write gate (_writes_open). The harness calls state.close_writes() after the loop exits. The background thread calls state.complete_trace() when it finishes; if it finds a timed_out entry in the trace, complete_trace() returns False and the thread suppresses all side effects.

3. ToolScheduler — LOCAL_WRITE Always Sequential¶

ToolScheduler.schedule() enforces: - READ_ONLY with non-overlapping resource_keys → parallel wave - LOCAL_WRITE / NETWORK / DESTRUCTIVE → always sequential (single-call wave)

This prevents two vault-writing or database-writing tools from interleaving and corrupting state.

4. Per-Tool Deadline Propagation¶

Each tool call receives per_tool_s = budget.per_tool_remaining_s(cap_s) as its deadline, not the full turn timeout. This prevents a single long-running tool from claiming the entire remaining budget. A DeadlineToken is also passed so cooperative tools can exit early.

5. Typed ToolOutcomes¶

All tool outcomes are a closed union of frozen dataclasses:

Type	When used
`ToolExecutionResult`	Successful execution
`ToolTimeout`	Exceeded deadline
`ToolFailure`	Exception or error response
`ToolDenied`	Pre-execution skip (duplicate, blocked, validation, deadline, write denied)
`ToolArtifactReference`	Oversized output stored in ArtifactStore

6. ArtifactStore — Session-Scoped, TTL-Aware Output Storage¶

When a tool output exceeds 12,000 characters, it is stored in ArtifactStore (under /tmp/sb_artifacts/<session_id>/) instead of being injected raw into the LLM context. The model receives a compact reference and retrieves the full content via read_file. Artifacts expire after 1 hour by default.

7. ReplayControl — Idempotency-Aware Dedup¶

ReplayControl.should_skip() checks ToolSpec.idempotent and the status of the prior call before suppressing a duplicate: - Non-idempotent → always allow (e.g. send_email may legitimately be called twice) - Idempotent + prior SUCCESS → skip - Idempotent + prior FAILURE / TIMEOUT → allow (retry)

8. Per-Turn prompt_render_hash¶

TurnPreparer.prepare() computes the render hash from system_prompt_override on every call. If the system prompt changes mid-session, the hash correctly reflects the new content. TurnPreparation is a frozen dataclass so consumers cannot accidentally mutate assembled turn state.

Thread Safety Model¶

Object	Mutating threads	Safety mechanism
`TurnBudget`	Main thread + parallel tool threads	Internal `threading.Lock` on all counters
`TurnRuntimeState.tool_trace`	Main thread + tool threads	`_trace_lock`
`TurnRuntimeState._writes_open`	Main thread writes; tool threads read	`_writes_lock`
`TurnRuntimeState.tool_results`	Main thread only (after `complete_trace()` gate)	Write gate
`ArtifactStore._index`	Main thread + tool threads	Internal `threading.Lock`
`ReplayControl._history`	Main thread only	No lock needed (per-turn instance)

Invariants¶

These hold for every turn regardless of timeouts, tool failures, or exceptions:

state.close_writes() is always called before finalize().
Every ToolCall in the LLM response receives exactly one tool-role message appended to messages (the exception handler in the loop guarantees this).
TurnFinalizer.finalize() never raises — all sub-steps are exception-wrapped.
ReflectionEngine.reflect() always returns a ReflectionResult (stub on budget exhaustion, deadline expiry, or LLM failure).
BoundedToolExecutor.execute() never raises — it returns a typed ToolOutcome.