Agent Harness — Overview & Architecture
Design Philosophy
AgentHarness is a thin façade: it holds no mutable per-turn state after
__init__. All per-turn data lives in a TurnRuntimeState object created at
the start of each turn and discarded at the end. Ten
single-responsibility collaborators share that object via explicit parameters.
Collaborator Map
AgentHarness.run_turn()
│
├─ ThinkingManager.on_turn_start() ← turn counter + thinking state
│
├─ [parallel] _load_memory_context() ← retrieve episodic memory
├─ [parallel] tools.run("get_context_pack") ← vault context retrieval
│
├─ TurnBudget.create() ← explicit resource envelope
│ └─ DeadlineToken.from_budget() ← per-tool cooperative cancellation
│
├─ TurnRuntimeState(budget) ← all mutable turn state
│ ├─ token counts
│ ├─ CitationAccumulator
│ ├─ tool trace (thread-safe)
│ └─ late-write gate
│
├─ TurnPreparer.prepare() ← message assembly, per-turn render hash
│
├─ _execute_turn_loop_v2() ← bounded tool-calling loop
│ │
│ ├─ SemanticToolSelector (optional) ← reduce tool list for focused queries
│ │
│ ├─ ReplayControl.should_skip() ← idempotency-aware dedup
│ │ ├─ idempotent + prior success → skip
│ │ ├─ idempotent + prior failure → allow retry
│ │ └─ non-idempotent → always allow
│ │
│ ├─ ToolScheduler.schedule() ← safe wave partitioning
│ │ ├─ READ_ONLY, non-overlapping keys → parallel wave
│ │ └─ LOCAL_WRITE / NETWORK / DESTRUCTIVE → sequential wave
│ │
│ ├─ BoundedToolExecutor.execute() ← deadline-safe execution
│ │ ├─ DeadlineToken passed into RunContext
│ │ ├─ complete_trace() → False → late-write suppressed
│ │ └─ ArtifactStore ← oversized outputs (TTL-aware)
│ │
│ └─ ReflectionEngine.reflect() ← budget-aware reflection step
│
├─ state.close_writes() ← seal the write gate
│
└─ TurnFinalizer.finalize() ← citations, memory, decisions, judges
Key Design Decisions
1. TurnRuntimeState — All Per-Turn State in One Place
TurnRuntimeState is created fresh each turn and discarded at the end. The
harness instance is effectively immutable after __init__, so neither concurrent
nor sequential turns can accidentally share state.
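The pattern can be illustrated with a minimal sketch. The field names (tokens_used, tool_trace) and the constructor argument are hypothetical stand-ins; the real classes carry considerably more.

```python
from dataclasses import dataclass, field


@dataclass
class TurnRuntimeState:
    """Hypothetical, reduced stand-in: holds data for a single turn only."""
    tokens_used: int = 0
    tool_trace: list = field(default_factory=list)


class AgentHarness:
    """Hypothetical façade: configuration is set once and never mutated."""

    def __init__(self, model_name: str) -> None:
        self._model_name = model_name  # read-only after construction

    def run_turn(self, user_message: str) -> TurnRuntimeState:
        state = TurnRuntimeState()  # created fresh for this turn
        state.tool_trace.append(f"echo:{user_message}")
        state.tokens_used += len(user_message.split())
        return state  # nothing is stored on self; the state dies with the turn


harness = AgentHarness("example-model")
first = harness.run_turn("hello world")
second = harness.run_turn("another turn entirely")
```

Because run_turn() writes only to the local state object, the two turns above end up with independent traces and counters.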
2. Late-Write Blocking
A timed-out tool thread may complete in the background after the turn loop has
already moved on. TurnRuntimeState has a write gate (_writes_open). The
harness calls state.close_writes() after the loop exits. The background thread
calls state.complete_trace() when it finishes; if it finds a timed_out entry
in the trace, complete_trace() returns False and the thread suppresses all
side effects.
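A rough sketch of the gate, under stated assumptions: the trace is modelled as a dict of call_id to status, mark_timed_out() is a hypothetical helper, and complete_trace() is assumed to also return False once the gate is closed. Only _writes_open, _writes_lock, _trace_lock, close_writes() and complete_trace() come from the text above.

```python
import threading


class TurnRuntimeState:
    """Reduced, hypothetical sketch of the write gate and tool trace."""

    def __init__(self) -> None:
        self._trace_lock = threading.Lock()
        self._writes_lock = threading.Lock()
        self._writes_open = True
        self.tool_trace = {}  # call_id -> status (assumed shape)

    def mark_timed_out(self, call_id: str) -> None:
        # Hypothetical helper: the loop records that it gave up on a call.
        with self._trace_lock:
            self.tool_trace[call_id] = "timed_out"

    def close_writes(self) -> None:
        with self._writes_lock:
            self._writes_open = False

    def complete_trace(self, call_id: str) -> bool:
        """Return True only if the caller may still apply side effects."""
        with self._writes_lock:
            if not self._writes_open:  # assumption: a sealed gate also blocks
                return False
            with self._trace_lock:
                if self.tool_trace.get(call_id) == "timed_out":
                    return False
                self.tool_trace[call_id] = "ok"
                return True


state = TurnRuntimeState()
state.mark_timed_out("call-1")  # the loop stopped waiting for this call
state.close_writes()            # the loop has exited

applied = []


def late_tool() -> None:
    # Background thread finishing after the turn has moved on.
    if state.complete_trace("call-1"):
        applied.append("side effect")


worker = threading.Thread(target=late_tool)
worker.start()
worker.join()
```

The late thread checks the return value before touching anything shared, so applied stays empty here.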
3. ToolScheduler — LOCAL_WRITE Always Sequential
ToolScheduler.schedule() enforces:
- READ_ONLY with non-overlapping resource_keys → parallel wave
- LOCAL_WRITE / NETWORK / DESTRUCTIVE → always sequential (single-call wave)
This prevents two vault-writing or database-writing tools from interleaving and corrupting state.
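The partitioning rule above could be sketched roughly as follows. SideEffect, PlannedCall, and the free function schedule() are hypothetical names chosen for illustration; only the four effect classes and resource_keys come from the text.

```python
from dataclasses import dataclass
from enum import Enum


class SideEffect(Enum):  # hypothetical name for the effect classification
    READ_ONLY = "read_only"
    LOCAL_WRITE = "local_write"
    NETWORK = "network"
    DESTRUCTIVE = "destructive"


@dataclass(frozen=True)
class PlannedCall:  # hypothetical call descriptor
    name: str
    effect: SideEffect
    resource_keys: frozenset = frozenset()


def schedule(calls):
    """Partition calls into ordered waves; calls within a wave may run in parallel."""
    waves, current, used_keys = [], [], set()
    for call in calls:
        if call.effect is not SideEffect.READ_ONLY:
            # Writes, network calls, and destructive calls always run alone.
            if current:
                waves.append(current)
            waves.append([call])
            current, used_keys = [], set()
        elif call.resource_keys & used_keys:
            # Overlapping keys: close the current wave and start a new one.
            waves.append(current)
            current, used_keys = [call], set(call.resource_keys)
        else:
            current.append(call)
            used_keys |= call.resource_keys
    if current:
        waves.append(current)
    return waves


calls = [
    PlannedCall("a", SideEffect.READ_ONLY, frozenset({"notes/a"})),
    PlannedCall("b", SideEffect.READ_ONLY, frozenset({"notes/b"})),
    PlannedCall("w", SideEffect.LOCAL_WRITE, frozenset({"notes/a"})),
    PlannedCall("c", SideEffect.READ_ONLY, frozenset({"notes/a"})),
    PlannedCall("d", SideEffect.READ_ONLY, frozenset({"notes/a"})),
]
wave_names = [[c.name for c in wave] for wave in schedule(calls)]
# wave_names == [['a', 'b'], ['w'], ['c'], ['d']]
```

This sketch preserves call order across waves, which is one reasonable reading of "sequential"; the actual scheduler may reorder read-only calls more aggressively.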
4. Per-Tool Deadline Propagation
Each tool call receives per_tool_s = budget.per_tool_remaining_s(cap_s) as
its deadline, not the full turn timeout. This prevents a single long-running
tool from claiming the entire remaining budget. A DeadlineToken is also passed
so cooperative tools can exit early.
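As an illustration, the sketch below assumes per_tool_remaining_s(cap_s) is the smaller of the cap and the time left in the turn; the text does not spell out the formula. The injectable clock, remaining_s(), and the DeadlineToken constructor and expired() method are hypothetical.

```python
import time


class TurnBudget:
    """Hypothetical, reduced budget: tracks only the turn's wall-clock deadline."""

    def __init__(self, turn_timeout_s: float, clock=time.monotonic) -> None:
        self._clock = clock
        self._deadline = clock() + turn_timeout_s

    def remaining_s(self) -> float:
        return max(0.0, self._deadline - self._clock())

    def per_tool_remaining_s(self, cap_s: float) -> float:
        # Assumed rule: never more than cap_s, never more than the turn has left.
        return min(cap_s, self.remaining_s())


class DeadlineToken:
    """Cooperative cancellation: a tool polls expired() and exits early."""

    def __init__(self, seconds: float, clock=time.monotonic) -> None:
        self._clock = clock
        self._deadline = clock() + seconds

    def expired(self) -> bool:
        return self._clock() >= self._deadline


# A fake clock keeps the example deterministic.
now = [100.0]
clock = lambda: now[0]

budget = TurnBudget(turn_timeout_s=30.0, clock=clock)
first = budget.per_tool_remaining_s(cap_s=10.0)   # capped at 10 s
now[0] += 25.0
second = budget.per_tool_remaining_s(cap_s=10.0)  # only 5 s of the turn remain
token = DeadlineToken(second, clock=clock)
```

Early in the turn the cap dominates; late in the turn the remaining budget does, so no single tool can outlive the turn.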
5. Typed ToolOutcomes
All tool outcomes are a closed union of frozen dataclasses:
| Type | When used |
|---|---|
| ToolExecutionResult | Successful execution |
| ToolTimeout | Exceeded deadline |
| ToolFailure | Exception or error response |
| ToolDenied | Pre-execution skip (duplicate, blocked, validation, deadline, write denied) |
| ToolArtifactReference | Oversized output stored in ArtifactStore |
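A closed union of frozen dataclasses might look like the sketch below. The five type names are taken from the table; every field name and the to_tool_message() helper are assumptions made for illustration.

```python
from dataclasses import dataclass, FrozenInstanceError
from typing import Union


@dataclass(frozen=True)
class ToolExecutionResult:
    output: str  # hypothetical field


@dataclass(frozen=True)
class ToolTimeout:
    deadline_s: float  # hypothetical field


@dataclass(frozen=True)
class ToolFailure:
    error: str  # hypothetical field


@dataclass(frozen=True)
class ToolDenied:
    reason: str  # hypothetical field


@dataclass(frozen=True)
class ToolArtifactReference:
    artifact_id: str  # hypothetical field


ToolOutcome = Union[
    ToolExecutionResult, ToolTimeout, ToolFailure, ToolDenied, ToolArtifactReference
]


def to_tool_message(outcome: ToolOutcome) -> str:
    """Hypothetical helper: render any outcome as tool-role message text."""
    if isinstance(outcome, ToolExecutionResult):
        return outcome.output
    if isinstance(outcome, ToolTimeout):
        return f"timed out after {outcome.deadline_s}s"
    if isinstance(outcome, ToolFailure):
        return f"failed: {outcome.error}"
    if isinstance(outcome, ToolDenied):
        return f"denied: {outcome.reason}"
    if isinstance(outcome, ToolArtifactReference):
        return f"stored as artifact {outcome.artifact_id}"
    raise TypeError(f"unknown outcome type: {type(outcome).__name__}")
```

Because the union is closed, a consumer can branch on every case explicitly, and frozen=True means an outcome cannot be altered after it has been recorded.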
6. ArtifactStore — Session-Scoped, TTL-Aware Output Storage
When a tool output exceeds 12,000 characters, it is stored in ArtifactStore
(under /tmp/sb_artifacts/<session_id>/) instead of being injected raw into the
LLM context. The model receives a compact reference and retrieves the full content
via read_file. Artifacts expire after 1 hour by default.
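The following is a simplified sketch of that behaviour, not the actual implementation. The method names put_if_oversized() and purge_expired(), the reference string format, and the injectable clock are all hypothetical; the example writes to a temporary directory rather than /tmp/sb_artifacts.

```python
import tempfile
import threading
import time
import uuid
from pathlib import Path

MAX_INLINE_CHARS = 12_000  # threshold from the text
DEFAULT_TTL_S = 3600       # 1 hour, from the text


class ArtifactStore:
    """Hypothetical sketch: session-scoped directory plus an expiry index."""

    def __init__(self, root, session_id, ttl_s=DEFAULT_TTL_S, clock=time.time):
        self._dir = Path(root) / session_id
        self._dir.mkdir(parents=True, exist_ok=True)
        self._ttl_s = ttl_s
        self._clock = clock
        self._lock = threading.Lock()
        self._index = {}  # artifact_id -> (path, expires_at)

    def put_if_oversized(self, output: str) -> str:
        """Return the output unchanged, or a compact reference if it is too long."""
        if len(output) <= MAX_INLINE_CHARS:
            return output
        artifact_id = uuid.uuid4().hex
        path = self._dir / f"{artifact_id}.txt"
        path.write_text(output, encoding="utf-8")
        with self._lock:
            self._index[artifact_id] = (path, self._clock() + self._ttl_s)
        return f"[artifact {artifact_id}: {len(output)} chars stored at {path}]"

    def purge_expired(self) -> int:
        """Delete artifacts whose TTL has elapsed; return how many were removed."""
        now = self._clock()
        with self._lock:
            expired = [k for k, (_, exp) in self._index.items() if exp <= now]
            paths = [self._index.pop(k)[0] for k in expired]
        for path in paths:
            path.unlink(missing_ok=True)
        return len(paths)


now = [0.0]  # fake clock so the TTL can be exercised without waiting
with tempfile.TemporaryDirectory() as root:
    store = ArtifactStore(root, "session-1", clock=lambda: now[0])
    small = store.put_if_oversized("short output")
    ref = store.put_if_oversized("x" * 12_001)
    now[0] += 3601
    purged = store.purge_expired()
```

Short outputs pass through untouched, so only the oversized case pays the cost of a file write and a follow-up read_file call.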
7. ReplayControl — Idempotency-Aware Dedup
ReplayControl.should_skip() checks ToolSpec.idempotent and the status of the
prior call before suppressing a duplicate:
- Non-idempotent → always allow (e.g. send_email may legitimately be called twice)
- Idempotent + prior SUCCESS → skip
- Idempotent + prior FAILURE / TIMEOUT → allow (retry)
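The three rules above reduce to a small decision function. In this sketch the Status enum, the record() method, and the (tool_name, args_key) history key are assumptions; the text only names should_skip(), ToolSpec.idempotent, and _history.

```python
from enum import Enum


class Status(Enum):  # hypothetical status enum
    SUCCESS = "success"
    FAILURE = "failure"
    TIMEOUT = "timeout"


class ReplayControl:
    """Hypothetical per-turn instance; main-thread only, so no lock."""

    def __init__(self) -> None:
        self._history = {}  # (tool_name, args_key) -> Status of the latest call

    def record(self, tool_name: str, args_key: str, status: Status) -> None:
        self._history[(tool_name, args_key)] = status

    def should_skip(self, tool_name: str, args_key: str, idempotent: bool) -> bool:
        if not idempotent:
            return False  # e.g. send_email may legitimately run twice
        # Skip only when an identical idempotent call already succeeded.
        return self._history.get((tool_name, args_key)) is Status.SUCCESS


replay = ReplayControl()
replay.record("search", "q=cats", Status.SUCCESS)
replay.record("fetch", "url=a", Status.TIMEOUT)
replay.record("send_email", "to=bob", Status.SUCCESS)

skip_search = replay.should_skip("search", "q=cats", idempotent=True)
skip_fetch = replay.should_skip("fetch", "url=a", idempotent=True)
skip_email = replay.should_skip("send_email", "to=bob", idempotent=False)
```

Only the repeated search is suppressed; the timed-out fetch is retried and the non-idempotent email is always allowed.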
8. Per-Turn prompt_render_hash
TurnPreparer.prepare() computes the render hash from system_prompt_override
on every call. If the system prompt changes mid-session, the hash correctly
reflects the new content. TurnPreparation is a frozen dataclass so consumers
cannot accidentally mutate assembled turn state.
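One way to picture this, assuming SHA-256 as the hash function (the text does not name one) and a reduced, hypothetical TurnPreparation with only two fields:

```python
import hashlib
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class TurnPreparation:
    """Hypothetical, reduced shape of the assembled turn."""
    messages: tuple
    prompt_render_hash: str


def prepare(system_prompt_override: str, user_message: str) -> TurnPreparation:
    # Recomputed on every call, so a mid-session prompt change is reflected.
    digest = hashlib.sha256(system_prompt_override.encode("utf-8")).hexdigest()
    messages = (("system", system_prompt_override), ("user", user_message))
    return TurnPreparation(messages=messages, prompt_render_hash=digest)


turn_1 = prepare("You are a careful assistant.", "hi")
turn_2 = prepare("You are a careful assistant.", "what next?")
turn_3 = prepare("You are a terse assistant.", "what next?")
```

Turns that share a system prompt share a hash, while a changed prompt yields a new one. Using a tuple for messages keeps the frozen dataclass immutable in depth, not just at the attribute level.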
Thread Safety Model
| Object | Mutating threads | Safety mechanism |
|---|---|---|
| TurnBudget | Main thread + parallel tool threads | Internal threading.Lock on all counters |
| TurnRuntimeState.tool_trace | Main thread + tool threads | _trace_lock |
| TurnRuntimeState._writes_open | Main thread writes; tool threads read | _writes_lock |
| TurnRuntimeState.tool_results | Main thread only (after complete_trace() gate) | Write gate |
| ArtifactStore._index | Main thread + tool threads | Internal threading.Lock |
| ReplayControl._history | Main thread only | No lock needed (per-turn instance) |
Invariants
These hold for every turn regardless of timeouts, tool failures, or exceptions:
- state.close_writes() is always called before finalize().
- Every ToolCall in the LLM response receives exactly one tool-role message appended to messages (the exception handler in the loop guarantees this).
- TurnFinalizer.finalize() never raises; all sub-steps are exception-wrapped.
- ReflectionEngine.reflect() always returns a ReflectionResult (stub on budget exhaustion, deadline expiry, or LLM failure).
- BoundedToolExecutor.execute() never raises; it returns a typed ToolOutcome.
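The invariant that BoundedToolExecutor.execute() never raises can be illustrated with a small wrapper that converts every failure mode into a value. This is a sketch under assumptions: the free function execute(), its signature, and the outcome fields are hypothetical, and a single-worker thread pool stands in for whatever execution mechanism the harness actually uses.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolExecutionResult:
    output: object  # hypothetical field


@dataclass(frozen=True)
class ToolTimeout:
    deadline_s: float  # hypothetical field


@dataclass(frozen=True)
class ToolFailure:
    error: str  # hypothetical field


def execute(fn, timeout_s: float):
    """Run fn with a deadline; always return a typed outcome, never raise."""
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(fn)
    try:
        return ToolExecutionResult(future.result(timeout=timeout_s))
    except FutureTimeout:
        # The worker may still finish later; that is the late-write case.
        return ToolTimeout(timeout_s)
    except Exception as exc:
        return ToolFailure(repr(exc))
    finally:
        pool.shutdown(wait=False)  # do not block the turn on a stuck tool


ok = execute(lambda: "done", timeout_s=1.0)
failed = execute(lambda: 1 / 0, timeout_s=1.0)
slow = execute(lambda: time.sleep(0.2), timeout_s=0.01)
```

Callers then branch on the outcome type instead of wrapping each call in try/except, which is what lets the loop guarantee one tool-role message per ToolCall.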