AgentHarness: API Reference

File: brain/agent/harness.py

AgentHarness is the public entry point for executing a single chat turn. It is a thin façade that wires together 10 single-responsibility collaborators. After __init__ completes, the instance is effectively immutable — no fields are mutated during run_turn().

Constructor

AgentHarness(
    provider: LLMProvider,
    tools: AgentToolRegistry,
    event_log: EventLog,
    *,
    # Optional dependencies
    memory_retriever: MemoryRetriever | None = None,
    memory_extractor: MemoryExtractor | None = None,
    autonomous_research_learner: AutonomousResearchLearner | None = None,
    decision_store: RuntimeDecisionStore | None = None,
    # Turn limits
    max_steps: int = 6,
    timeout_s: int = 60,
    # Thinking / reflection
    think_level: ThinkLevel | str = ThinkLevel.OFF,
    reflect_every: int = 0,
    # Callbacks + streaming
    callbacks: AgentCallbacks | None = None,
    stream: bool = False,
    # Prompt tracking
    system_prompt: str | None = None,
    prompt_id: str | None = None,
    prompt_version: str | None = None,
    prompt_tags: frozenset[str] | None = None,
    prompt_engine: str | None = None,
    prompt_render_hash: str | None = None,
    prompt_variables: tuple[str, ...] | None = None,
    # Logging
    redact_logs: bool = False,
    # Session management
    checkpoint_store: CheckpointStore | None = None,
    cost_tracker: SessionCostTracker | None = None,
    # Tool management
    parallel_enabled: bool = True,
    context_warning_tracker: ContextWarningTracker | None = None,
    run_context_metadata: dict[str, Any] | None = None,
    enable_tool_validation: bool = True,
    enable_semantic_selection: bool = False,
    max_tools_per_turn: int = 25,
    auto_compact: bool = True,
)

Required Parameters

Parameter	Type	Description
`provider`	`LLMProvider`	LLM provider (Anthropic, OpenAI, Ollama, chain, etc.)
`tools`	`AgentToolRegistry`	Registry of available tools with permission mode
`event_log`	`EventLog`	Persists all events to SQLite

Turn Limits

Parameter	Default	Description
`max_steps`	`6`	Maximum LLM generation steps per turn (loop iterations)
`timeout_s`	`60`	Absolute wall-clock deadline for the entire turn, in seconds
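
The two limits above bound a turn together: whichever is reached first ends the loop. A minimal sketch of that interaction, assuming a single deadline computed once at the start of the turn. `run_bounded` and `step_fn` are hypothetical names for illustration and are not part of AgentHarness; only the default values come from the table.

```python
import time
from typing import Callable, Optional


def run_bounded(
    step_fn: Callable[[int], Optional[str]],
    max_steps: int = 6,
    timeout_s: float = 60,
) -> tuple[Optional[str], str]:
    """Call step_fn until it returns a final answer or a limit is hit.

    step_fn is a hypothetical stand-in for one LLM generation step: it
    returns the final text, or None if another step is needed.
    """
    deadline = time.monotonic() + timeout_s  # one deadline for the whole turn
    for step in range(max_steps):
        if time.monotonic() >= deadline:
            return None, "timeout"
        answer = step_fn(step)
        if answer is not None:
            return answer, "done"
    return None, "max_steps"
```

A monotonic clock is used here so that system clock adjustments cannot extend or shorten the deadline.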

Thinking & Reflection

Parameter	Default	Description
`think_level`	`ThinkLevel.OFF`	`OFF` / `LOW` / `MEDIUM` / `HIGH` — controls the thinking prefix in the system prompt and token budget
`reflect_every`	`0`	0 = never reflect; N = run a reflection step every N turns
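
Because `think_level` accepts either an enum member or a string, some normalisation has to happen before it is used. The sketch below shows one way that could look, together with the `reflect_every` cadence check. The enum's string values and both helper names are assumptions made for illustration; only the member names and the meaning of `0` come from the table above.

```python
from enum import Enum
from typing import Union


class ThinkLevel(str, Enum):
    # Member names come from the table above; the string values are assumed.
    OFF = "off"
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


def coerce_think_level(value: Union[ThinkLevel, str]) -> ThinkLevel:
    """Accept an enum member or a case-insensitive string such as "high"."""
    if isinstance(value, ThinkLevel):
        return value
    return ThinkLevel(value.strip().lower())  # raises ValueError if unknown


def should_reflect(count: int, reflect_every: int) -> bool:
    """True on every reflect_every-th count; 0 disables reflection."""
    return reflect_every > 0 and count > 0 and count % reflect_every == 0
```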

Tool Management

Parameter	Default	Description
`parallel_enabled`	`True`	Global kill-switch for parallel wave execution
`enable_tool_validation`	`True`	Validate tool arguments against the tool schema before execution; on failure, attempt auto-repair up to `MAX_REPAIR_ATTEMPTS` times
`enable_semantic_selection`	`False`	Use `SemanticToolSelector` to reduce the tool list to `max_tools_per_turn` most relevant tools
`max_tools_per_turn`	`25`	Maximum tools exposed to the LLM per turn (only relevant when `enable_semantic_selection=True`)
`auto_compact`	`True`	Automatically summarise older messages when the context hits a "critical" token threshold
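
To illustrate how `enable_semantic_selection` and `max_tools_per_turn` interact, here is a rough sketch of capping the exposed tool list. It is not `SemanticToolSelector`: the word-overlap score is a deliberately crude stand-in for whatever relevance measure the real selector uses, and `select_tools` is a hypothetical name. Returning every tool when the registry already fits under the cap is also an assumption.

```python
def select_tools(
    query: str,
    descriptions: dict[str, str],
    max_tools_per_turn: int = 25,
    enable_semantic_selection: bool = False,
) -> list[str]:
    """Return the tool names to expose to the LLM for this turn."""
    if not enable_semantic_selection or len(descriptions) <= max_tools_per_turn:
        return list(descriptions)  # selection disabled, or nothing to trim
    query_words = set(query.lower().split())

    def overlap(name: str) -> int:
        # Crude relevance: number of words the description shares with the query.
        return len(query_words & set(descriptions[name].lower().split()))

    ranked = sorted(descriptions, key=overlap, reverse=True)
    return ranked[:max_tools_per_turn]
```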

Optional Integrations

Parameter	Description
`memory_retriever`	Provides episodic memories for context augmentation
`memory_extractor`	Extracts and persists memories from completed turns
`autonomous_research_learner`	Observes turns to improve research quality over time
`decision_store`	Persists `RuntimeDecisionRecord` for each turn
`checkpoint_store`	Snapshots vault files before confirmed writes (enables undo)
`cost_tracker`	Accumulates token costs per session
`context_warning_tracker`	Monitors context window usage; triggers compaction at thresholds

Prompt Tracking

These fields are used only for OTEL span attributes and event log provenance. Of the prompt arguments, only system_prompt affects the system prompt content; the fields below are metadata.

Parameter	Description
`prompt_id`	Logical prompt template ID (e.g. `"agent.profile.planner"`)
`prompt_version`	Semver string of the prompt template
`prompt_tags`	Arbitrary frozenset of string tags for filtering
`prompt_engine`	Rendering engine identifier (default `"minimal"`)
`prompt_render_hash`	Initial render hash (overwritten per-turn by `TurnPreparer`)
`prompt_variables`	Variable names used in the prompt template

`run_turn()`

def run_turn(
    self,
    session_id: str,
    history: list[dict[str, str]],
    user_message: str,
    top_k: int,
    web_mode: str,
    show_citations: bool,
    runtime_provider: str = "",
    runtime_model: str = "",
    skill_context: str | None = None,
) -> TurnResult:

Parameters

Parameter	Description
`session_id`	Stable string identifier for this session; used as a key in event log, artifact store, and decision records
`history`	Previous turns as `[{"role": "user"/"assistant", "content": "..."}]`
`user_message`	The current user query
`top_k`	Number of vault facts to retrieve via `get_context_pack`
`web_mode`	`"off"` disables `web_search`; `"on"` enables it
`show_citations`	If `True`, append citation lists to `final_text`
`runtime_provider`	Provider name injected into the runtime metadata system message
`runtime_model`	Model name injected into the runtime metadata system message
`skill_context`	Optional additional system message (e.g. for active skill context)
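
The `history` argument has a simple but easy-to-get-wrong shape. The helper below is a caller-side sanity check, sketched under the assumption that only the two roles named in the table are expected; `validate_history` is a hypothetical name, and whether the harness itself rejects other roles is not stated here.

```python
ALLOWED_ROLES = {"user", "assistant"}


def validate_history(history: list[dict[str, str]]) -> list[dict[str, str]]:
    """Check that each entry is {"role": "user" | "assistant", "content": str}."""
    for i, msg in enumerate(history):
        if msg.get("role") not in ALLOWED_ROLES:
            raise ValueError(f"history[{i}]: unexpected role {msg.get('role')!r}")
        if not isinstance(msg.get("content"), str):
            raise ValueError(f"history[{i}]: content must be a string")
    return history
```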

Returns: `TurnResult`

@dataclass
class TurnResult:
    text: str                                 # Final assistant response
    local_citations: list[str]                # Vault anchors cited this turn
    web_citations: list[dict[str, str]]       # Web source dicts cited this turn
    rendered_content_paths: list[str]         # Paths to rendered web content
    context_pack_json: dict[str, Any] | None  # Full context pack used
    input_tokens: int                         # Total input tokens consumed
    output_tokens: int                        # Total output tokens produced
    cache_read_tokens: int                    # Tokens read from provider cache
    cache_creation_tokens: int                # Tokens used to populate provider cache
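
A caller will typically read the token counters off the result for logging or cost display. The sketch below re-declares the dataclass locally so it runs on its own. `usage_summary` is a hypothetical helper, and treating "total" as input plus output is an assumption; this reference does not say whether cached tokens are already counted in `input_tokens`.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class TurnResult:
    # Local copy of the fields listed above, so the sketch runs standalone.
    text: str
    local_citations: list[str]
    web_citations: list[dict[str, str]]
    rendered_content_paths: list[str]
    context_pack_json: Optional[dict[str, Any]]
    input_tokens: int
    output_tokens: int
    cache_read_tokens: int
    cache_creation_tokens: int


def usage_summary(result: TurnResult) -> dict[str, int]:
    """Summarise token usage for one turn (hypothetical helper)."""
    return {
        "input": result.input_tokens,
        "output": result.output_tokens,
        "cache_read": result.cache_read_tokens,
        "cache_creation": result.cache_creation_tokens,
        "total": result.input_tokens + result.output_tokens,  # assumed definition
    }
```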

Internal Methods (Reference)

These are not public API but are documented for contributors:

`_execute_turn_loop_v2(state, messages, web_mode, resolved_model, user_message)`

The main bounded tool-calling loop. Mutates messages (appending assistant and tool role messages) and state (token counts, citations, final text) in place.
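
The in-place mutation described above can be pictured with a much-reduced loop. This is a sketch, not the real method: `tool_loop`, `generate`, and `run_tool` are hypothetical, the reply shape is invented for the example, and state tracking, parallel waves, and timeouts are left out.

```python
from typing import Callable


def tool_loop(
    messages: list[dict[str, str]],
    generate: Callable[[list[dict[str, str]]], dict],
    run_tool: Callable[[str], str],
    max_steps: int = 6,
) -> str:
    """Append assistant and tool messages to `messages` until a final answer."""
    for _ in range(max_steps):
        # Hypothetical reply shape: {"text": str, "tool": str or None}
        reply = generate(messages)
        messages.append({"role": "assistant", "content": reply["text"]})
        if reply.get("tool") is None:
            return reply["text"]  # no tool requested, so this is the final text
        messages.append({"role": "tool", "content": run_tool(reply["tool"])})
    return ""  # step budget exhausted without a final answer
```

Because the caller's list is extended rather than copied, the full transcript of the turn is available to the caller after the loop returns.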

`_exec_call_v2(call, state, web_mode, replay_control) → ToolOutcome`

Runs the 7-gate pre-execution pipeline, then delegates to BoundedToolExecutor. Never raises; failures are reported through the returned ToolOutcome.
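
A gate pipeline with a never-raises contract might be structured roughly as follows. The individual gates are not listed in this reference, so the sketch takes them as a parameter; the `ToolOutcome` fields shown, the `Gate` alias, and `exec_call` itself are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# A gate inspects the call and returns a rejection reason, or None to pass.
Gate = Callable[[str, dict], Optional[str]]


@dataclass
class ToolOutcome:
    # Hypothetical shape; the real ToolOutcome is not shown in this reference.
    ok: bool
    output: str = ""
    error: str = ""


def exec_call(
    name: str,
    args: dict,
    gates: list[Gate],
    execute: Callable[[str, dict], str],
) -> ToolOutcome:
    """Run every gate in order, then the tool; report failures as outcomes."""
    try:
        for gate in gates:
            reason = gate(name, args)
            if reason is not None:
                return ToolOutcome(ok=False, error=reason)  # first rejection wins
        return ToolOutcome(ok=True, output=execute(name, args))
    except Exception as exc:  # never propagate; the loop expects an outcome
        return ToolOutcome(ok=False, error=f"{type(exc).__name__}: {exc}")
```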

`_apply_outcome_to_state(tool_name, outcome, state)`

Updates state.blocked_tool_names on non-retryable failure and extracts citations from local_search / get_context_pack / web_search outputs.
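
The two effects described above, blocking and citation extraction, can be sketched like this. `TurnState` is a hypothetical subset of the real state object, and the outcome keys (`ok`, `retryable`, `sources`) are invented for the example.

```python
from dataclasses import dataclass, field

# Tools whose outputs carry citations, per the description above.
CITING_TOOLS = {"local_search", "get_context_pack", "web_search"}


@dataclass
class TurnState:
    # Hypothetical subset of the real per-turn state.
    blocked_tool_names: set[str] = field(default_factory=set)
    citations: list[str] = field(default_factory=list)


def apply_outcome(tool_name: str, outcome: dict, state: TurnState) -> None:
    """Fold one tool outcome into the turn state, in place."""
    if not outcome["ok"]:
        if not outcome.get("retryable", False):
            state.blocked_tool_names.add(tool_name)  # not offered again this turn
        return
    if tool_name in CITING_TOOLS:
        for source in outcome.get("sources", []):
            if source not in state.citations:  # keep order, skip duplicates
                state.citations.append(source)
```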

`_load_memory_context(user_message, session_id) → _MemoryContextBundle`

Retrieves episodic memory for the query. Returns an empty bundle if the memory_retriever is None or if retrieval fails.
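
The fallback behaviour amounts to treating memory as best-effort. A minimal sketch, assuming the retriever can be modelled as a plain callable; `MemoryBundle` stands in for `_MemoryContextBundle`, whose real fields are not shown here.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class MemoryBundle:
    # Hypothetical stand-in for _MemoryContextBundle.
    memories: list[str] = field(default_factory=list)


def load_memory_context(
    retrieve: Optional[Callable[[str, str], list[str]]],
    user_message: str,
    session_id: str,
) -> MemoryBundle:
    """Return retrieved memories, or an empty bundle when unavailable."""
    if retrieve is None:
        return MemoryBundle()  # no retriever configured
    try:
        return MemoryBundle(memories=retrieve(user_message, session_id))
    except Exception:
        return MemoryBundle()  # retrieval failed; the turn continues without memory
```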

`_tool_timeout_cap(tool_name) → float`

Reads tool_spec.metadata["tool_timeout_s"] from the registry. Falls back to TOOL_CALL_TIMEOUT_S (45 seconds) when the key is not set.

`_tool_timeout_retryable(tool_name) → bool`

Reads tool_spec.metadata["retry_on_timeout"]. Defaults to True.
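
Both timeout helpers are metadata lookups with a default. The sketch below takes the metadata dict directly instead of resolving a tool name through the registry, which is a simplification; the function names drop the leading underscore to mark them as illustrative.

```python
TOOL_CALL_TIMEOUT_S = 45  # module default, per the constants table


def tool_timeout_cap(metadata: dict) -> float:
    """Per-tool timeout in seconds, or the module default if unset."""
    return float(metadata.get("tool_timeout_s", TOOL_CALL_TIMEOUT_S))


def tool_timeout_retryable(metadata: dict) -> bool:
    """Whether a timed-out call may be retried; True unless opted out."""
    return bool(metadata.get("retry_on_timeout", True))
```

For example, a slow tool could declare `{"tool_timeout_s": 120, "retry_on_timeout": False}` in its metadata to get a longer cap and no retry.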

`_generate_with_span(messages, tool_schemas, model_name, timeout_s) → ProviderResult`

Calls provider.generate() (or generate_stream()) wrapped in an OTEL LLM span. Sets secondbrain.prompt.* attributes on the span.

`_consume_stream(messages, tool_schemas) → ProviderResult`

Collects streaming events (text_delta, tool_call_start, done) and fires on_stream_chunk callbacks. Returns a complete ProviderResult.
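
Folding a stream into a single result could look roughly like this. The event type names come from the description above, but the payload keys (`text`, `name`) and the dict returned in place of `ProviderResult` are assumptions for the sake of a runnable example.

```python
from typing import Callable, Iterable, Optional


def consume_stream(
    events: Iterable[dict],
    on_stream_chunk: Optional[Callable[[str], None]] = None,
) -> dict:
    """Collect stream events into one result dict, firing a callback per chunk."""
    parts: list[str] = []
    tool_calls: list[str] = []
    for event in events:
        kind = event["type"]
        if kind == "text_delta":
            parts.append(event["text"])
            if on_stream_chunk is not None:
                on_stream_chunk(event["text"])  # let the UI render incrementally
        elif kind == "tool_call_start":
            tool_calls.append(event["name"])
        elif kind == "done":
            break  # ignore anything after the terminal event
    return {"text": "".join(parts), "tool_calls": tool_calls}
```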

Module-Level Constants

Constant	Value	Description
`SYSTEM_PROMPT`	From prompt store	Default agent profile system prompt
`TOOL_MESSAGE_MAX_CHARS`	`12_000`	Max inline tool output size; larger outputs go to `ArtifactStore`
`TOOL_CALL_TIMEOUT_S`	`45`	Default per-tool timeout if not specified in metadata
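
The `TOOL_MESSAGE_MAX_CHARS` rule can be sketched as a size check that either inlines the output or stores it and leaves a pointer. A plain dict stands in for `ArtifactStore` here, and `inline_or_spill` and the placeholder message format are hypothetical.

```python
TOOL_MESSAGE_MAX_CHARS = 12_000


def inline_or_spill(output: str, store: dict[str, str], key: str) -> str:
    """Return the text to place in the tool message."""
    if len(output) <= TOOL_MESSAGE_MAX_CHARS:
        return output  # small enough to send inline
    store[key] = output  # a plain dict stands in for ArtifactStore
    return f"[output of {len(output)} chars stored as artifact {key!r}]"
```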
