AgentHarness — API Reference

File: brain/agent/harness.py

AgentHarness is the public entry point for executing a single chat turn. It is a thin façade that wires together 10 single-responsibility collaborators. After __init__ completes, the instance is effectively immutable — no fields are mutated during run_turn().


Constructor

AgentHarness(
    provider: LLMProvider,
    tools: AgentToolRegistry,
    event_log: EventLog,
    *,
    # Optional dependencies
    memory_retriever: MemoryRetriever | None = None,
    memory_extractor: MemoryExtractor | None = None,
    autonomous_research_learner: AutonomousResearchLearner | None = None,
    decision_store: RuntimeDecisionStore | None = None,
    # Turn limits
    max_steps: int = 6,
    timeout_s: int = 60,
    # Thinking / reflection
    think_level: ThinkLevel | str = ThinkLevel.OFF,
    reflect_every: int = 0,
    # Callbacks + streaming
    callbacks: AgentCallbacks | None = None,
    stream: bool = False,
    # Prompt tracking
    system_prompt: str | None = None,
    prompt_id: str | None = None,
    prompt_version: str | None = None,
    prompt_tags: frozenset[str] | None = None,
    prompt_engine: str | None = None,
    prompt_render_hash: str | None = None,
    prompt_variables: tuple[str, ...] | None = None,
    # Logging
    redact_logs: bool = False,
    # Session management
    checkpoint_store: CheckpointStore | None = None,
    cost_tracker: SessionCostTracker | None = None,
    # Tool management
    parallel_enabled: bool = True,
    context_warning_tracker: ContextWarningTracker | None = None,
    run_context_metadata: dict[str, Any] | None = None,
    enable_tool_validation: bool = True,
    enable_semantic_selection: bool = False,
    max_tools_per_turn: int = 25,
    auto_compact: bool = True,
)

Required Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| provider | LLMProvider | LLM provider (Anthropic, OpenAI, Ollama, chain, etc.) |
| tools | AgentToolRegistry | Registry of available tools with permission mode |
| event_log | EventLog | Persists all events to SQLite |

Turn Limits

| Parameter | Default | Description |
| --- | --- | --- |
| max_steps | 6 | Maximum LLM generation steps per turn (loop iterations) |
| timeout_s | 60 | Absolute wall-clock deadline for the entire turn, in seconds |
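
The two limits work together: a turn stops at whichever is reached first, the step budget or the wall-clock deadline. The sketch below illustrates that interaction under stated assumptions. `run_bounded`, `step_fn`, and the injectable `clock` are hypothetical names for illustration, and the use of a monotonic clock is an assumption, not a description of the harness internals.

```python
import time
from typing import Callable


def run_bounded(
    step_fn: Callable[[int], bool],
    max_steps: int = 6,
    timeout_s: float = 60,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Run step_fn until it reports completion or a limit is hit.

    step_fn(step_index) returns True when the turn is finished.
    The return value names the reason the loop stopped.
    """
    deadline = clock() + timeout_s  # one absolute deadline for the whole turn
    for step in range(max_steps):
        if clock() >= deadline:
            return "timeout"
        if step_fn(step):
            return "done"
    return "max_steps"
```

Computing the deadline once, before the loop, is what makes it a budget for the entire turn rather than a per-step timeout.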

Thinking & Reflection

| Parameter | Default | Description |
| --- | --- | --- |
| think_level | ThinkLevel.OFF | OFF / LOW / MEDIUM / HIGH. Controls the thinking prefix in the system prompt and the token budget |
| reflect_every | 0 | 0 = never reflect; N = run a reflection step every N turns |

Tool Management

| Parameter | Default | Description |
| --- | --- | --- |
| parallel_enabled | True | Global switch for parallel wave execution; set to False to disable it |
| enable_tool_validation | True | Validate tool arguments against the tool schema before execution; auto-repair up to MAX_REPAIR_ATTEMPTS times |
| enable_semantic_selection | False | Use SemanticToolSelector to reduce the tool list to the max_tools_per_turn most relevant tools |
| max_tools_per_turn | 25 | Maximum tools exposed to the LLM per turn (only relevant when enable_semantic_selection=True) |
| auto_compact | True | Automatically summarise older messages when the context hits a "critical" token threshold |
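
To make the effect of enable_semantic_selection and max_tools_per_turn concrete, here is a rough sketch of capping the tool list by relevance. It is not how SemanticToolSelector works: the word-overlap score is a deliberately simple stand-in for whatever relevance measure the real selector uses, and `select_tools` and `tool_descriptions` are hypothetical names.

```python
def select_tools(
    query: str,
    tool_descriptions: dict[str, str],
    max_tools_per_turn: int = 25,
) -> list[str]:
    """Return up to max_tools_per_turn tool names, most relevant first."""
    query_words = set(query.lower().split())

    def score(item: tuple[str, str]) -> int:
        # Stand-in relevance: number of words shared with the description.
        _name, description = item
        return len(query_words & set(description.lower().split()))

    # sorted() is stable, so tools with equal scores keep registry order.
    ranked = sorted(tool_descriptions.items(), key=score, reverse=True)
    return [name for name, _ in ranked[:max_tools_per_turn]]
```

With fewer registered tools than the cap, the function returns all of them, which matches the note that the cap only matters when selection is enabled.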

Optional Integrations

| Parameter | Description |
| --- | --- |
| memory_retriever | Provides episodic memories for context augmentation |
| memory_extractor | Extracts and persists memories from completed turns |
| autonomous_research_learner | Observes turns to improve research quality over time |
| decision_store | Persists a RuntimeDecisionRecord for each turn |
| checkpoint_store | Snapshots vault files before confirmed writes (enables undo) |
| cost_tracker | Accumulates token costs per session |
| context_warning_tracker | Monitors context window usage; triggers compaction at thresholds |

Prompt Tracking

These fields are recorded only as OTEL span attributes and event-log provenance. They do not change the system prompt content; the prompt text itself is controlled by the separate system_prompt parameter.

| Parameter | Description |
| --- | --- |
| prompt_id | Logical prompt template ID (e.g. "agent.profile.planner") |
| prompt_version | Semver string of the prompt template |
| prompt_tags | Arbitrary frozenset of string tags for filtering |
| prompt_engine | Rendering engine identifier ("minimal" when left unset) |
| prompt_render_hash | Initial render hash (overwritten per-turn by TurnPreparer) |
| prompt_variables | Variable names used in the prompt template |

run_turn()

def run_turn(
    self,
    session_id: str,
    history: list[dict[str, str]],
    user_message: str,
    top_k: int,
    web_mode: str,
    show_citations: bool,
    runtime_provider: str = "",
    runtime_model: str = "",
    skill_context: str | None = None,
) -> TurnResult:

Parameters

| Parameter | Description |
| --- | --- |
| session_id | Stable string identifier for this session; used as a key in the event log, artifact store, and decision records |
| history | Previous turns as [{"role": "user"/"assistant", "content": "..."}] |
| user_message | The current user query |
| top_k | Number of vault facts to retrieve via get_context_pack |
| web_mode | "off" disables web_search; "on" enables it |
| show_citations | If True, append citation lists to the final response text |
| runtime_provider | Provider name injected into the runtime metadata system message |
| runtime_model | Model name injected into the runtime metadata system message |
| skill_context | Optional additional system message (e.g. for active skill context) |

Returns: TurnResult

from dataclasses import dataclass
from typing import Any


@dataclass
class TurnResult:
    text: str                                 # Final assistant response
    local_citations: list[str]                # Vault anchors cited this turn
    web_citations: list[dict[str, str]]       # Web source dicts cited this turn
    rendered_content_paths: list[str]         # Paths to rendered web content
    context_pack_json: dict[str, Any] | None  # Full context pack used
    input_tokens: int                         # Total input tokens consumed
    output_tokens: int                        # Total output tokens produced
    cache_read_tokens: int                    # Tokens read from provider cache
    cache_creation_tokens: int                # Tokens used to populate provider cache
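
As an illustration of how a caller might consume these fields, the sketch below condenses a result into a flat record for logging. `summarize_turn` is a hypothetical helper, not part of the harness, and the example object is a stand-in with made-up values (including the citation anchor format) rather than output from a real turn.

```python
from types import SimpleNamespace
from typing import Any


def summarize_turn(result: Any) -> dict[str, Any]:
    """Condense a TurnResult-like object into a flat record for logging."""
    return {
        "response_chars": len(result.text),
        "input_plus_output_tokens": result.input_tokens + result.output_tokens,
        "cache_read_tokens": result.cache_read_tokens,
        "citation_count": len(result.local_citations) + len(result.web_citations),
        "has_context_pack": result.context_pack_json is not None,
    }


# Stand-in object with TurnResult's field names; every value here is made up.
example = SimpleNamespace(
    text="Paris is the capital of France.",
    local_citations=["notes/geography.md#france"],
    web_citations=[],
    rendered_content_paths=[],
    context_pack_json=None,
    input_tokens=120,
    output_tokens=30,
    cache_read_tokens=0,
    cache_creation_tokens=0,
)
summary = summarize_turn(example)
```

Note that context_pack_json may be None, so callers should check it before indexing into it.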

Internal Methods (Reference)

These are not public API but are documented for contributors:

_execute_turn_loop_v2(state, messages, web_mode, resolved_model, user_message)

The main bounded tool-calling loop. Mutates messages (appending assistant and tool role messages) and state (token counts, citations, final text) in place.

_exec_call_v2(call, state, web_mode, replay_control) → ToolOutcome

Runs the 7-gate pre-execution pipeline then delegates to BoundedToolExecutor. Never raises.
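
The "never raises" contract can be pictured as a chain of gates followed by execution, with every failure reported as data instead of an exception. The sketch below shows that shape only. The `Outcome` class, the `Gate` signature, and `exec_call` are hypothetical stand-ins; the actual seven gates and the fields of ToolOutcome are not described here and are not reproduced.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Outcome:
    ok: bool
    output: str = ""
    error: str = ""


# A gate returns None to let the call through, or a reason string to block it.
Gate = Callable[[str, dict], Optional[str]]


def exec_call(
    name: str,
    args: dict,
    gates: list[Gate],
    execute: Callable[[str, dict], str],
) -> Outcome:
    """Run every gate in order, then execute; report failures as data."""
    try:
        for gate in gates:
            reason = gate(name, args)
            if reason is not None:
                return Outcome(ok=False, error=reason)
        return Outcome(ok=True, output=execute(name, args))
    except Exception as exc:  # any failure becomes an outcome, never a raise
        return Outcome(ok=False, error=f"{type(exc).__name__}: {exc}")
```

Returning an outcome object in every case lets the calling loop treat blocked, failed, and successful calls uniformly.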

_apply_outcome_to_state(tool_name, outcome, state)

Updates state.blocked_tool_names on non-retryable failure and extracts citations from local_search / get_context_pack / web_search outputs.

_load_memory_context(user_message, session_id) → _MemoryContextBundle

Retrieves episodic memory for the query. Returns an empty bundle if the memory_retriever is None or if retrieval fails.
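
The fallback behaviour described above, an empty bundle when there is no retriever or when retrieval fails, might look roughly like this. `MemoryBundle` stands in for _MemoryContextBundle, and the `retrieve(user_message, session_id)` method on the retriever is an assumed interface used only for illustration.

```python
import logging
from dataclasses import dataclass, field
from typing import Any, Optional

logger = logging.getLogger(__name__)


@dataclass
class MemoryBundle:
    memories: list[str] = field(default_factory=list)


def load_memory_context(
    retriever: Optional[Any], user_message: str, session_id: str
) -> MemoryBundle:
    """Return retrieved memories, or an empty bundle if none are available."""
    if retriever is None:
        return MemoryBundle()
    try:
        # Hypothetical retriever interface; the real method name may differ.
        return MemoryBundle(memories=list(retriever.retrieve(user_message, session_id)))
    except Exception:
        logger.warning("memory retrieval failed; continuing without memories", exc_info=True)
        return MemoryBundle()
```

Treating memory as best-effort means a broken retriever degrades answer quality but never fails the turn.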

_tool_timeout_cap(tool_name) → float

Reads tool_spec.metadata["tool_timeout_s"] from the registry. Falls back to TOOL_CALL_TIMEOUT_S (45 seconds) when the key is not set.

_tool_timeout_retryable(tool_name) → bool

Reads tool_spec.metadata["retry_on_timeout"]. Defaults to True.
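
Both timeout helpers are metadata lookups with a default. A minimal sketch, assuming the metadata is a plain dict and taking it directly as an argument rather than resolving it from the registry by tool name:

```python
from typing import Any

TOOL_CALL_TIMEOUT_S = 45  # default per-tool timeout, in seconds


def tool_timeout_cap(metadata: dict[str, Any]) -> float:
    """Per-tool timeout from metadata, or the module default."""
    value = metadata.get("tool_timeout_s")
    return float(value) if value is not None else float(TOOL_CALL_TIMEOUT_S)


def tool_timeout_retryable(metadata: dict[str, Any]) -> bool:
    """Whether a timed-out call may be retried; defaults to True."""
    return bool(metadata.get("retry_on_timeout", True))
```

For example, a tool registered with `{"tool_timeout_s": 10, "retry_on_timeout": False}` would get a 10-second cap and no retry after a timeout, while a tool with empty metadata would get 45 seconds and remain retryable.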

_generate_with_span(messages, tool_schemas, model_name, timeout_s) → ProviderResult

Calls provider.generate() (or generate_stream()) wrapped in an OTEL LLM span. Sets secondbrain.prompt.* attributes on the span.

_consume_stream(messages, tool_schemas) → ProviderResult

Collects streaming events (text_delta, tool_call_start, done) and fires on_stream_chunk callbacks. Returns a complete ProviderResult.
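
The collection step can be sketched as a fold over the event stream. The event type names come from the description above, but the payload keys (`text`, `name`), the plain-dict event shape, and the dict returned in place of ProviderResult are assumptions made for this example.

```python
from typing import Any, Callable, Iterable


def consume_stream(
    events: Iterable[dict[str, Any]],
    on_stream_chunk: Callable[[str], None],
) -> dict[str, Any]:
    """Accumulate streamed events into one complete result."""
    text_parts: list[str] = []
    tool_calls: list[dict[str, Any]] = []
    for event in events:
        kind = event.get("type")
        if kind == "text_delta":
            text_parts.append(event["text"])
            on_stream_chunk(event["text"])  # surface each chunk as it arrives
        elif kind == "tool_call_start":
            tool_calls.append({"name": event["name"]})
        elif kind == "done":
            break  # ignore anything after the terminal event
    return {"text": "".join(text_parts), "tool_calls": tool_calls}
```

The caller sees the same complete result whether or not streaming is enabled; the callback is the only place where partial text is visible.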


Module-Level Constants

| Constant | Value | Description |
| --- | --- | --- |
| SYSTEM_PROMPT | From prompt store | Default agent profile system prompt |
| TOOL_MESSAGE_MAX_CHARS | 12_000 | Max inline tool output size; larger outputs go to ArtifactStore |
| TOOL_CALL_TIMEOUT_S | 45 | Default per-tool timeout if not specified in metadata |
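
The TOOL_MESSAGE_MAX_CHARS rule amounts to a size check before a tool result is placed in the conversation. In the sketch below a plain dict stands in for ArtifactStore, and `inline_or_store`, the placeholder message format, and the choice to treat exactly 12,000 characters as still inline are all assumptions.

```python
TOOL_MESSAGE_MAX_CHARS = 12_000


def inline_or_store(output: str, artifacts: dict[str, str], artifact_id: str) -> str:
    """Return the text to place in the tool message.

    Small outputs are returned unchanged; oversized outputs are saved
    under artifact_id and replaced by a short placeholder.
    """
    if len(output) <= TOOL_MESSAGE_MAX_CHARS:
        return output
    artifacts[artifact_id] = output  # dict used as a stand-in artifact store
    return f"[output of {len(output)} chars stored as artifact {artifact_id}]"
```

Keeping large outputs out of the message list bounds the context growth caused by any single tool call.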