AgentHarness: API Reference
File: brain/agent/harness.py
AgentHarness is the public entry point for executing a single chat turn.
It is a thin façade that wires together 10 single-responsibility collaborators.
After __init__ completes, the instance is effectively immutable: run_turn()
does not mutate any of its fields.
Constructor

    AgentHarness(
        provider: LLMProvider,
        tools: AgentToolRegistry,
        event_log: EventLog,
        *,
        # Optional dependencies
        memory_retriever: MemoryRetriever | None = None,
        memory_extractor: MemoryExtractor | None = None,
        autonomous_research_learner: AutonomousResearchLearner | None = None,
        decision_store: RuntimeDecisionStore | None = None,
        # Turn limits
        max_steps: int = 6,
        timeout_s: int = 60,
        # Thinking / reflection
        think_level: ThinkLevel | str = ThinkLevel.OFF,
        reflect_every: int = 0,
        # Callbacks + streaming
        callbacks: AgentCallbacks | None = None,
        stream: bool = False,
        # Prompt tracking
        system_prompt: str | None = None,
        prompt_id: str | None = None,
        prompt_version: str | None = None,
        prompt_tags: frozenset[str] | None = None,
        prompt_engine: str | None = None,
        prompt_render_hash: str | None = None,
        prompt_variables: tuple[str, ...] | None = None,
        # Logging
        redact_logs: bool = False,
        # Session management
        checkpoint_store: CheckpointStore | None = None,
        cost_tracker: SessionCostTracker | None = None,
        # Tool management
        parallel_enabled: bool = True,
        context_warning_tracker: ContextWarningTracker | None = None,
        run_context_metadata: dict[str, Any] | None = None,
        enable_tool_validation: bool = True,
        enable_semantic_selection: bool = False,
        max_tools_per_turn: int = 25,
        auto_compact: bool = True,
    )

Required Parameters
| Parameter | Type | Description |
|---|---|---|
| provider | LLMProvider | LLM provider (Anthropic, OpenAI, Ollama, chain, etc.) |
| tools | AgentToolRegistry | Registry of available tools with permission mode |
| event_log | EventLog | Persists all events to SQLite |
Turn Limits
| Parameter | Default | Description |
|---|---|---|
| max_steps | 6 | Maximum LLM generation steps per turn (loop iterations) |
| timeout_s | 60 | Absolute wall-clock deadline for the entire turn, in seconds |
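
The two limits work together: a turn ends as soon as either the step budget or the wall-clock deadline is exhausted. The sketch below illustrates that pattern in isolation. It is not the harness's actual loop; run_bounded and step_fn are hypothetical names introduced for the example.

```python
import time
from typing import Callable


def run_bounded(
    step_fn: Callable[[int], bool],
    max_steps: int = 6,
    timeout_s: float = 60.0,
) -> str:
    """Run step_fn until it finishes, the step budget runs out, or the deadline passes.

    step_fn is a hypothetical stand-in for one LLM generation step. It
    returns True once the turn has produced its final answer.
    """
    # One absolute deadline for the whole turn, not a per-step timeout.
    deadline = time.monotonic() + timeout_s
    for step in range(max_steps):
        if time.monotonic() >= deadline:
            return "timeout"
        if step_fn(step):
            return "done"
    return "max_steps"
```

Using time.monotonic() rather than time.time() keeps the deadline unaffected by system clock adjustments.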
Thinking & Reflection
| Parameter | Default | Description |
|---|---|---|
| think_level | ThinkLevel.OFF | OFF / LOW / MEDIUM / HIGH; controls the thinking prefix in the system prompt and the token budget |
| reflect_every | 0 | 0 = never reflect; N = run a reflection step every N turns |
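
Because think_level accepts either a ThinkLevel or a string, the constructor presumably normalises the value. The sketch below shows one way that coercion and the reflect_every cadence could work. The enum values, coerce_think_level, and should_reflect are assumptions made for illustration; only the member names OFF, LOW, MEDIUM, and HIGH come from this reference.

```python
from enum import Enum


class ThinkLevel(Enum):
    # Hypothetical values; the reference only names the members.
    OFF = "off"
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


def coerce_think_level(value) -> ThinkLevel:
    """Accept a ThinkLevel or a case-insensitive member name such as "high"."""
    if isinstance(value, ThinkLevel):
        return value
    return ThinkLevel[value.strip().upper()]


def should_reflect(count: int, reflect_every: int) -> bool:
    """Return True on every reflect_every-th unit; 0 disables reflection.

    count is a 1-based counter of whatever unit the harness counts.
    """
    return reflect_every > 0 and count > 0 and count % reflect_every == 0
```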
Tool Management
| Parameter | Default | Description |
|---|---|---|
| parallel_enabled | True | Global kill-switch for parallel wave execution |
| enable_tool_validation | True | Validate tool arguments against the tool schema before execution; auto-repair up to MAX_REPAIR_ATTEMPTS |
| enable_semantic_selection | False | Use SemanticToolSelector to reduce the tool list to max_tools_per_turn most relevant tools |
| max_tools_per_turn | 25 | Maximum tools exposed to the LLM per turn (only relevant when enable_semantic_selection=True) |
| auto_compact | True | Automatically summarise older messages when the context hits a "critical" token threshold |
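
To make the effect of max_tools_per_turn concrete, the sketch below ranks tools by relevance to the query and keeps only the top entries. It uses simple word overlap as a stand-in for relevance; how SemanticToolSelector actually scores tools is not described here, and select_tools is a hypothetical function.

```python
def select_tools(
    query: str,
    tool_descriptions: dict[str, str],
    max_tools_per_turn: int = 25,
) -> list[str]:
    """Return at most max_tools_per_turn tool names, most relevant first.

    Relevance here is plain word overlap between the query and each tool
    description (an assumption made for this sketch).
    """
    if len(tool_descriptions) <= max_tools_per_turn:
        return list(tool_descriptions)  # nothing to trim

    query_words = set(query.lower().split())

    def score(name: str) -> int:
        return len(query_words & set(tool_descriptions[name].lower().split()))

    # sorted() is stable, so ties keep their registration order.
    ranked = sorted(tool_descriptions, key=score, reverse=True)
    return ranked[:max_tools_per_turn]
```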
Optional Integrations
| Parameter | Description |
|---|---|
| memory_retriever | Provides episodic memories for context augmentation |
| memory_extractor | Extracts and persists memories from completed turns |
| autonomous_research_learner | Observes turns to improve research quality over time |
| decision_store | Persists RuntimeDecisionRecord for each turn |
| checkpoint_store | Snapshots vault files before confirmed writes (enables undo) |
| cost_tracker | Accumulates token costs per session |
| context_warning_tracker | Monitors context window usage; triggers compaction at thresholds |
Prompt Tracking
These fields are used only for OTEL span attributes and event log provenance.
They do not affect the system prompt content unless system_prompt is set.
| Parameter | Description |
|---|---|
| prompt_id | Logical prompt template ID (e.g. "agent.profile.planner") |
| prompt_version | Semver string of the prompt template |
| prompt_tags | Arbitrary frozenset of string tags for filtering |
| prompt_engine | Rendering engine identifier (default "minimal") |
| prompt_render_hash | Initial render hash (overwritten per-turn by TurnPreparer) |
| prompt_variables | Variable names used in the prompt template |
run_turn()

    def run_turn(
        self,
        session_id: str,
        history: list[dict[str, str]],
        user_message: str,
        top_k: int,
        web_mode: str,
        show_citations: bool,
        runtime_provider: str = "",
        runtime_model: str = "",
        skill_context: str | None = None,
    ) -> TurnResult:
        ...

Parameters
| Parameter | Description |
|---|---|
| session_id | Stable string identifier for this session; used as a key in event log, artifact store, and decision records |
| history | Previous turns as [{"role": "user"/"assistant", "content": "..."}] |
| user_message | The current user query |
| top_k | Number of vault facts to retrieve via get_context_pack |
| web_mode | "off" disables web_search; "on" enables it |
| show_citations | If True, append citation lists to final_text |
| runtime_provider | Provider name injected into the runtime metadata system message |
| runtime_model | Model name injected into the runtime metadata system message |
| skill_context | Optional additional system message (e.g. for active skill context) |
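
As a small illustration of the history format, the helper below appends one completed exchange to the list in the shape described above. extend_history is a hypothetical helper, and the sketch assumes the caller carries history from one run_turn() call to the next, which this reference does not state explicitly.

```python
def extend_history(
    history: list[dict[str, str]],
    user_message: str,
    assistant_text: str,
) -> list[dict[str, str]]:
    """Return a new history list with one user/assistant exchange appended.

    The input list is left untouched so earlier snapshots stay valid.
    """
    return [
        *history,
        {"role": "user", "content": user_message},
        {"role": "assistant", "content": assistant_text},
    ]
```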
Returns: TurnResult

    from dataclasses import dataclass
    from typing import Any

    @dataclass
    class TurnResult:
        text: str                                 # Final assistant response
        local_citations: list[str]                # Vault anchors cited this turn
        web_citations: list[dict[str, str]]       # Web source dicts cited this turn
        rendered_content_paths: list[str]         # Paths to rendered web content
        context_pack_json: dict[str, Any] | None  # Full context pack used
        input_tokens: int                         # Total input tokens consumed
        output_tokens: int                        # Total output tokens produced
        cache_read_tokens: int                    # Tokens read from provider cache
        cache_creation_tokens: int                # Tokens used to populate provider cache

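
The four token counters lend themselves to simple per-session accounting. The sketch below sums them across several turns. TokenUsage is a hypothetical stand-in that carries only the token fields of TurnResult, and sum_usage is not part of the harness.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class TokenUsage:
    # Hypothetical stand-in holding only the token fields of TurnResult.
    input_tokens: int = 0
    output_tokens: int = 0
    cache_read_tokens: int = 0
    cache_creation_tokens: int = 0


def sum_usage(results: Iterable[TokenUsage]) -> TokenUsage:
    """Add up the token counters of several turn results."""
    total = TokenUsage()
    for result in results:
        total.input_tokens += result.input_tokens
        total.output_tokens += result.output_tokens
        total.cache_read_tokens += result.cache_read_tokens
        total.cache_creation_tokens += result.cache_creation_tokens
    return total
```

Any object exposing the same four attributes, including a real TurnResult, could be passed to sum_usage unchanged.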
Internal Methods (Reference)
These are not public API but are documented for contributors:
_execute_turn_loop_v2(state, messages, web_mode, resolved_model, user_message)
The main bounded tool-calling loop. Mutates messages (appending assistant
and tool role messages) and state (token counts, citations, final text) in place.
_exec_call_v2(call, state, web_mode, replay_control) → ToolOutcome
Runs the 7-gate pre-execution pipeline then delegates to BoundedToolExecutor.
Never raises.
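
A "never raises" contract usually means that gate rejections and unexpected exceptions are both converted into an outcome value. The sketch below shows that shape under stated assumptions: the ToolOutcome fields, the Gate type, and exec_call are hypothetical, and the seven real gates are not reproduced because this reference does not list them.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class ToolOutcome:
    # Hypothetical minimal shape; the real ToolOutcome is not shown here.
    ok: bool
    output: Any = None
    error: Optional[str] = None


# A gate returns a rejection reason, or None to let the call through.
Gate = Callable[[dict], Optional[str]]


def exec_call(call: dict, gates: list[Gate], execute: Callable[[dict], Any]) -> ToolOutcome:
    """Run every gate in order, then execute; report all failures as outcomes."""
    try:
        for gate in gates:
            reason = gate(call)
            if reason is not None:
                return ToolOutcome(ok=False, error=reason)
        return ToolOutcome(ok=True, output=execute(call))
    except Exception as exc:
        # Any exception from a gate or the tool becomes a failed outcome.
        return ToolOutcome(ok=False, error=f"{type(exc).__name__}: {exc}")
```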
_apply_outcome_to_state(tool_name, outcome, state)
Updates state.blocked_tool_names on non-retryable failure and extracts
citations from local_search / get_context_pack / web_search outputs.
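
The two responsibilities described above can be sketched as follows. TurnState, the flattened arguments (ok, retryable, sources), and the de-duplication of citations are assumptions made for the example; only blocked_tool_names and the three tool names come from this reference.

```python
from dataclasses import dataclass, field

# Tools whose outputs carry citations, as listed in the description above.
CITING_TOOLS = {"local_search", "get_context_pack", "web_search"}


@dataclass
class TurnState:
    # Hypothetical minimal state; the real state object has more fields.
    blocked_tool_names: set[str] = field(default_factory=set)
    citations: list[str] = field(default_factory=list)


def apply_outcome(
    tool_name: str,
    ok: bool,
    retryable: bool,
    sources: list[str],
    state: TurnState,
) -> None:
    """Block a tool after a non-retryable failure and collect new citations."""
    if not ok and not retryable:
        state.blocked_tool_names.add(tool_name)
    if ok and tool_name in CITING_TOOLS:
        for source in sources:
            if source not in state.citations:  # keep first occurrence only
                state.citations.append(source)
```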
_load_memory_context(user_message, session_id) → _MemoryContextBundle
Retrieves episodic memory for the query. Returns an empty bundle if the
memory_retriever is None or if retrieval fails.
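
The fallback behaviour can be pictured as a guard plus a broad exception handler, so that memory problems never fail the turn. In this sketch MemoryBundle is a hypothetical stand-in for _MemoryContextBundle, and the retriever is modelled as a plain callable, which may differ from the real MemoryRetriever interface.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, Optional


@dataclass
class MemoryBundle:
    # Hypothetical stand-in for _MemoryContextBundle.
    memories: list[str] = field(default_factory=list)


def load_memory_context(
    retriever: Optional[Callable[[str, str], Iterable[str]]],
    user_message: str,
    session_id: str,
) -> MemoryBundle:
    """Return retrieved memories, or an empty bundle when retrieval is unavailable."""
    if retriever is None:
        return MemoryBundle()
    try:
        return MemoryBundle(memories=list(retriever(user_message, session_id)))
    except Exception:
        # Retrieval is best effort: a failure degrades to "no memories".
        return MemoryBundle()
```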
_tool_timeout_cap(tool_name) → float
Reads tool_spec.metadata["tool_timeout_s"] from the registry. Falls back
to TOOL_CALL_TIMEOUT_S = 45.
_tool_timeout_retryable(tool_name) → bool
Reads tool_spec.metadata["retry_on_timeout"]. Defaults to True.
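
Both lookups follow the same "metadata key with a default" pattern. The sketch below takes the metadata dict directly instead of resolving it from the registry by tool name, which is a simplification made for the example.

```python
TOOL_CALL_TIMEOUT_S = 45  # module-level default from this reference


def tool_timeout_cap(metadata: dict) -> float:
    """Per-tool timeout in seconds, falling back to the module default."""
    return float(metadata.get("tool_timeout_s", TOOL_CALL_TIMEOUT_S))


def tool_timeout_retryable(metadata: dict) -> bool:
    """Whether a timed-out call may be retried; True unless the tool opts out."""
    return bool(metadata.get("retry_on_timeout", True))
```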
_generate_with_span(messages, tool_schemas, model_name, timeout_s) → ProviderResult
Calls provider.generate() (or generate_stream()) wrapped in an OTEL LLM span.
Sets secondbrain.prompt.* attributes on the span.
_consume_stream(messages, tool_schemas) → ProviderResult
Collects streaming events (text_delta, tool_call_start, done) and fires
on_stream_chunk callbacks. Returns a complete ProviderResult.
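
Collecting a stream into a single result amounts to folding events into accumulators while forwarding text chunks to the callback. The event names below come from the description above; the dict-shaped events, their "text" and "name" keys, and the dict return value are assumptions standing in for the real event and ProviderResult types.

```python
from typing import Callable, Iterable, Optional


def consume_stream(
    events: Iterable[dict],
    on_stream_chunk: Optional[Callable[[str], None]] = None,
) -> dict:
    """Fold streaming events into one result, forwarding text as it arrives."""
    text_parts: list[str] = []
    tool_calls: list[str] = []
    for event in events:
        kind = event["type"]
        if kind == "text_delta":
            text_parts.append(event["text"])
            if on_stream_chunk is not None:
                on_stream_chunk(event["text"])
        elif kind == "tool_call_start":
            tool_calls.append(event["name"])
        elif kind == "done":
            break  # ignore anything after the terminal event
    return {"text": "".join(text_parts), "tool_calls": tool_calls}
```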
Module-Level Constants
| Constant | Value | Description |
|---|---|---|
| SYSTEM_PROMPT | From prompt store | Default agent profile system prompt |
| TOOL_MESSAGE_MAX_CHARS | 12_000 | Max inline tool output size; larger outputs go to ArtifactStore |
| TOOL_CALL_TIMEOUT_S | 45 | Default per-tool timeout if not specified in metadata |
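
The TOOL_MESSAGE_MAX_CHARS rule can be sketched as a size check that either inlines the output or stores it and returns a short pointer. A plain dict stands in for ArtifactStore, inline_or_spill is a hypothetical name, and both the inclusive threshold and the placeholder text are assumptions.

```python
TOOL_MESSAGE_MAX_CHARS = 12_000  # value from the table above


def inline_or_spill(output: str, store: dict, key: str) -> str:
    """Return the output itself if it is small enough, else store it and return a pointer."""
    if len(output) <= TOOL_MESSAGE_MAX_CHARS:
        return output
    store[key] = output  # a dict stands in for the real ArtifactStore
    return f"[output of {len(output)} chars stored as artifact {key!r}]"
```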