Agent Harness — Testing Guide
Test Files
| File |
Tests |
What it covers |
tests/agent/test_harness_refactor.py |
78 |
All 10 collaborators, unit-level |
tests/agent/test_agent_harness_timeouts.py |
2 |
Timeout handling, _tool_timeout_cap() |
Running Tests
# All harness tests
.venv/bin/python -m pytest tests/agent/test_harness_refactor.py tests/agent/test_agent_harness_timeouts.py -v
# Just the collaborator unit tests
.venv/bin/python -m pytest tests/agent/test_harness_refactor.py -v
# Quick smoke run (no output capture)
.venv/bin/python -m pytest tests/agent/ -q
Coverage Map
TurnBudget / DeadlineToken (8 tests)
| Test |
What it verifies |
test_turn_budget_create_sets_deadline |
deadline = perf_counter() + timeout_s |
test_turn_budget_claim_step_exhaustion |
Returns False after max_steps claims |
test_turn_budget_is_expired |
Wall-clock deadline check |
test_turn_budget_per_tool_remaining_min_floor |
Never returns less than 5.0s |
test_turn_budget_snapshot |
Snapshot dict has all expected keys |
test_deadline_token_expires |
Expiry after deadline |
test_deadline_token_cancel |
cancel() makes it immediately expired |
test_deadline_token_from_budget |
Deadline = min(turn_deadline, now + cap) |
TurnRuntimeState (11 tests)
| Test |
What it verifies |
test_turn_state_initial_values |
All fields at safe defaults |
test_turn_state_add_tokens |
Accumulation is additive |
test_turn_state_trace_lifecycle_normal |
running → complete returns True |
test_turn_state_trace_lifecycle_timeout |
running → timed_out → late returns False |
test_turn_state_late_write_gate |
close_writes() makes is_accepting_writes() False |
test_turn_state_repair_count |
increment_repair + repair_count |
test_turn_state_record_tool_result |
Appends to tool_results |
test_turn_state_blocked_tool_names |
Set operations |
test_turn_state_citations |
CitationAccumulator fields |
test_turn_state_trace_lock_thread_safety |
No data race under concurrent writes |
test_turn_state_writes_lock_thread_safety |
No data race under concurrent close/check |
| Test |
What it verifies |
test_tool_execution_result_frozen |
Mutation raises FrozenInstanceError |
test_outcome_to_model_content_success |
JSON of output dict |
test_outcome_to_model_content_timeout |
Error JSON with timed_out: true |
test_outcome_to_model_content_failure |
Error JSON |
test_outcome_to_model_content_denied_duplicate |
Warning dict |
test_outcome_to_model_content_denied_validation |
Hint dict |
test_outcome_to_model_content_artifact |
Reference dict |
test_outcome_predicates |
outcome_is_error, outcome_blocks_tool |
ArtifactStore (8 tests)
| Test |
What it verifies |
test_artifact_store_creates_file |
File written to disk |
test_artifact_store_retrieve_success |
Round-trip read |
test_artifact_store_retrieve_expired |
Returns None after TTL |
test_artifact_store_retrieve_missing |
Returns None for unknown ID |
test_artifact_store_cleanup_session |
Removes all session artifacts |
test_artifact_store_cleanup_expired |
Removes expired artifacts |
test_artifact_store_stats |
Returns count and total bytes |
test_get_global_artifact_store_singleton |
Same instance returned twice |
ReplayControl (8 tests)
| Test |
What it verifies |
test_replay_first_call_allowed |
No prior history → allow |
test_replay_idempotent_duplicate_skipped |
Prior success → skip |
test_replay_idempotent_failure_retry_allowed |
Prior failure → allow |
test_replay_idempotent_timeout_retry_allowed |
Prior timeout → allow |
test_replay_non_idempotent_always_allowed |
idempotent=False → never skip |
test_replay_no_spec_conservative_dedup |
No ToolSpec → treated as idempotent |
test_replay_record_success_then_skip |
Record + check cycle |
test_replay_history_size |
Counter increments |
| Test |
What it verifies |
test_scheduler_empty_calls |
Returns empty list |
test_scheduler_single_call |
Always sequential |
test_scheduler_parallel_read_only |
Two non-overlapping READ_ONLY → parallel wave |
test_scheduler_local_write_sequential |
LOCAL_WRITE → own sequential wave |
test_scheduler_resource_overlap_splits |
Overlapping READ_ONLY calls → two waves |
test_scheduler_parallel_disabled |
parallel_enabled=False → all sequential |
TurnPreparer (8 tests)
| Test |
What it verifies |
test_turn_preparer_builds_messages |
System + user message present |
test_turn_preparer_includes_history |
History injected at correct position |
test_turn_preparer_memory_context |
Memory text injected as system message |
test_turn_preparer_render_hash_per_turn |
Hash changes when prompt changes |
test_turn_preparer_hash_not_sticky |
Two calls with different prompts → different hashes |
test_turn_preparer_skill_context |
Skill context message injected |
test_turn_preparer_no_skill_context |
No extra message when skill_context=None |
test_turn_preparer_resolved_model_inferred |
Model inferred from provider |
| Test |
What it verifies |
test_executor_deadline_gate |
Expired budget → ToolDenied(reason="deadline") |
test_executor_successful_execution |
Returns ToolExecutionResult |
test_executor_tool_failure |
Exception → ToolFailure |
test_executor_timeout |
Thread timeout → ToolTimeout |
test_executor_late_write_suppressed |
Post-deadline completion → side effects skipped |
test_executor_oversized_output |
Output > 12k → ToolArtifactReference |
test_executor_web_mode_off |
web_search disabled |
test_executor_write_confirmation_denied |
Write denied → ToolDenied(reason="write_denied") |
test_executor_write_confirmation_approved |
Write approved → tool re-executed |
ReflectionEngine (5 tests)
| Test |
What it verifies |
test_reflection_budget_exhausted |
Budget=0 → stub result returned |
test_reflection_deadline_expired |
Expired budget → stub result |
test_reflection_success |
LLM call returns text → ReflectionResult |
test_reflection_llm_error |
Provider raises → stub result, no exception |
test_reflection_timeout_capped |
timeout_s ≤ budget.remaining_s() |
TurnFinalizer (10 tests)
| Test |
What it verifies |
test_finalizer_returns_turn_result |
Return type and field values |
test_finalizer_citation_append |
Citations added when show_citations=True |
test_finalizer_no_citations |
No append when show_citations=False |
test_finalizer_memory_extraction |
extract_turn() called |
test_finalizer_memory_extraction_exception |
Exception swallowed |
test_finalizer_decision_record_direct |
direct_answer strategy |
test_finalizer_decision_record_retrieval |
retrieval_augmented strategy |
test_finalizer_decision_record_exception |
Exception swallowed |
test_finalizer_no_decision_store |
Skipped when decision_store=None |
test_finalizer_judge_scheduling |
schedule_prompt_judges called |
Writing New Tests
Testing a collaborator in isolation
Each collaborator accepts all its dependencies as constructor arguments.
Use MagicMock for anything you don't need:
from unittest.mock import MagicMock
from brain.agent.turn_budget import TurnBudget
from brain.agent.turn_state import TurnRuntimeState
from brain.agent.tool_executor_v2 import BoundedToolExecutor
def test_my_feature():
budget = TurnBudget.create(timeout_s=60)
state = TurnRuntimeState(session_id="test", budget=budget)
executor = BoundedToolExecutor(
tools_registry=MagicMock(),
callbacks=MagicMock(),
event_log=MagicMock(),
checkpoint_store=None,
artifact_store=MagicMock(),
thinking_manager=MagicMock(),
)
...
Testing timeout behaviour
Patch BoundedToolExecutor.execute directly on a harness._bounded_executor
instance to simulate ToolTimeout or ToolFailure outcomes without needing
real tool threads:
from brain.agent.tool_outcomes import ToolFailure
def fake_execute(call, state, *, web_mode, budget, tool_timeout_cap_s):
return ToolFailure(call_id=call.id, tool_name=call.name,
error="simulated timeout", retryable=False)
harness._bounded_executor.execute = fake_execute
Testing run_turn() end-to-end
Provide a _Provider class that returns ProviderResult with controlled
tool_calls / text values. Use a _Tools stub that responds to tools.run()
and tools.available_tool_schemas(). See
tests/agent/test_agent_harness_timeouts.py for a complete example.