Testing Agent Patterns¶

brain.patterns.testing provides first-class utilities for writing deterministic, fast, zero-API-key tests against any pattern.

Why test patterns?¶

Real LLM providers are slow, costly, and non-deterministic. The testing module gives you:

MockProvider — replay fixed strings as if they were LLM responses
RecordingProvider — wrap any provider and capture every call for inspection
AgentHarness — a one-liner context manager with built-in assertions
Assertion helpers — assert_ok, assert_answer_contains, assert_tool_called, etc.

All utilities run offline — no API key required.

MockProvider¶

from brain.patterns import ReActAgent
from brain.patterns.testing import MockProvider

provider = MockProvider(responses=[
    "Thought: I should search\nAction: search\nAction Input: RAG",
    "Final Answer: RAG stands for Retrieval-Augmented Generation.",
])

agent = ReActAgent(
    tools={"search": lambda q: f"[result: {q}]"},
    provider=provider,
    max_steps=4,
)
result = agent.run("What is RAG?")
assert result.ok
print(provider.call_count)   # → 2
print(provider.call_log[0])  # → {"messages": [...], "tools": [...], ...}

Key behaviours:

Feature	Details
Fixed responses	Cycles through `responses` list
`cycle=True`	Wraps around; `cycle=False` (default) repeats last response
`call_count`	Number of `generate()` calls so far
`call_log`	Full argument list for every call
`reset()`	Clears count and log without changing responses
`add_response(text)`	Append a response to the queue at runtime

RecordingProvider¶

Wraps any real (or mock) provider and records all interactions:

from brain.patterns import ReflexionAgent
from brain.patterns.testing import MockProvider, RecordingProvider

inner = MockProvider(responses=["Draft.", "Refined draft."])
recorder = RecordingProvider(inner)

agent = ReflexionAgent(provider=recorder, max_iterations=2)
result = agent.run("Explain fine-tuning")

print(len(recorder.records))        # → 2 (one per LLM call)
print(recorder.total_input_tokens)  # → 0 with MockProvider (real provider: actual counts)

Useful for cost auditing and snapshot testing.

Assertion helpers¶

from brain.patterns.testing import (
    assert_ok,
    assert_answer_contains,
    assert_tool_called,
    assert_steps_taken,
    assert_iterations,
)

assert_ok(result)
assert_answer_contains(result, "rag", "retrieval")   # case-insensitive
assert_tool_called(result, "search")                 # checks result.steps[*].action
assert_steps_taken(result, min_steps=1, max_steps=5)
assert_iterations(result, min_iters=2)

All functions raise AssertionError with a descriptive message on failure — compatible with pytest, unittest, and any test runner.

AgentHarness¶

The highest-level API — bundle everything into a context manager:

from brain.patterns.testing import AgentHarness

def test_search_agent():
    with AgentHarness(responses=["Final Answer: RAG uses retrieval."]) as h:
        result = h.react("Explain RAG", tools={"search": lambda q: "[result]"})
        h.assert_ok()
        h.assert_answer_contains("rag")

Pattern factory methods on AgentHarness:

Method	Pattern
`h.react(task, tools=..., max_steps=5)`	`ReActAgent`
`h.rag(task, corpus=..., top_k=3)`	`RAGAgent`
`h.reflexion(task, max_iterations=2)`	`ReflexionAgent`
`h.plan(task, max_plan_steps=3)`	`PlanAndExecuteAgent`

Assertion methods:

Method	Checks
`h.assert_ok(result=None)`	`result.ok == True`
`h.assert_answer_contains(result_or_kw, *kws)`	Keywords in answer
`h.assert_tool_called(tool_name, result=None)`	Tool appears in steps
`h.assert_call_count(n)`	Mock called exactly n times

When result is omitted, assertion methods use h.last_result.

Streaming capture¶

from brain.patterns import StreamingReActAgent, EventKind
from brain.patterns.testing import MockProvider, capture_events

provider = MockProvider(responses=["Final Answer: Done!"])
agent = StreamingReActAgent(tools={}, provider=provider, max_steps=3)

events = capture_events(agent, "Summarise AI safety")
final = next(e for e in events if e.kind == EventKind.FINAL)
print(final.text)  # "Done!"

Complete test example¶

# tests/test_my_agent.py
import pytest
from brain.patterns.testing import AgentHarness, MockProvider, assert_answer_contains

def test_react_finds_answer():
    with AgentHarness(responses=[
        "Thought: search\nAction: web\nAction Input: transformer",
        "Final Answer: Transformers use self-attention mechanisms.",
    ]) as h:
        result = h.react("What is a transformer?", tools={"web": lambda q: "[wiki]"})
        h.assert_ok()
        h.assert_answer_contains("attention")

def test_rag_cites_corpus():
    corpus = ["The API rate limit is 100 req/min.", "Auth uses OAuth2 with JWT."]
    with AgentHarness(responses=["Based on context, OAuth2 is used with JWT tokens."]) as h:
        result = h.rag("What auth method is used?", corpus=corpus)
        h.assert_ok()
        h.assert_answer_contains("oauth2", "jwt")

API Reference¶

brain.patterns.testing.MockProvider ¶

MockProvider(responses: list[str] | None = None, *, cycle: bool = False, model_name: str = 'mock')

Bases: LLMProvider

Deterministic provider that replays a fixed list of responses.

Each generate() call consumes one response from the responses list (cycling if cycle=True). Responses may contain ReAct-style Thought/Action/Final Answer strings — patterns parse them exactly as they would parse real LLM output.

Parameters:

Name	Type	Description	Default
`responses`	`list[str] \| None`	List of strings returned by successive `generate()` calls.	`None`
`cycle`	`bool`	If `True`, restart from the beginning when the list is exhausted (default: `False` — repeats the last response).	`False`
`model_name`	`str`	Provider name reported via `name` property.	`'mock'`

Source code in brain/patterns/testing.py

def __init__(
    self,
    responses: list[str] | None = None,
    *,
    cycle: bool = False,
    model_name: str = "mock",
) -> None:
    self._responses = list(responses or ["Final Answer: [mock response]"])
    self._cycle = cycle
    self._model_name = model_name
    self._call_count = 0
    self._call_log: list[dict[str, Any]] = []

call_count `property` ¶

call_count: int

Number of times generate() has been called.

call_log `property` ¶

call_log: list[dict[str, Any]]

Recorded arguments for every generate() call.

generate ¶

generate(messages: list[dict[str, Any]], tools: list[dict[str, Any]], tool_choice: str = 'auto', stream: bool = False, timeout_s: int = 60) -> ProviderResult

Source code in brain/patterns/testing.py

def generate(
    self,
    messages: list[dict[str, Any]],
    tools: list[dict[str, Any]],
    tool_choice: str = "auto",
    stream: bool = False,
    timeout_s: int = 60,
) -> ProviderResult:
    self._call_log.append(
        {
            "messages": messages,
            "tools": tools,
            "tool_choice": tool_choice,
            "stream": stream,
        }
    )
    text = self._next_response()
    self._call_count += 1
    return ProviderResult(text=text)

generate_stream ¶

generate_stream(messages: list[dict[str, Any]], tools: list[dict[str, Any]], tool_choice: str = 'auto', timeout_s: int = 60) -> Iterator[StreamEvent]

Source code in brain/patterns/testing.py

def generate_stream(
    self,
    messages: list[dict[str, Any]],
    tools: list[dict[str, Any]],
    tool_choice: str = "auto",
    timeout_s: int = 60,
) -> Iterator[StreamEvent]:
    result = self.generate(messages=messages, tools=tools, tool_choice=tool_choice)
    if result.text:
        yield StreamEvent(type="text_delta", text=result.text)
    yield StreamEvent(type="done")

reset ¶

reset() -> None

Reset call count and log without changing responses.

Source code in brain/patterns/testing.py

def reset(self) -> None:
    """Reset call count and log without changing responses."""
    self._call_count = 0
    self._call_log.clear()

add_response ¶

add_response(text: str) -> None

Append a response to the queue.

Source code in brain/patterns/testing.py

def add_response(self, text: str) -> None:
    """Append a response to the queue."""
    self._responses.append(text)

brain.patterns.testing.RecordingProvider ¶

RecordingProvider(provider: LLMProvider)

Bases: LLMProvider

Wraps any provider and records every call and result.

Useful for snapshot testing and cost auditing.

Parameters:

Name	Type	Description	Default
`provider`	`LLMProvider`	The real (or mock) provider to delegate to.	required

Source code in brain/patterns/testing.py

def __init__(self, provider: LLMProvider) -> None:
    self._provider = provider
    self._records: list[RecordingProvider.Record] = []

records `property` ¶

records: list['RecordingProvider.Record']

All recorded interactions.

total_input_tokens `property` ¶

total_input_tokens: int

total_output_tokens `property` ¶

total_output_tokens: int

generate ¶

generate(messages: list[dict[str, Any]], tools: list[dict[str, Any]], tool_choice: str = 'auto', stream: bool = False, timeout_s: int = 60) -> ProviderResult

Source code in brain/patterns/testing.py

def generate(
    self,
    messages: list[dict[str, Any]],
    tools: list[dict[str, Any]],
    tool_choice: str = "auto",
    stream: bool = False,
    timeout_s: int = 60,
) -> ProviderResult:
    result = self._provider.generate(
        messages=messages,
        tools=tools,
        tool_choice=tool_choice,
        stream=stream,
        timeout_s=timeout_s,
    )
    self._records.append(
        RecordingProvider.Record(
            call_index=len(self._records),
            messages=messages,
            tools=tools,
            tool_choice=tool_choice,
            result_text=result.text,
            input_tokens=result.input_tokens,
            output_tokens=result.output_tokens,
        )
    )
    return result

clear ¶

clear() -> None

Clear all recorded interactions.

Source code in brain/patterns/testing.py

def clear(self) -> None:
    """Clear all recorded interactions."""
    self._records.clear()

brain.patterns.testing.AgentHarness ¶

AgentHarness(responses: list[str] | None = None, *, cycle: bool = False)

High-level test harness — batteries included.

Bundles a :class:MockProvider with factory methods for every pattern and assertion helpers on self.

Usage::

def test_my_agent():
    with AgentHarness(responses=["Final Answer: 42"]) as h:
        result = h.react("What is 6×7?", tools={"calc": lambda q: "42"})
        h.assert_ok(result)
        h.assert_answer_contains(result, "42")

Parameters:

Name	Type	Description	Default
`responses`	`list[str] \| None`	Forwarded to :class:`MockProvider`.	`None`
`cycle`	`bool`	Forwarded to :class:`MockProvider`.	`False`

Source code in brain/patterns/testing.py

def __init__(
    self,
    responses: list[str] | None = None,
    *,
    cycle: bool = False,
) -> None:
    self.provider = MockProvider(responses=responses, cycle=cycle)
    self._results: list[Any] = []

last_result `property` ¶

last_result: Any | None

The most recent result returned by any factory method.

all_results `property` ¶

all_results: list[Any]

All results collected during this harness session.

react ¶

react(task: str, *, tools: dict[str, Callable[..., Any]] | None = None, max_steps: int = 5) -> Any

Run a :class:~brain.patterns.ReActAgent and return its result.

Source code in brain/patterns/testing.py

def react(
    self,
    task: str,
    *,
    tools: dict[str, Callable[..., Any]] | None = None,
    max_steps: int = 5,
) -> Any:
    """Run a :class:`~brain.patterns.ReActAgent` and return its result."""
    from brain.patterns import ReActAgent

    agent = ReActAgent(tools=tools or {}, provider=self.provider, max_steps=max_steps)
    result = agent.run(task)
    self._results.append(result)
    return result

rag ¶

rag(task: str, *, corpus: list[str] | None = None, top_k: int = 3) -> Any

Run a :class:~brain.patterns.RAGAgent over corpus.

Source code in brain/patterns/testing.py

def rag(
    self,
    task: str,
    *,
    corpus: list[str] | None = None,
    top_k: int = 3,
) -> Any:
    """Run a :class:`~brain.patterns.RAGAgent` over ``corpus``."""
    from brain.patterns import RAGAgent

    chunks = corpus or []
    agent = RAGAgent(
        retriever=lambda q, k=top_k: chunks[:k],
        provider=self.provider,
        top_k=top_k,
    )
    result = agent.run(task)
    self._results.append(result)
    return result

reflexion ¶

reflexion(task: str, *, max_iterations: int = 2) -> Any

Run a :class:~brain.patterns.ReflexionAgent.

Source code in brain/patterns/testing.py

def reflexion(self, task: str, *, max_iterations: int = 2) -> Any:
    """Run a :class:`~brain.patterns.ReflexionAgent`."""
    from brain.patterns import ReflexionAgent

    agent = ReflexionAgent(provider=self.provider, max_iterations=max_iterations)
    result = agent.run(task)
    self._results.append(result)
    return result

plan ¶

plan(task: str, *, max_plan_steps: int = 3) -> Any

Run a :class:~brain.patterns.PlanAndExecuteAgent.

Source code in brain/patterns/testing.py

def plan(self, task: str, *, max_plan_steps: int = 3) -> Any:
    """Run a :class:`~brain.patterns.PlanAndExecuteAgent`."""
    from brain.patterns import PlanAndExecuteAgent

    agent = PlanAndExecuteAgent(provider=self.provider, max_plan_steps=max_plan_steps)
    result = agent.run(task)
    self._results.append(result)
    return result

assert_ok ¶

assert_ok(result: Any | None = None) -> None

Assert the most recent (or specified) result is ok.

Source code in brain/patterns/testing.py

def assert_ok(self, result: Any | None = None) -> None:
    """Assert the most recent (or specified) result is ok."""
    target = result if result is not None else (self._results[-1] if self._results else None)
    if target is None:
        raise AssertionError("No result to assert on")
    assert_ok(target)

assert_answer_contains ¶

assert_answer_contains(result_or_keyword: Any, *extra_keywords: str) -> None

Assert answer contains keywords.

Accepts either (result, kw1, kw2) or (kw1, kw2) (uses last result).

Source code in brain/patterns/testing.py

def assert_answer_contains(self, result_or_keyword: Any, *extra_keywords: str) -> None:
    """Assert answer contains keywords.

    Accepts either ``(result, kw1, kw2)`` or ``(kw1, kw2)`` (uses last
    result).
    """
    if isinstance(result_or_keyword, str):
        target = self._results[-1] if self._results else None
        keywords = (result_or_keyword, *extra_keywords)
    else:
        target = result_or_keyword
        keywords = extra_keywords
    if target is None:
        raise AssertionError("No result to assert on")
    assert_answer_contains(target, *keywords)

assert_tool_called ¶

assert_tool_called(tool_name: str, result: Any | None = None) -> None

Assert that tool_name was called in the result.

Source code in brain/patterns/testing.py

def assert_tool_called(self, tool_name: str, result: Any | None = None) -> None:
    """Assert that ``tool_name`` was called in the result."""
    target = result if result is not None else (self._results[-1] if self._results else None)
    if target is None:
        raise AssertionError("No result to assert on")
    assert_tool_called(target, tool_name)

assert_call_count ¶

assert_call_count(expected: int) -> None

Assert the mock provider was called exactly expected times.

Source code in brain/patterns/testing.py

def assert_call_count(self, expected: int) -> None:
    """Assert the mock provider was called exactly ``expected`` times."""
    actual = self.provider.call_count
    if actual != expected:
        raise AssertionError(f"Expected {expected} provider calls, got {actual}")

brain.patterns.testing.assert_ok ¶

assert_ok(result: Any) -> None

Assert that a pattern result completed without error.

Parameters:

Name	Type	Description	Default
`result`	`Any`	Any :class:`~brain.patterns.base.PatternResult` or `HITLResult`.	required

Raises:

Type	Description
`AssertionError`	if `result.ok` is `False`.

Source code in brain/patterns/testing.py

def assert_ok(result: Any) -> None:
    """Assert that a pattern result completed without error.

    Args:
        result: Any :class:`~brain.patterns.base.PatternResult` or
            ``HITLResult``.

    Raises:
        AssertionError: if ``result.ok`` is ``False``.
    """
    if not result.ok:
        msg = f"Pattern run failed: {result.error}"
        raise AssertionError(msg)

brain.patterns.testing.assert_answer_contains ¶

assert_answer_contains(result: Any, *keywords: str) -> None

Assert that the answer contains all given keywords (case-insensitive).

Parameters:

Name	Type	Description	Default
`result`	`Any`	Pattern result with an `answer` attribute.	required
`*keywords`	`str`	One or more strings that must appear in the answer.	`()`

Raises:

Type	Description
`AssertionError`	listing all missing keywords.

Source code in brain/patterns/testing.py

def assert_answer_contains(result: Any, *keywords: str) -> None:
    """Assert that the answer contains all given keywords (case-insensitive).

    Args:
        result: Pattern result with an ``answer`` attribute.
        *keywords: One or more strings that must appear in the answer.

    Raises:
        AssertionError: listing all missing keywords.
    """
    answer_lower = (result.answer or "").lower()
    missing = [kw for kw in keywords if kw.lower() not in answer_lower]
    if missing:
        raise AssertionError(f"Answer missing keywords {missing!r}.\nGot: {result.answer[:200]!r}")

brain.patterns.testing.assert_tool_called ¶

assert_tool_called(result: Any, tool_name: str) -> None

Assert that a specific tool was invoked during the run.

Checks result.steps for any step whose action matches tool_name.

Parameters:

Name	Type	Description	Default
`result`	`Any`	Pattern result with a `steps` attribute.	required
`tool_name`	`str`	Name of the expected tool.	required

Raises:

Type	Description
`AssertionError`	if the tool was never called.

Source code in brain/patterns/testing.py

def assert_tool_called(result: Any, tool_name: str) -> None:
    """Assert that a specific tool was invoked during the run.

    Checks ``result.steps`` for any step whose ``action`` matches
    ``tool_name``.

    Args:
        result: Pattern result with a ``steps`` attribute.
        tool_name: Name of the expected tool.

    Raises:
        AssertionError: if the tool was never called.
    """
    steps = getattr(result, "steps", []) or []
    called = [s.action for s in steps if hasattr(s, "action")]
    if tool_name not in called:
        raise AssertionError(f"Tool {tool_name!r} was never called.\nTools used: {called!r}")

brain.patterns.testing.assert_steps_taken ¶

assert_steps_taken(result: Any, *, min_steps: int = 0, max_steps: int | None = None) -> None

Assert that the number of steps is within the expected range.

Parameters:

Name	Type	Description	Default
`result`	`Any`	Pattern result with `steps` and/or `iterations`.	required
`min_steps`	`int`	Minimum number of steps expected (inclusive).	`0`
`max_steps`	`int \| None`	Maximum number of steps expected (inclusive, or `None`).	`None`

Raises:

Type	Description
`AssertionError`	if the count is outside the range.

Source code in brain/patterns/testing.py

def assert_steps_taken(result: Any, *, min_steps: int = 0, max_steps: int | None = None) -> None:
    """Assert that the number of steps is within the expected range.

    Args:
        result: Pattern result with ``steps`` and/or ``iterations``.
        min_steps: Minimum number of steps expected (inclusive).
        max_steps: Maximum number of steps expected (inclusive, or ``None``).

    Raises:
        AssertionError: if the count is outside the range.
    """
    count = len(getattr(result, "steps", []) or [])
    if count < min_steps:
        raise AssertionError(f"Expected ≥{min_steps} steps, got {count}")
    if max_steps is not None and count > max_steps:
        raise AssertionError(f"Expected ≤{max_steps} steps, got {count}")

brain.patterns.testing.assert_iterations ¶

assert_iterations(result: Any, *, min_iters: int = 1, max_iters: int | None = None) -> None

Assert that result.iterations is within the expected range.

Parameters:

Name	Type	Description	Default
`result`	`Any`	Pattern result with an `iterations` attribute.	required
`min_iters`	`int`	Minimum iterations expected (inclusive).	`1`
`max_iters`	`int \| None`	Maximum iterations (inclusive, or `None`).	`None`

Raises:

Type	Description
`AssertionError`	if iterations is outside the range.

Source code in brain/patterns/testing.py

def assert_iterations(result: Any, *, min_iters: int = 1, max_iters: int | None = None) -> None:
    """Assert that ``result.iterations`` is within the expected range.

    Args:
        result: Pattern result with an ``iterations`` attribute.
        min_iters: Minimum iterations expected (inclusive).
        max_iters: Maximum iterations (inclusive, or ``None``).

    Raises:
        AssertionError: if iterations is outside the range.
    """
    iters = getattr(result, "iterations", 0)
    if iters < min_iters:
        raise AssertionError(f"Expected ≥{min_iters} iterations, got {iters}")
    if max_iters is not None and iters > max_iters:
        raise AssertionError(f"Expected ≤{max_iters} iterations, got {iters}")

brain.patterns.testing.capture_events ¶

capture_events(agent: Any, task: str) -> list[Any]

Collect all events from a :class:~brain.patterns.StreamingReActAgent.

Parameters:

Name	Type	Description	Default
`agent`	`Any`	A `StreamingReActAgent` (or any pattern with a `stream()` method).	required
`task`	`str`	The task string to pass to `agent.stream()`.	required

Returns:

Type	Description
`list[Any]`	List of :class:`~brain.patterns.PatternEvent` objects in order.

Source code in brain/patterns/testing.py

def capture_events(agent: Any, task: str) -> list[Any]:
    """Collect all events from a :class:`~brain.patterns.StreamingReActAgent`.

    Args:
        agent: A ``StreamingReActAgent`` (or any pattern with a ``stream()``
            method).
        task: The task string to pass to ``agent.stream()``.

    Returns:
        List of :class:`~brain.patterns.PatternEvent` objects in order.
    """
    return list(agent.stream(task))

Testing Agent Patterns¶

Why test patterns?¶

MockProvider¶

RecordingProvider¶

Assertion helpers¶

AgentHarness¶

Streaming capture¶

Complete test example¶

API Reference¶

brain.patterns.testing.MockProvider ¶

call_count property ¶

call_log property ¶

generate ¶

generate_stream ¶

reset ¶

add_response ¶

brain.patterns.testing.RecordingProvider ¶

records property ¶

total_input_tokens property ¶

total_output_tokens property ¶

generate ¶

clear ¶

brain.patterns.testing.AgentHarness ¶

last_result property ¶

all_results property ¶

react ¶

rag ¶

reflexion ¶

plan ¶

assert_ok ¶

assert_answer_contains ¶

assert_tool_called ¶

assert_call_count ¶

brain.patterns.testing.assert_ok ¶

brain.patterns.testing.assert_answer_contains ¶

brain.patterns.testing.assert_tool_called ¶

brain.patterns.testing.assert_steps_taken ¶

brain.patterns.testing.assert_iterations ¶

brain.patterns.testing.capture_events ¶

call_count `property` ¶

call_log `property` ¶

records `property` ¶

total_input_tokens `property` ¶

total_output_tokens `property` ¶

last_result `property` ¶

all_results `property` ¶