Skip to content

Testing Agent Patterns

brain.patterns.testing provides first-class utilities for writing deterministic, fast, zero-API-key tests against any pattern.

Why test patterns?

Real LLM providers are slow, costly, and non-deterministic. The testing module gives you:

  • MockProvider — replay fixed strings as if they were LLM responses
  • RecordingProvider — wrap any provider and capture every call for inspection
  • AgentHarness — a one-liner context manager with built-in assertions
  • Assertion helpersassert_ok, assert_answer_contains, assert_tool_called, etc.

All utilities run offline — no API key required.


MockProvider

from brain.patterns import ReActAgent
from brain.patterns.testing import MockProvider

provider = MockProvider(responses=[
    "Thought: I should search\nAction: search\nAction Input: RAG",
    "Final Answer: RAG stands for Retrieval-Augmented Generation.",
])

agent = ReActAgent(
    tools={"search": lambda q: f"[result: {q}]"},
    provider=provider,
    max_steps=4,
)
result = agent.run("What is RAG?")
assert result.ok
print(provider.call_count)   # → 2
print(provider.call_log[0])  # → {"messages": [...], "tools": [...], ...}

Key behaviours:

Feature Details
Fixed responses Cycles through responses list
cycle=True Wraps around; cycle=False (default) repeats last response
call_count Number of generate() calls so far
call_log Full argument list for every call
reset() Clears count and log without changing responses
add_response(text) Append a response to the queue at runtime

RecordingProvider

Wraps any real (or mock) provider and records all interactions:

from brain.patterns import ReflexionAgent
from brain.patterns.testing import MockProvider, RecordingProvider

inner = MockProvider(responses=["Draft.", "Refined draft."])
recorder = RecordingProvider(inner)

agent = ReflexionAgent(provider=recorder, max_iterations=2)
result = agent.run("Explain fine-tuning")

print(len(recorder.records))        # → 2 (one per LLM call)
print(recorder.total_input_tokens)  # → 0 with MockProvider (real provider: actual counts)

Useful for cost auditing and snapshot testing.


Assertion helpers

from brain.patterns.testing import (
    assert_ok,
    assert_answer_contains,
    assert_tool_called,
    assert_steps_taken,
    assert_iterations,
)

assert_ok(result)
assert_answer_contains(result, "rag", "retrieval")   # case-insensitive
assert_tool_called(result, "search")                 # checks result.steps[*].action
assert_steps_taken(result, min_steps=1, max_steps=5)
assert_iterations(result, min_iters=2)

All functions raise AssertionError with a descriptive message on failure — compatible with pytest, unittest, and any test runner.


AgentHarness

The highest-level API — bundle everything into a context manager:

from brain.patterns.testing import AgentHarness

def test_search_agent():
    with AgentHarness(responses=["Final Answer: RAG uses retrieval."]) as h:
        result = h.react("Explain RAG", tools={"search": lambda q: "[result]"})
        h.assert_ok()
        h.assert_answer_contains("rag")

Pattern factory methods on AgentHarness:

Method Pattern
h.react(task, tools=..., max_steps=5) ReActAgent
h.rag(task, corpus=..., top_k=3) RAGAgent
h.reflexion(task, max_iterations=2) ReflexionAgent
h.plan(task, max_plan_steps=3) PlanAndExecuteAgent

Assertion methods:

Method Checks
h.assert_ok(result=None) result.ok == True
h.assert_answer_contains(result_or_kw, *kws) Keywords in answer
h.assert_tool_called(tool_name, result=None) Tool appears in steps
h.assert_call_count(n) Mock called exactly n times

When result is omitted, assertion methods use h.last_result.


Streaming capture

from brain.patterns import StreamingReActAgent, EventKind
from brain.patterns.testing import MockProvider, capture_events

provider = MockProvider(responses=["Final Answer: Done!"])
agent = StreamingReActAgent(tools={}, provider=provider, max_steps=3)

events = capture_events(agent, "Summarise AI safety")
final = next(e for e in events if e.kind == EventKind.FINAL)
print(final.text)  # "Done!"

Complete test example

# tests/test_my_agent.py
import pytest
from brain.patterns.testing import AgentHarness, MockProvider, assert_answer_contains

def test_react_finds_answer():
    with AgentHarness(responses=[
        "Thought: search\nAction: web\nAction Input: transformer",
        "Final Answer: Transformers use self-attention mechanisms.",
    ]) as h:
        result = h.react("What is a transformer?", tools={"web": lambda q: "[wiki]"})
        h.assert_ok()
        h.assert_answer_contains("attention")

def test_rag_cites_corpus():
    corpus = ["The API rate limit is 100 req/min.", "Auth uses OAuth2 with JWT."]
    with AgentHarness(responses=["Based on context, OAuth2 is used with JWT tokens."]) as h:
        result = h.rag("What auth method is used?", corpus=corpus)
        h.assert_ok()
        h.assert_answer_contains("oauth2", "jwt")

API Reference

brain.patterns.testing.MockProvider

MockProvider(responses: list[str] | None = None, *, cycle: bool = False, model_name: str = 'mock')

Bases: LLMProvider

Deterministic provider that replays a fixed list of responses.

Each generate() call consumes one response from the responses list (cycling if cycle=True). Responses may contain ReAct-style Thought/Action/Final Answer strings — patterns parse them exactly as they would parse real LLM output.

Parameters:

Name Type Description Default
responses list[str] | None

List of strings returned by successive generate() calls.

None
cycle bool

If True, restart from the beginning when the list is exhausted (default: False — repeats the last response).

False
model_name str

Provider name reported via name property.

'mock'
Source code in brain/patterns/testing.py
def __init__(
    self,
    responses: list[str] | None = None,
    *,
    cycle: bool = False,
    model_name: str = "mock",
) -> None:
    self._responses = list(responses or ["Final Answer: [mock response]"])
    self._cycle = cycle
    self._model_name = model_name
    self._call_count = 0
    self._call_log: list[dict[str, Any]] = []

call_count property

call_count: int

Number of times generate() has been called.

call_log property

call_log: list[dict[str, Any]]

Recorded arguments for every generate() call.

generate

generate(messages: list[dict[str, Any]], tools: list[dict[str, Any]], tool_choice: str = 'auto', stream: bool = False, timeout_s: int = 60) -> ProviderResult
Source code in brain/patterns/testing.py
def generate(
    self,
    messages: list[dict[str, Any]],
    tools: list[dict[str, Any]],
    tool_choice: str = "auto",
    stream: bool = False,
    timeout_s: int = 60,
) -> ProviderResult:
    self._call_log.append(
        {
            "messages": messages,
            "tools": tools,
            "tool_choice": tool_choice,
            "stream": stream,
        }
    )
    text = self._next_response()
    self._call_count += 1
    return ProviderResult(text=text)

generate_stream

generate_stream(messages: list[dict[str, Any]], tools: list[dict[str, Any]], tool_choice: str = 'auto', timeout_s: int = 60) -> Iterator[StreamEvent]
Source code in brain/patterns/testing.py
def generate_stream(
    self,
    messages: list[dict[str, Any]],
    tools: list[dict[str, Any]],
    tool_choice: str = "auto",
    timeout_s: int = 60,
) -> Iterator[StreamEvent]:
    result = self.generate(messages=messages, tools=tools, tool_choice=tool_choice)
    if result.text:
        yield StreamEvent(type="text_delta", text=result.text)
    yield StreamEvent(type="done")

reset

reset() -> None

Reset call count and log without changing responses.

Source code in brain/patterns/testing.py
def reset(self) -> None:
    """Reset call count and log without changing responses."""
    self._call_count = 0
    self._call_log.clear()

add_response

add_response(text: str) -> None

Append a response to the queue.

Source code in brain/patterns/testing.py
def add_response(self, text: str) -> None:
    """Append a response to the queue."""
    self._responses.append(text)

brain.patterns.testing.RecordingProvider

RecordingProvider(provider: LLMProvider)

Bases: LLMProvider

Wraps any provider and records every call and result.

Useful for snapshot testing and cost auditing.

Parameters:

Name Type Description Default
provider LLMProvider

The real (or mock) provider to delegate to.

required
Source code in brain/patterns/testing.py
def __init__(self, provider: LLMProvider) -> None:
    self._provider = provider
    self._records: list[RecordingProvider.Record] = []

records property

records: list['RecordingProvider.Record']

All recorded interactions.

total_input_tokens property

total_input_tokens: int

total_output_tokens property

total_output_tokens: int

generate

generate(messages: list[dict[str, Any]], tools: list[dict[str, Any]], tool_choice: str = 'auto', stream: bool = False, timeout_s: int = 60) -> ProviderResult
Source code in brain/patterns/testing.py
def generate(
    self,
    messages: list[dict[str, Any]],
    tools: list[dict[str, Any]],
    tool_choice: str = "auto",
    stream: bool = False,
    timeout_s: int = 60,
) -> ProviderResult:
    result = self._provider.generate(
        messages=messages,
        tools=tools,
        tool_choice=tool_choice,
        stream=stream,
        timeout_s=timeout_s,
    )
    self._records.append(
        RecordingProvider.Record(
            call_index=len(self._records),
            messages=messages,
            tools=tools,
            tool_choice=tool_choice,
            result_text=result.text,
            input_tokens=result.input_tokens,
            output_tokens=result.output_tokens,
        )
    )
    return result

clear

clear() -> None

Clear all recorded interactions.

Source code in brain/patterns/testing.py
def clear(self) -> None:
    """Clear all recorded interactions."""
    self._records.clear()

brain.patterns.testing.AgentHarness

AgentHarness(responses: list[str] | None = None, *, cycle: bool = False)

High-level test harness — batteries included.

Bundles a :class:MockProvider with factory methods for every pattern and assertion helpers on self.

Usage::

def test_my_agent():
    with AgentHarness(responses=["Final Answer: 42"]) as h:
        result = h.react("What is 6×7?", tools={"calc": lambda q: "42"})
        h.assert_ok(result)
        h.assert_answer_contains(result, "42")

Parameters:

Name Type Description Default
responses list[str] | None

Forwarded to :class:MockProvider.

None
cycle bool

Forwarded to :class:MockProvider.

False
Source code in brain/patterns/testing.py
def __init__(
    self,
    responses: list[str] | None = None,
    *,
    cycle: bool = False,
) -> None:
    self.provider = MockProvider(responses=responses, cycle=cycle)
    self._results: list[Any] = []

last_result property

last_result: Any | None

The most recent result returned by any factory method.

all_results property

all_results: list[Any]

All results collected during this harness session.

react

react(task: str, *, tools: dict[str, Callable[..., Any]] | None = None, max_steps: int = 5) -> Any

Run a :class:~brain.patterns.ReActAgent and return its result.

Source code in brain/patterns/testing.py
def react(
    self,
    task: str,
    *,
    tools: dict[str, Callable[..., Any]] | None = None,
    max_steps: int = 5,
) -> Any:
    """Run a :class:`~brain.patterns.ReActAgent` and return its result."""
    from brain.patterns import ReActAgent

    agent = ReActAgent(tools=tools or {}, provider=self.provider, max_steps=max_steps)
    result = agent.run(task)
    self._results.append(result)
    return result

rag

rag(task: str, *, corpus: list[str] | None = None, top_k: int = 3) -> Any

Run a :class:~brain.patterns.RAGAgent over corpus.

Source code in brain/patterns/testing.py
def rag(
    self,
    task: str,
    *,
    corpus: list[str] | None = None,
    top_k: int = 3,
) -> Any:
    """Run a :class:`~brain.patterns.RAGAgent` over ``corpus``."""
    from brain.patterns import RAGAgent

    chunks = corpus or []
    agent = RAGAgent(
        retriever=lambda q, k=top_k: chunks[:k],
        provider=self.provider,
        top_k=top_k,
    )
    result = agent.run(task)
    self._results.append(result)
    return result

reflexion

reflexion(task: str, *, max_iterations: int = 2) -> Any

Run a :class:~brain.patterns.ReflexionAgent.

Source code in brain/patterns/testing.py
def reflexion(self, task: str, *, max_iterations: int = 2) -> Any:
    """Run a :class:`~brain.patterns.ReflexionAgent`."""
    from brain.patterns import ReflexionAgent

    agent = ReflexionAgent(provider=self.provider, max_iterations=max_iterations)
    result = agent.run(task)
    self._results.append(result)
    return result

plan

plan(task: str, *, max_plan_steps: int = 3) -> Any

Run a :class:~brain.patterns.PlanAndExecuteAgent.

Source code in brain/patterns/testing.py
def plan(self, task: str, *, max_plan_steps: int = 3) -> Any:
    """Run a :class:`~brain.patterns.PlanAndExecuteAgent`."""
    from brain.patterns import PlanAndExecuteAgent

    agent = PlanAndExecuteAgent(provider=self.provider, max_plan_steps=max_plan_steps)
    result = agent.run(task)
    self._results.append(result)
    return result

assert_ok

assert_ok(result: Any | None = None) -> None

Assert the most recent (or specified) result is ok.

Source code in brain/patterns/testing.py
def assert_ok(self, result: Any | None = None) -> None:
    """Assert the most recent (or specified) result is ok."""
    target = result if result is not None else (self._results[-1] if self._results else None)
    if target is None:
        raise AssertionError("No result to assert on")
    assert_ok(target)

assert_answer_contains

assert_answer_contains(result_or_keyword: Any, *extra_keywords: str) -> None

Assert answer contains keywords.

Accepts either (result, kw1, kw2) or (kw1, kw2) (uses last result).

Source code in brain/patterns/testing.py
def assert_answer_contains(self, result_or_keyword: Any, *extra_keywords: str) -> None:
    """Assert answer contains keywords.

    Accepts either ``(result, kw1, kw2)`` or ``(kw1, kw2)`` (uses last
    result).
    """
    if isinstance(result_or_keyword, str):
        target = self._results[-1] if self._results else None
        keywords = (result_or_keyword, *extra_keywords)
    else:
        target = result_or_keyword
        keywords = extra_keywords
    if target is None:
        raise AssertionError("No result to assert on")
    assert_answer_contains(target, *keywords)

assert_tool_called

assert_tool_called(tool_name: str, result: Any | None = None) -> None

Assert that tool_name was called in the result.

Source code in brain/patterns/testing.py
def assert_tool_called(self, tool_name: str, result: Any | None = None) -> None:
    """Assert that ``tool_name`` was called in the result."""
    target = result if result is not None else (self._results[-1] if self._results else None)
    if target is None:
        raise AssertionError("No result to assert on")
    assert_tool_called(target, tool_name)

assert_call_count

assert_call_count(expected: int) -> None

Assert the mock provider was called exactly expected times.

Source code in brain/patterns/testing.py
def assert_call_count(self, expected: int) -> None:
    """Assert the mock provider was called exactly ``expected`` times."""
    actual = self.provider.call_count
    if actual != expected:
        raise AssertionError(f"Expected {expected} provider calls, got {actual}")

brain.patterns.testing.assert_ok

assert_ok(result: Any) -> None

Assert that a pattern result completed without error.

Parameters:

Name Type Description Default
result Any

Any :class:~brain.patterns.base.PatternResult or HITLResult.

required

Raises:

Type Description
AssertionError

if result.ok is False.

Source code in brain/patterns/testing.py
def assert_ok(result: Any) -> None:
    """Assert that a pattern result completed without error.

    Args:
        result: Any :class:`~brain.patterns.base.PatternResult` or
            ``HITLResult``.

    Raises:
        AssertionError: if ``result.ok`` is ``False``.
    """
    if not result.ok:
        msg = f"Pattern run failed: {result.error}"
        raise AssertionError(msg)

brain.patterns.testing.assert_answer_contains

assert_answer_contains(result: Any, *keywords: str) -> None

Assert that the answer contains all given keywords (case-insensitive).

Parameters:

Name Type Description Default
result Any

Pattern result with an answer attribute.

required
*keywords str

One or more strings that must appear in the answer.

()

Raises:

Type Description
AssertionError

listing all missing keywords.

Source code in brain/patterns/testing.py
def assert_answer_contains(result: Any, *keywords: str) -> None:
    """Assert that the answer contains all given keywords (case-insensitive).

    Args:
        result: Pattern result with an ``answer`` attribute.
        *keywords: One or more strings that must appear in the answer.

    Raises:
        AssertionError: listing all missing keywords.
    """
    answer_lower = (result.answer or "").lower()
    missing = [kw for kw in keywords if kw.lower() not in answer_lower]
    if missing:
        raise AssertionError(f"Answer missing keywords {missing!r}.\nGot: {result.answer[:200]!r}")

brain.patterns.testing.assert_tool_called

assert_tool_called(result: Any, tool_name: str) -> None

Assert that a specific tool was invoked during the run.

Checks result.steps for any step whose action matches tool_name.

Parameters:

Name Type Description Default
result Any

Pattern result with a steps attribute.

required
tool_name str

Name of the expected tool.

required

Raises:

Type Description
AssertionError

if the tool was never called.

Source code in brain/patterns/testing.py
def assert_tool_called(result: Any, tool_name: str) -> None:
    """Assert that a specific tool was invoked during the run.

    Checks ``result.steps`` for any step whose ``action`` matches
    ``tool_name``.

    Args:
        result: Pattern result with a ``steps`` attribute.
        tool_name: Name of the expected tool.

    Raises:
        AssertionError: if the tool was never called.
    """
    steps = getattr(result, "steps", []) or []
    called = [s.action for s in steps if hasattr(s, "action")]
    if tool_name not in called:
        raise AssertionError(f"Tool {tool_name!r} was never called.\nTools used: {called!r}")

brain.patterns.testing.assert_steps_taken

assert_steps_taken(result: Any, *, min_steps: int = 0, max_steps: int | None = None) -> None

Assert that the number of steps is within the expected range.

Parameters:

Name Type Description Default
result Any

Pattern result with steps and/or iterations.

required
min_steps int

Minimum number of steps expected (inclusive).

0
max_steps int | None

Maximum number of steps expected (inclusive, or None).

None

Raises:

Type Description
AssertionError

if the count is outside the range.

Source code in brain/patterns/testing.py
def assert_steps_taken(result: Any, *, min_steps: int = 0, max_steps: int | None = None) -> None:
    """Assert that the number of steps is within the expected range.

    Args:
        result: Pattern result with ``steps`` and/or ``iterations``.
        min_steps: Minimum number of steps expected (inclusive).
        max_steps: Maximum number of steps expected (inclusive, or ``None``).

    Raises:
        AssertionError: if the count is outside the range.
    """
    count = len(getattr(result, "steps", []) or [])
    if count < min_steps:
        raise AssertionError(f"Expected ≥{min_steps} steps, got {count}")
    if max_steps is not None and count > max_steps:
        raise AssertionError(f"Expected ≤{max_steps} steps, got {count}")

brain.patterns.testing.assert_iterations

assert_iterations(result: Any, *, min_iters: int = 1, max_iters: int | None = None) -> None

Assert that result.iterations is within the expected range.

Parameters:

Name Type Description Default
result Any

Pattern result with an iterations attribute.

required
min_iters int

Minimum iterations expected (inclusive).

1
max_iters int | None

Maximum iterations (inclusive, or None).

None

Raises:

Type Description
AssertionError

if iterations is outside the range.

Source code in brain/patterns/testing.py
def assert_iterations(result: Any, *, min_iters: int = 1, max_iters: int | None = None) -> None:
    """Assert that ``result.iterations`` is within the expected range.

    Args:
        result: Pattern result with an ``iterations`` attribute.
        min_iters: Minimum iterations expected (inclusive).
        max_iters: Maximum iterations (inclusive, or ``None``).

    Raises:
        AssertionError: if iterations is outside the range.
    """
    iters = getattr(result, "iterations", 0)
    if iters < min_iters:
        raise AssertionError(f"Expected ≥{min_iters} iterations, got {iters}")
    if max_iters is not None and iters > max_iters:
        raise AssertionError(f"Expected ≤{max_iters} iterations, got {iters}")

brain.patterns.testing.capture_events

capture_events(agent: Any, task: str) -> list[Any]

Collect all events from a :class:~brain.patterns.StreamingReActAgent.

Parameters:

Name Type Description Default
agent Any

A StreamingReActAgent (or any pattern with a stream() method).

required
task str

The task string to pass to agent.stream().

required

Returns:

Type Description
list[Any]

List of :class:~brain.patterns.PatternEvent objects in order.

Source code in brain/patterns/testing.py
def capture_events(agent: Any, task: str) -> list[Any]:
    """Collect all events from a :class:`~brain.patterns.StreamingReActAgent`.

    Args:
        agent: A ``StreamingReActAgent`` (or any pattern with a ``stream()``
            method).
        task: The task string to pass to ``agent.stream()``.

    Returns:
        List of :class:`~brain.patterns.PatternEvent` objects in order.
    """
    return list(agent.stream(task))