
ReplayControl

File: brain/agent/replay_control.py


Purpose

ReplayControl prevents the LLM from wasting tool budget by repeating calls it has already made with the same arguments, while correctly allowing:

  • Retries of failed idempotent calls (e.g. transient web_search errors)
  • Multiple calls to non-idempotent tools (e.g. send_email, create_event)

One instance is created per turn inside _execute_turn_loop_v2 and passed to every _exec_call_v2 invocation.


Idempotency

Idempotency is declared on the ToolSpec:

ToolSpec(
    name="local_search",
    idempotent=True,    # same query → same result; second call is wasteful
)
ToolSpec(
    name="send_email",
    idempotent=False,   # second call sends a second email; must not be suppressed
)

If a tool has no ToolSpec (registry miss), it is treated as idempotent, so a repeat of a successful call is deduplicated. This is the conservative choice for tool budget and matches the original harness behaviour.
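The default for a registry miss can be captured in a small helper. This is a minimal sketch: `ToolSpec` here is a hypothetical stand-in carrying only the two fields shown above, and `is_idempotent` and `REGISTRY` are illustrative names, not part of the documented API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ToolSpec:
    # Hypothetical stand-in: only the fields mentioned on this page.
    name: str
    idempotent: bool


def is_idempotent(spec: Optional[ToolSpec]) -> bool:
    """Resolve the idempotency flag; a registry miss (None) counts as idempotent."""
    if spec is None:
        return True  # no ToolSpec: fall back to dedup, as described above
    return spec.idempotent


# Hypothetical registry; dict.get returns None for unknown tools.
REGISTRY = {
    "local_search": ToolSpec(name="local_search", idempotent=True),
    "send_email": ToolSpec(name="send_email", idempotent=False),
}

print(is_idempotent(REGISTRY.get("local_search")))  # True
print(is_idempotent(REGISTRY.get("send_email")))    # False
print(is_idempotent(REGISTRY.get("unknown_tool")))  # True
```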


Decision Rules

Condition                                   | Decision
--------------------------------------------|--------------------------------------------------------
Tool is non-idempotent                      | Always allow (never skip)
Idempotent, first call                      | Allow
Idempotent, prior call succeeded            | Skip: emit ToolDenied(reason="duplicate")
Idempotent, prior call failed or timed out  | Allow (retry)
Idempotent, prior call was denied           | Allow (a denied call does not count as a prior success)
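The table reduces to a pure function of two inputs: whether the tool is idempotent, and the outcome of the prior identical call, if any. In this sketch `Outcome` and `decide_skip` are hypothetical names; the real class keeps this state internally behind should_skip.

```python
from enum import Enum
from typing import Optional


class Outcome(Enum):
    # Hypothetical labels mirroring the record_* methods.
    SUCCESS = "success"
    FAILURE = "failure"
    TIMEOUT = "timeout"
    DENIED = "denied"


def decide_skip(idempotent: bool, prior: Optional[Outcome]) -> bool:
    """Return True if the call should be skipped as a duplicate."""
    if not idempotent:
        return False  # non-idempotent tools are never skipped
    if prior is None:
        return False  # first call with these arguments
    # Only a prior success makes a repeat wasteful; failures, timeouts,
    # and denials all allow another attempt.
    return prior is Outcome.SUCCESS


print(decide_skip(True, Outcome.SUCCESS))   # True
print(decide_skip(True, Outcome.TIMEOUT))   # False
print(decide_skip(False, Outcome.SUCCESS))  # False
```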

API

replay = ReplayControl()  # one per turn

# Called before execution
skip, reason = replay.should_skip(tool_name, arguments, spec)
# skip=True  → return ToolDenied(reason="duplicate", details=reason)
# skip=False → proceed to execution

# Called after outcome is known
replay.record_success(tool_name, arguments, spec)
replay.record_failure(tool_name, arguments, spec)
replay.record_timeout(tool_name, arguments, spec)
replay.record_denied(tool_name, arguments, spec)

replay.history_size()  # → int: number of distinct (name, args) pairs seen
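Putting the decision rules and the API together, an in-memory implementation might look like the sketch below. It is an illustration under stated assumptions, not the contents of brain/agent/replay_control.py: `ToolSpec` is a hypothetical stand-in, the reason string is invented, a success is assumed to stay in effect for the rest of the turn, and history_size() is assumed to count only pairs passed to a record_* method (a call that was only checked with should_skip is not counted).

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Any, Dict, Optional, Set, Tuple


@dataclass(frozen=True)
class ToolSpec:
    # Hypothetical stand-in: only the fields mentioned on this page.
    name: str
    idempotent: bool


class ReplayControl:
    """Per-turn duplicate-call gate (illustrative sketch, not the real class)."""

    def __init__(self) -> None:
        self._seen: Set[str] = set()       # every key recorded this turn
        self._succeeded: Set[str] = set()  # keys whose call succeeded

    @staticmethod
    def _key(tool_name: str, arguments: Dict[str, Any]) -> str:
        raw = f"{tool_name}:{json.dumps(arguments, sort_keys=True)}"
        return hashlib.md5(raw.encode()).hexdigest()

    @staticmethod
    def _is_idempotent(spec: Optional[ToolSpec]) -> bool:
        # Registry miss (spec is None) is treated as idempotent.
        return True if spec is None else spec.idempotent

    def should_skip(
        self, tool_name: str, arguments: Dict[str, Any], spec: Optional[ToolSpec]
    ) -> Tuple[bool, Optional[str]]:
        if not self._is_idempotent(spec):
            return False, None  # non-idempotent: never skip
        if self._key(tool_name, arguments) in self._succeeded:
            return True, f"{tool_name} already succeeded with these arguments"
        return False, None  # first call, or prior failure/timeout/denial

    def _record(self, tool_name: str, arguments: Dict[str, Any], ok: bool) -> None:
        key = self._key(tool_name, arguments)
        self._seen.add(key)
        if ok:
            self._succeeded.add(key)

    # `spec` is accepted for API parity; this sketch does not use it when recording.
    def record_success(self, tool_name, arguments, spec) -> None:
        self._record(tool_name, arguments, ok=True)

    def record_failure(self, tool_name, arguments, spec) -> None:
        self._record(tool_name, arguments, ok=False)

    def record_timeout(self, tool_name, arguments, spec) -> None:
        self._record(tool_name, arguments, ok=False)

    def record_denied(self, tool_name, arguments, spec) -> None:
        self._record(tool_name, arguments, ok=False)

    def history_size(self) -> int:
        return len(self._seen)
```

Failures, timeouts, and denials share one code path here because, under the decision rules, none of them blocks a later identical call; only record_success changes what should_skip returns.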

Call Key

The deduplication key is a stable MD5 hash of the normalized call:

raw = f"{tool_name}:{json.dumps(arguments, sort_keys=True)}"
call_key = hashlib.md5(raw.encode()).hexdigest()

Arguments are sorted by key before hashing so {"b": 2, "a": 1} and {"a": 1, "b": 2} produce the same key.
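The key derivation can be checked in isolation. This sketch wraps the two lines above in a hypothetical `call_key` helper. Note that `sort_keys=True` sorts the keys of every dict, including nested ones, but it does not reorder lists, so `[1, 2]` and `[2, 1]` still produce different keys.

```python
import hashlib
import json
from typing import Any, Dict


def call_key(tool_name: str, arguments: Dict[str, Any]) -> str:
    # sort_keys=True orders dict keys at every nesting level, so argument
    # insertion order does not change the key; list order still does.
    raw = f"{tool_name}:{json.dumps(arguments, sort_keys=True)}"
    return hashlib.md5(raw.encode()).hexdigest()


k1 = call_key("local_search", {"b": 2, "a": 1})
k2 = call_key("local_search", {"a": 1, "b": 2})
k3 = call_key("local_search", {"a": 1, "b": 3})
print(k1 == k2)  # True
print(k1 == k3)  # False
print(len(k1))   # 32 (hex digest of a 128-bit MD5 hash)
```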


Harness Integration

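def _exec_call_v2(call, state, web_mode, replay_control):
    # `spec` is the ToolSpec for call.name (its lookup is elided here);
    # a registry miss is treated as idempotent.
    # Gate 1: replay dedup, checked before any execution
    skip, reason = replay_control.should_skip(call.name, call.arguments, spec)
    if skip:
        return ToolDenied(call_id=call.id, tool_name=call.name,
                          reason="duplicate", details=reason)
    ...
    outcome = bounded_executor.execute(call, ...)
    ...
    # Record the outcome so later identical calls in this turn can be gated.
    # Errors are recorded as failures (a retry stays allowed); ToolDenied
    # outcomes are never recorded as successes.
    if outcome_is_error(outcome):
        replay_control.record_failure(call.name, call.arguments, spec)
    elif not isinstance(outcome, ToolDenied):
        replay_control.record_success(call.name, call.arguments, spec)
    return outcome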

Example Turn

Step 1: LLM calls web_search(q="capital of France")
        → should_skip? No (first call)
        → execute → error (timeout)
        → record_failure("web_search", {"q": "capital of France"})

Step 2: LLM retries web_search(q="capital of France")
        → should_skip? No (prior call FAILED → allow retry)
        → execute → success
        → record_success("web_search", {"q": "capital of France"})

Step 3: LLM calls web_search(q="capital of France") again
        → should_skip? Yes (prior call SUCCEEDED → duplicate)
        → ToolDenied(reason="duplicate")

Step 4: LLM calls web_search(q="population of France")
        → should_skip? No (different arguments → different key)
        → execute → success
        → record_success("web_search", {"q": "population of France"})
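The example turn can be replayed end to end against a compact version of the class. Everything here is illustrative: `ToolSpec` and `ReplayControl` are minimal stand-ins implementing only what this trace needs, and the `step` helper with its scripted `outcome` argument is a hypothetical replacement for the real executor.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class ToolSpec:
    # Hypothetical stand-in for the real ToolSpec.
    name: str
    idempotent: bool


class ReplayControl:
    # Compact illustrative version; only the methods this trace needs.
    def __init__(self) -> None:
        self._seen = set()
        self._succeeded = set()

    @staticmethod
    def _key(name: str, args: Dict[str, Any]) -> str:
        raw = f"{name}:{json.dumps(args, sort_keys=True)}"
        return hashlib.md5(raw.encode()).hexdigest()

    def should_skip(self, name, args, spec) -> Tuple[bool, Optional[str]]:
        if spec is not None and not spec.idempotent:
            return False, None  # non-idempotent: never skip
        if self._key(name, args) in self._succeeded:
            return True, "already succeeded"
        return False, None

    def record_success(self, name, args, spec) -> None:
        key = self._key(name, args)
        self._seen.add(key)
        self._succeeded.add(key)

    def record_failure(self, name, args, spec) -> None:
        self._seen.add(self._key(name, args))

    def history_size(self) -> int:
        return len(self._seen)


spec = ToolSpec("web_search", idempotent=True)
replay = ReplayControl()
log: List[str] = []


def step(args: Dict[str, Any], outcome: str) -> None:
    """Gate, 'execute' with a scripted outcome, then record, as in the harness."""
    skip, _reason = replay.should_skip("web_search", args, spec)
    if skip:
        log.append("duplicate")  # the harness would return ToolDenied(reason="duplicate")
        return
    if outcome == "error":
        replay.record_failure("web_search", args, spec)
    else:
        replay.record_success("web_search", args, spec)
    log.append(outcome)


france = {"q": "capital of France"}
step(france, "error")                           # Step 1: first call fails (timeout)
step(france, "success")                         # Step 2: retry allowed
step(france, "success")                         # Step 3: skipped as a duplicate
step({"q": "population of France"}, "success")  # Step 4: different key
print(log)  # ['error', 'success', 'duplicate', 'success']
```

After the four steps, history_size() in this sketch returns 2: one key per distinct query.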