ReplayControl
File: brain/agent/replay_control.py
Purpose
ReplayControl prevents the LLM from wasting tool budget by repeating calls
it has already made with the same arguments, while correctly allowing:
- Retries of failed idempotent calls (e.g. transient web_search errors)
- Multiple calls to non-idempotent tools (e.g. send_email, create_event)
One instance is created per turn inside _execute_turn_loop_v2 and passed to
every _exec_call_v2 invocation.
Idempotency
Idempotency is declared on the ToolSpec:
ToolSpec(
    name="local_search",
    idempotent=True,   # same query → same result; second call is wasteful
)
ToolSpec(
    name="send_email",
    idempotent=False,  # second call sends a second email; must not be suppressed
)
If a tool has no ToolSpec (registry miss), it is treated as idempotent
(conservative dedup — same as the original harness behaviour).
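
The registry-miss rule can be sketched as follows. This is a minimal illustration, not the real implementation: the `ToolSpec` dataclass here is a simplified stand-in with only the two fields shown above, `is_idempotent` is a hypothetical helper, and representing a registry miss as `None` is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ToolSpec:
    """Simplified stand-in for the real ToolSpec (only the fields shown above)."""
    name: str
    idempotent: bool


def is_idempotent(spec: Optional[ToolSpec]) -> bool:
    """Hypothetical helper: a missing spec (assumed to be None) counts as idempotent."""
    return True if spec is None else spec.idempotent


print(is_idempotent(ToolSpec(name="local_search", idempotent=True)))  # True
print(is_idempotent(ToolSpec(name="send_email", idempotent=False)))   # False
print(is_idempotent(None))                                            # True (registry miss)
```

One consequence of this default is that a non-idempotent tool missing from the registry would have its repeat calls deduplicated, so such tools need a ToolSpec to opt out.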
Decision Rules
| Condition | Decision |
|---|---|
| Tool is non-idempotent | Always allow (never skip) |
| Idempotent, first call | Allow |
| Idempotent, prior call succeeded | Skip — emit ToolDenied(reason="duplicate") |
| Idempotent, prior call failed or timed out | Allow (retry) |
| Idempotent, prior call was denied | Allow (denied calls don't count as "prior success") |
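
The table above reduces to a single check: skip only when the tool is idempotent and the prior attempt succeeded. The sketch below expresses that as a pure function. The `Prior` enum and the `decide_skip` name are illustrative and not part of the real API, and it assumes only the most recent outcome per call key is tracked.

```python
from enum import Enum
from typing import Optional


class Prior(Enum):
    """Most recent recorded outcome for a call key (illustrative names)."""
    SUCCESS = "success"
    FAILURE = "failure"
    TIMEOUT = "timeout"
    DENIED = "denied"


def decide_skip(idempotent: bool, prior: Optional[Prior]) -> bool:
    """Return True only for an idempotent call whose prior attempt succeeded."""
    if not idempotent:
        return False  # non-idempotent tools are never skipped
    # None means first call; failure, timeout, and denied all allow a retry.
    return prior is Prior.SUCCESS


# Walk every row of the decision table.
for idempotent in (False, True):
    for prior in (None, *Prior):
        decision = "skip" if decide_skip(idempotent, prior) else "allow"
        label = prior.value if prior else "first call"
        print(f"idempotent={idempotent!s:<5} prior={label:<10} -> {decision}")
```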
API
replay = ReplayControl() # one per turn
# Called before execution
skip, reason = replay.should_skip(tool_name, arguments, spec)
# skip=True → return ToolDenied(reason="duplicate", details=reason)
# skip=False → proceed to execution
# Called after outcome is known
replay.record_success(tool_name, arguments, spec)
replay.record_failure(tool_name, arguments, spec)
replay.record_timeout(tool_name, arguments, spec)
replay.record_denied(tool_name, arguments, spec)
replay.history_size() # → int: number of distinct (name, args) pairs seen
Call Key
The deduplication key is a stable MD5 hash of the normalized call:
raw = f"{tool_name}:{json.dumps(arguments, sort_keys=True)}"
call_key = hashlib.md5(raw.encode()).hexdigest()
Arguments are sorted by key before hashing so {"b": 2, "a": 1} and
{"a": 1, "b": 2} produce the same key.
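
A small runnable check of this property, wrapping the two lines above in a helper (the `call_key` function name is only for illustration):

```python
import hashlib
import json


def call_key(tool_name: str, arguments: dict) -> str:
    """Hash the tool name plus a key-sorted JSON dump of the arguments."""
    raw = f"{tool_name}:{json.dumps(arguments, sort_keys=True)}"
    return hashlib.md5(raw.encode()).hexdigest()


same_a = call_key("web_search", {"b": 2, "a": 1})
same_b = call_key("web_search", {"a": 1, "b": 2})
other = call_key("web_search", {"a": 1, "b": 3})

print(same_a == same_b)  # True: key order does not matter
print(same_a == other)   # False: a different value gives a different key

# sort_keys also applies to nested dicts, but list order is preserved.
print(call_key("t", {"f": {"y": 1, "x": 2}}) == call_key("t", {"f": {"x": 2, "y": 1}}))  # True
print(call_key("t", {"ids": [1, 2]}) == call_key("t", {"ids": [2, 1]}))                  # False
```

Two practical notes follow from how `json.dumps` behaves: arguments that differ only in list order hash to different keys, and arguments containing values that are not JSON-serializable raise `TypeError`.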
Harness Integration
def _exec_call_v2(call, state, web_mode, replay_control):
    # spec: the ToolSpec for call.name (registry lookup elided)
    # Gate 1: replay dedup
    skip, reason = replay_control.should_skip(call.name, call.arguments, spec)
    if skip:
        return ToolDenied(call_id=call.id, tool_name=call.name,
                          reason="duplicate", details=reason)
    ...
    outcome = bounded_executor.execute(call, ...)
    ...
    # Record result
    if outcome_is_error(outcome):
        replay_control.record_failure(call.name, call.arguments, spec)
    elif not isinstance(outcome, ToolDenied):
        replay_control.record_success(call.name, call.arguments, spec)
    return outcome
Example Turn
Step 1: LLM calls web_search(q="capital of France")
    → should_skip? No (first call)
    → execute → error (timeout)
    → record_failure("web_search", {"q": "capital of France"})
Step 2: LLM retries web_search(q="capital of France")
    → should_skip? No (prior call FAILED → allow retry)
    → execute → success
    → record_success("web_search", {"q": "capital of France"})
Step 3: LLM calls web_search(q="capital of France") again
    → should_skip? Yes (prior call SUCCEEDED → duplicate)
    → ToolDenied(reason="duplicate")
Step 4: LLM calls web_search(q="population of France")
    → should_skip? No (different arguments → different key)
    → execute → success
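
The trace above can be reproduced with a minimal in-memory sketch. `ReplayControlSketch` is not the real class: it mirrors the documented method names, but its internals (a dict from call key to last outcome, an empty reason string when a call is allowed, `history_size` counting recorded keys) and the simplified `ToolSpec` are all assumptions.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class ToolSpec:
    """Simplified stand-in for the real ToolSpec."""
    name: str
    idempotent: bool


class ReplayControlSketch:
    """Illustrative per-turn tracker; internals are assumptions."""

    def __init__(self) -> None:
        self._last: Dict[str, str] = {}  # call key -> last recorded outcome

    @staticmethod
    def _key(tool_name: str, arguments: dict) -> str:
        raw = f"{tool_name}:{json.dumps(arguments, sort_keys=True)}"
        return hashlib.md5(raw.encode()).hexdigest()

    def should_skip(self, tool_name: str, arguments: dict,
                    spec: Optional[ToolSpec]) -> Tuple[bool, str]:
        # A missing spec is treated as idempotent, per the registry-miss rule.
        if spec is not None and not spec.idempotent:
            return False, ""
        if self._last.get(self._key(tool_name, arguments)) == "success":
            return True, f"{tool_name} already succeeded with these arguments"
        return False, ""

    def _record(self, tool_name: str, arguments: dict, outcome: str) -> None:
        self._last[self._key(tool_name, arguments)] = outcome

    def record_success(self, tool_name, arguments, spec=None) -> None:
        self._record(tool_name, arguments, "success")

    def record_failure(self, tool_name, arguments, spec=None) -> None:
        self._record(tool_name, arguments, "failure")

    def record_timeout(self, tool_name, arguments, spec=None) -> None:
        self._record(tool_name, arguments, "timeout")

    def record_denied(self, tool_name, arguments, spec=None) -> None:
        self._record(tool_name, arguments, "denied")

    def history_size(self) -> int:
        return len(self._last)


spec = ToolSpec(name="web_search", idempotent=True)
replay = ReplayControlSketch()
q1 = {"q": "capital of France"}
q2 = {"q": "population of France"}

trace = []
trace.append(replay.should_skip("web_search", q1, spec)[0])  # step 1: first call
replay.record_failure("web_search", q1, spec)
trace.append(replay.should_skip("web_search", q1, spec)[0])  # step 2: retry after failure
replay.record_success("web_search", q1, spec)
trace.append(replay.should_skip("web_search", q1, spec)[0])  # step 3: duplicate of a success
trace.append(replay.should_skip("web_search", q2, spec)[0])  # step 4: different arguments
replay.record_success("web_search", q2, spec)

print(trace)                  # [False, False, True, False]
print(replay.history_size())  # 2
```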