The Survival Layer
Purpose
Policy, sandboxing, and approvals (see Policy and Approvals) are prevention — they reduce the chance an agent does the wrong thing. But prevention has a residual failure rate, and some actions are only discovered to be wrong after they were validly approved (a refund issued while an order was unshipped, then the order ships). The Survival Layer is what SecondBrain does after a consequential action commits.
The framing is one formula:
Prevention shrinks the first term and never reaches zero. The Survival Layer bounds and observes the other two — it makes consequences declarable, recoverable, budgeted, and measurable.
It is five composable subsystems, all in brain/kernel/, all pure/deterministic
and opt-in (nothing changes behaviour until a caller wires it in):
| Subsystem | Module | Answers |
|---|---|---|
| Reversibility | reversibility.py + contracts.py | How can this action be undone? |
| Compensation & Recovery | compensation.py, recovery_loop.py | What do we do when it fails or its assumption breaks? |
| Earned Autonomy | autonomy.py | How much is this agent allowed to do right now? |
| Tokenomics | tokenomics.py | What did a trusted outcome cost? |
| Harness Audit + Trajectory Evals | harness_audit.py, trajectory_eval.py | Is the harness actually wired, and did the run take the right path? |
1. Reversibility: declare how an action is undone
A tool declares how it can be undone, separately from how risky it is. The
declaration lives on the ToolSpec:
```python
from brain.kernel.contracts import ReversalSpec, ReversalClass, BlastRadius, ToolSpec
# SafetyClass and RecoveryApprovalMode are used below; brain.kernel re-exports all symbols.
from brain.kernel import RecoveryApprovalMode, SafetyClass

spec = ToolSpec(
    tool_id="payments.issue_refund",
    name="Issue refund",
    safety_class=SafetyClass.NETWORK,
    reversal=ReversalSpec(
        reversal_class=ReversalClass.COMPENSABLE,  # reversible | compensable | irreversible
        compensation_capability="payments.reverse_refund",
        recovery_approval_mode=RecoveryApprovalMode.HUMAN,
        reversal_window_ms=3_600_000,  # 1h to compensate
        blast_radius=BlastRadius(max_calls_per_run=1, max_value_per_run=5000),
    ),
    metadata={"risk_dimension": "money", "value_arg": "amount"},
)
```
If a side-effecting tool declares no reversal, ToolSpec.resolved_reversal()
derives a conservative default from its safety_class: READ_ONLY →
reversible no-op, LOCAL_WRITE → reversible (checkpoint the prior value),
NETWORK → compensable, DESTRUCTIVE → irreversible (gate it). Reversal
specs are kept out of the v1 tool-contract fingerprint, so adding one never
forces an approved baseline to be re-approved.
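The default derivation described above can be sketched as a plain lookup. This is a standalone illustration and not the `ToolSpec.resolved_reversal()` implementation: the two enums are hypothetical stand-ins that only mirror the names used in this section, and `resolve_reversal` is an invented helper.

```python
from enum import Enum
from typing import Optional


class SafetyClass(Enum):  # hypothetical stand-in for the kernel enum
    READ_ONLY = "read_only"
    LOCAL_WRITE = "local_write"
    NETWORK = "network"
    DESTRUCTIVE = "destructive"


class ReversalClass(Enum):  # hypothetical stand-in for the kernel enum
    REVERSIBLE = "reversible"
    COMPENSABLE = "compensable"
    IRREVERSIBLE = "irreversible"


# Conservative defaults, as listed in the paragraph above.
DEFAULT_REVERSAL = {
    SafetyClass.READ_ONLY: ReversalClass.REVERSIBLE,      # no-op undo
    SafetyClass.LOCAL_WRITE: ReversalClass.REVERSIBLE,    # checkpoint the prior value
    SafetyClass.NETWORK: ReversalClass.COMPENSABLE,
    SafetyClass.DESTRUCTIVE: ReversalClass.IRREVERSIBLE,  # gate it
}


def resolve_reversal(safety_class: SafetyClass,
                     declared: Optional[ReversalClass] = None) -> ReversalClass:
    """Return the declared class if present, else the default for the safety class."""
    return declared if declared is not None else DEFAULT_REVERSAL[safety_class]
```

An explicit declaration always wins in this sketch; the table is consulted only when a tool says nothing about how it is undone.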
At commit time the ToolExecutor (when constructed with reversibility="observe"
or "enforce") binds a typed MutationRef — an executable recovery
pointer, not a log breadcrumb. Reversal args, the conditions that justified the
action (assumption_refs), the external transaction handle, and an idempotency
key are all captured now so recovery never reconstructs intent from logs.
MutationRefs are persisted in a MutationStore (SQLite) for the Recovery Loop
to read.
An IrreversibilityBudget on the RunContext tracks un-undoable consequence
as a vector across money / comms / commitments / exposure / regulatory.
Reversible actions accrue no debt, compensable ones a fraction (0.1×),
irreversible ones full value (1.0×). Crossing a ceiling escalates (asks for
human judgment) — it does not hard-deny.
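The accrual rule above can be illustrated with a small ledger. This is a minimal sketch under assumptions: `BudgetSketch` is a hypothetical stand-in for `IrreversibilityBudget`, the ceiling values are invented, and "crossing" is read here as strictly exceeding the ceiling.

```python
from dataclasses import dataclass, field

# Debt multipliers stated above: reversible 0.0, compensable 0.1, irreversible 1.0.
DEBT_WEIGHT = {"reversible": 0.0, "compensable": 0.1, "irreversible": 1.0}
DIMENSIONS = ("money", "comms", "commitments", "exposure", "regulatory")


@dataclass
class BudgetSketch:  # hypothetical stand-in, not the kernel's IrreversibilityBudget
    ceilings: dict  # per-dimension ceiling (illustrative values)
    debt: dict = field(default_factory=lambda: {d: 0.0 for d in DIMENSIONS})

    def debit(self, dimension: str, value: float, reversal_class: str) -> str:
        """Accrue weighted debt; return 'escalate' past the ceiling, else 'ok'."""
        self.debt[dimension] += value * DEBT_WEIGHT[reversal_class]
        ceiling = self.ceilings.get(dimension)
        if ceiling is not None and self.debt[dimension] > ceiling:
            return "escalate"  # ask for human judgment; never a hard deny
        return "ok"


budget = BudgetSketch(ceilings={"money": 1000.0})
first = budget.debit("money", 5000, "compensable")    # a tenth of the value accrues
second = budget.debit("money", 800, "irreversible")   # full value accrues, ceiling crossed
```

Here a 5000-unit compensable refund stays under the ceiling, while a much smaller irreversible action pushes the money dimension over it and escalates.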
The reversibility layer binds, accounts, and persists. It deliberately executes no compensation — that is the Recovery Loop, below.
2. Compensation & Recovery: what to do when things go wrong
There are two distinct failure shapes, and they share one engine.
Synchronous: typed-verdict compensation (compensation.py)
When a tool call fails, "what failed" (a typed ErrorKind) is kept separate
from "what to do" (CompensationAction). The PLAYBOOK maps one to the other,
and the routings a naive retry loop gets wrong are the whole point:
| Verdict | Compensation | Why |
|---|---|---|
| IDEMPOTENCY_CONFLICT | DEPRECATE_AND_REPLAN | the duplicate already succeeded upstream; retrying is the double-refund bug |
| EVIDENCE_STALE | REFRESH_EVIDENCE | re-retrieve then retry once; a blind retry just fails the same way |
| POLICY_DENIED / CONTENT_FILTER / REFUSAL | ESCALATE_TO_HUMAN | the model can't work around a policy decision |
| TIMEOUT / TRANSIENT_NETWORK / RATE_LIMITED / SERVER | RETRY_WITH_BACKOFF | genuinely transient |
```python
from brain.kernel.compensation import CompensationDispatcher, verdict_for

# store, ctx, call_id, args, and failed_result come from the surrounding run.
dispatcher = CompensationDispatcher(mutation_store=store)
outcome = dispatcher.dispatch(
    ctx,
    tool_id="payments.issue_refund",
    call_id=call_id,
    args=args,
    result=failed_result,
)
# outcome.kind ∈ {retry_scheduled, evidence_refreshed, deprecated, escalated, reversed, ...}
```
verdict_for() unifies the two failure representations SB has — ToolErrorCode
on a ToolResult (executor path) and exceptions via the classifier (provider
path). The dispatcher is pure and injectable: retries/refreshes/escalations are
returned as directives (or run through caller-supplied callables), so the
decision logic is unit-testable without a live run.
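Because the decision is a pure mapping, it can be exercised without any tool run. The sketch below rebuilds the table above as a dictionary. The enum definitions are hypothetical stand-ins limited to the verdicts named in this section, and `action_for` is an invented helper, not the real `PLAYBOOK` or dispatcher.

```python
from enum import Enum


class ErrorKind(Enum):  # hypothetical subset mirroring the verdicts in the table
    IDEMPOTENCY_CONFLICT = "idempotency_conflict"
    EVIDENCE_STALE = "evidence_stale"
    POLICY_DENIED = "policy_denied"
    CONTENT_FILTER = "content_filter"
    REFUSAL = "refusal"
    TIMEOUT = "timeout"
    TRANSIENT_NETWORK = "transient_network"
    RATE_LIMITED = "rate_limited"
    SERVER = "server"


class CompensationAction(Enum):  # hypothetical stand-in
    DEPRECATE_AND_REPLAN = "deprecate_and_replan"
    REFRESH_EVIDENCE = "refresh_evidence"
    ESCALATE_TO_HUMAN = "escalate_to_human"
    RETRY_WITH_BACKOFF = "retry_with_backoff"


PLAYBOOK = {
    ErrorKind.IDEMPOTENCY_CONFLICT: CompensationAction.DEPRECATE_AND_REPLAN,
    ErrorKind.EVIDENCE_STALE: CompensationAction.REFRESH_EVIDENCE,
    **{k: CompensationAction.ESCALATE_TO_HUMAN
       for k in (ErrorKind.POLICY_DENIED, ErrorKind.CONTENT_FILTER, ErrorKind.REFUSAL)},
    **{k: CompensationAction.RETRY_WITH_BACKOFF
       for k in (ErrorKind.TIMEOUT, ErrorKind.TRANSIENT_NETWORK,
                 ErrorKind.RATE_LIMITED, ErrorKind.SERVER)},
}


def action_for(kind: ErrorKind) -> CompensationAction:
    """Pure lookup: the routing is testable without running any tool."""
    return PLAYBOOK[kind]
```

Only the transient verdicts ever map to a retry; an idempotency conflict deliberately does not.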
Post-hoc: the Recovery Loop (recovery_loop.py)
When a successful action's justifying assumption later turns false, an
AssumptionInvalidated event on the in-process EventBus drives the
RecoveryLoop:
```python
from brain.kernel.recovery_loop import RecoveryLoop, registry_approval_resolver
from brain.kernel.events import AssumptionInvalidated

# store, dispatcher, run_compensation, registry, decision_store, bus, and run_id
# come from the caller's setup.
loop = RecoveryLoop(
    mutation_store=store,
    dispatcher=dispatcher,
    reversal_runner=run_compensation,  # performs the actual reversal
    approval_resolver=registry_approval_resolver(registry),
    decision_store=decision_store,
)
loop.subscribe(bus)

bus.emit(AssumptionInvalidated(run_id=run_id, assumption_ref="order_not_shipped"))
```
The loop finds in-window MutationRefs whose assumption_refs contain the
invalidated ref and recovers them. Its defining property is approval-mode
separation: a reversal does not inherit the original action's approval mode.
A refund may have auto-executed, but reversing it can require human review. Only
RecoveryApprovalMode.AUTO auto-fires; HUMAN / DUAL_CONTROL escalate and are
never auto-run (the default resolver returns HUMAN). Every recovery emits a
RuntimeDecisionRecord linked to the original via reverses_mutation_ref +
assumption_refs.
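The selection and approval-mode rules above can be sketched as one pure function. Everything here is illustrative: `MutationRefSketch` carries only the fields the sketch needs, `committed_at_ms` and `plan_recovery` are invented names, and the real loop also writes decision records, which this omits.

```python
from dataclasses import dataclass
from enum import Enum


class RecoveryApprovalMode(Enum):  # hypothetical stand-in for the kernel enum
    AUTO = "auto"
    HUMAN = "human"
    DUAL_CONTROL = "dual_control"


@dataclass(frozen=True)
class MutationRefSketch:  # hypothetical, reduced to the fields this sketch needs
    call_id: str
    assumption_refs: tuple
    committed_at_ms: int
    reversal_window_ms: int
    recovery_approval_mode: RecoveryApprovalMode = RecoveryApprovalMode.HUMAN


def plan_recovery(refs, invalidated_ref: str, now_ms: int) -> dict:
    """Map each affected, in-window call_id to 'reverse' or 'escalate'."""
    plan = {}
    for ref in refs:
        in_window = now_ms - ref.committed_at_ms <= ref.reversal_window_ms
        if invalidated_ref in ref.assumption_refs and in_window:
            auto = ref.recovery_approval_mode is RecoveryApprovalMode.AUTO
            plan[ref.call_id] = "reverse" if auto else "escalate"
    return plan


refs = [
    MutationRefSketch("c1", ("order_not_shipped",), 0, 3_600_000),
    MutationRefSketch("c2", ("order_not_shipped",), 0, 3_600_000, RecoveryApprovalMode.AUTO),
    MutationRefSketch("c3", ("order_not_shipped",), 0, 1_000),     # window expired
    MutationRefSketch("c4", ("address_verified",), 0, 3_600_000),  # different assumption
]
plan = plan_recovery(refs, "order_not_shipped", now_ms=60_000)
```

Only `c2`, which declares `AUTO`, is reversed without review; `c1` falls back to the `HUMAN` default and escalates, and the other two are not touched at all.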
The reversal-token guard
Both paths route reversals through one guard: a reversal never fires without a
paired token. In SecondBrain the token is the reversibility layer's MutationRef. No
in-window MutationRef for the call → reversal_refused, always.
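A minimal sketch of the guard, assuming the in-window tokens are available as a mapping keyed by call id. `guarded_reverse` and the token's dictionary shape are hypothetical; only the `reversed` and `reversal_refused` outcome names come from this document.

```python
from typing import Callable, Mapping


def guarded_reverse(call_id: str,
                    in_window_refs: Mapping[str, dict],
                    run_reversal: Callable[[dict], None]) -> str:
    """Run the reversal only when an in-window token exists for the call."""
    token = in_window_refs.get(call_id)
    if token is None:
        return "reversal_refused"  # no paired token: never reverse
    run_reversal(token)
    return "reversed"


executed = []
tokens = {"call-1": {"reversal_tool": "payments.reverse_refund"}}
ok = guarded_reverse("call-1", tokens, executed.append)
refused = guarded_reverse("call-2", tokens, executed.append)
```

The runner is injected, so the refusal path can be tested by checking that it was never called.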
3. Earned Autonomy (autonomy.py)
Autonomy is a budget an agent earns per task, not a static switch. A weighted score over eight signals maps to a five-rung mode ladder, then hard caps clamp it down:
```python
from brain.kernel.autonomy import AutonomySignals, AutonomyCaps, compute_autonomy
from brain.kernel import AutonomyMode  # used below; brain.kernel re-exports all symbols

decision = compute_autonomy(
    AutonomySignals(
        evidence_confidence=0.9, policy_confidence=1.0, eval_score=0.85,
        tool_reliability=0.9, reversibility=0.5,  # ← fed from the ReversalClass
        task_risk=0.2, user_impact=0.3,
    ),
    AutonomyCaps(policy_ceiling=AutonomyMode.EXECUTE_WITH_APPROVAL),
)
# decision.mode → read | recommend | draft | execute_with_approval | auto_execute
# decision.capped_by names any ceiling that held it down
```
Signals: five earn autonomy (evidence/policy/eval 0.20 each,
tool-reliability/reversibility 0.15 each), two spend it (task-risk,
user-impact −0.10). The lowest cap wins, so a clean eval record can't buy
past a policy ceiling. reversibility_score(ReversalClass) (1.0 / 0.5 / 0.0)
is the concrete tie from the reversibility layer — an easily-undone action earns more authority.
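To make the arithmetic concrete, the sketch below scores the seven signals from the example call and clamps the result with caps. Several parts are assumptions and not `compute_autonomy` itself: the two spending signals are taken to weigh -0.10 each, the rung thresholds are invented because this document does not give them, and `score` and `mode_for` are hypothetical helpers.

```python
from enum import IntEnum


class AutonomyMode(IntEnum):  # rungs in ascending order of authority
    READ = 0
    RECOMMEND = 1
    DRAFT = 2
    EXECUTE_WITH_APPROVAL = 3
    AUTO_EXECUTE = 4


# Weights as stated above; -0.10 for each spending signal is an assumption.
WEIGHTS = {
    "evidence_confidence": 0.20, "policy_confidence": 0.20, "eval_score": 0.20,
    "tool_reliability": 0.15, "reversibility": 0.15,
    "task_risk": -0.10, "user_impact": -0.10,
}

# Hypothetical rung thresholds, highest first.
THRESHOLDS = [
    (0.80, AutonomyMode.AUTO_EXECUTE),
    (0.60, AutonomyMode.EXECUTE_WITH_APPROVAL),
    (0.40, AutonomyMode.DRAFT),
    (0.20, AutonomyMode.RECOMMEND),
]


def score(signals: dict) -> float:
    """Weighted sum of the supplied signals."""
    return sum(WEIGHTS[name] * value for name, value in signals.items())


def mode_for(signals: dict, *caps: AutonomyMode) -> AutonomyMode:
    """Map the score to a rung, then clamp: the lowest cap wins."""
    s = score(signals)
    earned = next((m for cut, m in THRESHOLDS if s >= cut), AutonomyMode.READ)
    return min([earned, *caps])


signals = {
    "evidence_confidence": 0.9, "policy_confidence": 1.0, "eval_score": 0.85,
    "tool_reliability": 0.9, "reversibility": 0.5,
    "task_risk": 0.2, "user_impact": 0.3,
}
uncapped = mode_for(signals)
capped = mode_for(signals, AutonomyMode.DRAFT)
```

With these inputs the weighted score is about 0.71. Under the invented thresholds that earns `EXECUTE_WITH_APPROVAL`, and a `DRAFT` cap pulls it down regardless of how strong the earning signals are.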
Autonomy also degrades automatically. degrade() drops a rung on a policy
violation (and floors to READ on a safety violation); AutonomyState freezes
promotion after a failed replay or a cluster of operator corrections until
unfreeze() — while frozen, reconsider() may lower the mode but never raise it.
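The degradation and freeze rules can be sketched as follows. The function and class here are hypothetical illustrations of the behaviour described above. They do not reproduce the signatures of `degrade()` or `AutonomyState`, and the string violation labels are invented.

```python
from enum import IntEnum


class AutonomyMode(IntEnum):  # rungs in ascending order of authority
    READ = 0
    RECOMMEND = 1
    DRAFT = 2
    EXECUTE_WITH_APPROVAL = 3
    AUTO_EXECUTE = 4


def degrade_sketch(mode: AutonomyMode, violation: str) -> AutonomyMode:
    """Safety violations floor to READ; policy violations drop one rung."""
    if violation == "safety":
        return AutonomyMode.READ
    if violation == "policy":
        return AutonomyMode(max(mode - 1, AutonomyMode.READ))
    return mode


class AutonomyStateSketch:  # hypothetical stand-in for AutonomyState
    def __init__(self, mode: AutonomyMode) -> None:
        self.mode = mode
        self.frozen = False

    def freeze(self) -> None:
        self.frozen = True

    def unfreeze(self) -> None:
        self.frozen = False

    def reconsider(self, proposed: AutonomyMode) -> AutonomyMode:
        """While frozen, accept a lower mode but never a higher one."""
        if self.frozen and proposed > self.mode:
            return self.mode
        self.mode = proposed
        return self.mode
```

The asymmetry is the point: a frozen state still lets authority fall, so a freeze can never hold an agent at a level it no longer deserves.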
4. Tokenomics (tokenomics.py)
"Cost per token is the AI-factory metric; cost per trusted outcome is the enterprise metric." An outcome is trusted only when it is accepted ∧ grounded ∧ policy-compliant. The ledger supplies the missing denominator:
```python
from brain.kernel.tokenomics import TokenomicsLedger, OutcomeRecord, CostBreakdown

ledger = TokenomicsLedger()
ledger.record(OutcomeRecord(
    workflow_id="support.refund",
    cost=CostBreakdown(
        model_cost_cents=120, eval_tokens=40, retry_tokens=10,
        # ... remaining fields elided
    ),
    accepted=True, grounded=True, policy_compliant=True,
))

report = ledger.report(workflow_id="support.refund")
report.cost_per_trusted_outcome_cents  # None (not 0) when there are no trusted outcomes
report.retry_token_ratio, report.eval_overhead_ratio, report.cache_hit_rate
```
report.cost_per_trusted_outcome_cents is None when there are no trusted outcomes: an
honest "undefined" rather than a misleading zero.
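The metric itself is a short calculation. The sketch below assumes the numerator is total spend across all outcomes, trusted or not, which this document does not state. `Outcome` and `cost_per_trusted_outcome` are hypothetical names, not the ledger's API.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Outcome:  # hypothetical, reduced form of OutcomeRecord
    cost_cents: float
    accepted: bool
    grounded: bool
    policy_compliant: bool

    @property
    def trusted(self) -> bool:
        """Trusted only when accepted, grounded, and policy-compliant."""
        return self.accepted and self.grounded and self.policy_compliant


def cost_per_trusted_outcome(outcomes: Iterable[Outcome]) -> Optional[float]:
    """Total spend divided by the number of trusted outcomes, or None if there are none."""
    outcomes = list(outcomes)
    trusted = sum(1 for o in outcomes if o.trusted)
    if trusted == 0:
        return None  # undefined, not zero
    return sum(o.cost_cents for o in outcomes) / trusted


runs = [
    Outcome(120, accepted=True, grounded=True, policy_compliant=True),
    Outcome(80, accepted=True, grounded=False, policy_compliant=True),  # ungrounded
]
result = cost_per_trusted_outcome(runs)
```

Under that assumption the ungrounded run still costs money but adds nothing to the denominator, so it raises the price of the one trusted outcome.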
5. Harness Audit + Trajectory Evals
Self-grading audit (harness_audit.py)
sb harness audit grades the runtime against the eight-outcome / forty-control
ContextOS harness audit. Each control runs a deterministic probe (is the
subsystem wired?) and returns pass / partial / fail with evidence and a P0/P1/P2
severity — an honest structural scorecard, not a green wall.
```bash
sb harness audit                    # rich scorecard + by-outcome rollup
sb harness audit --json             # machine-readable
sb harness audit --fail-under 0.8   # CI gate; exits 2 on a P0 blocker, 1 below threshold
```
It is deterministic (pure import/attribute checks — same answer every run), so it can gate CI. The current self-grade is 100% (40 pass / 0 partial / 0 fail), no P0 blockers — every one of the eight outcomes is fully wired. The follow-on modules below (context-source registry, red-team suite, replay packets) plus the governance/measurement modules (agent charter, memory-read policy, data classification, durable execution, release tuple, incident response, contradiction ledger, business metrics) closed the last structural gaps.
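The exit-code contract of the `--fail-under` gate can be written as a small pure function. This is a sketch under assumptions: the P0 check is taken to win over the threshold check, a clean run is assumed to exit 0, and `audit_exit_code` is a hypothetical helper, not the CLI's code.

```python
def audit_exit_code(score: float, p0_blockers: int, fail_under: float) -> int:
    """Exit code for a CI gate: 2 on any P0 blocker, 1 below threshold, else 0."""
    if p0_blockers > 0:
        return 2
    if score < fail_under:
        return 1
    return 0
```

Distinct codes let a pipeline treat a P0 blocker as a hard stop while handling a score regression differently.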
Trajectory evals (trajectory_eval.py)
A final-answer scorecard can't catch a run that lands the right answer the wrong
way (a double refund, a skipped eligibility check). Trajectory evals grade the
tool path. They run standalone or as part of an eval suite — add a
trajectory: block to any sb agent eval case:
```yaml
cases:
  - id: refund_path
    input: "refund my unshipped order"
    matchers:
      - kind: contains
        text: "refund issued"
    trajectory:
      mode: subsequence  # strict | subsequence | unordered
      forbid: ["payments.issue_refund_again"]
      steps:
        - tool: orders.read_booking
        - tool: payments.issue_refund
          args: {amount: 500}
```
The case passes only when every matcher and the trajectory pass; the report
carries a trajectory_score and, on failure, a [trajectory] score=… (forbidden:…,
missing:…) line. Modes: strict (exact ordered), subsequence (in order, extras
allowed), unordered (set). forbid names tools that must never appear; steps
support optional, require_success, tool_pattern, and args_subset.
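The three matching modes can be sketched over plain lists of tool names. This is an illustration, not the `trajectory_eval.py` implementation. It ignores step options such as `optional` and `args_subset`, it assumes `unordered` tolerates extra tools, and `crm.add_note` is an invented tool name.

```python
from typing import Sequence


def is_subsequence(expected: Sequence[str], actual: Sequence[str]) -> bool:
    """True when expected appears in actual in order, extras allowed."""
    remaining = iter(actual)
    return all(step in remaining for step in expected)


def trajectory_passes(mode: str, expected: Sequence[str], actual: Sequence[str],
                      forbid: Sequence[str] = ()) -> bool:
    """Check a recorded tool path against an expected path in the given mode."""
    if any(tool in forbid for tool in actual):
        return False  # a forbidden tool fails every mode
    if mode == "strict":
        return list(actual) == list(expected)
    if mode == "subsequence":
        return is_subsequence(expected, actual)
    if mode == "unordered":
        return set(expected) <= set(actual)  # assumption: extras are tolerated
    raise ValueError(f"unknown mode: {mode}")


expected = ["orders.read_booking", "payments.issue_refund"]
actual = ["orders.read_booking", "crm.add_note", "payments.issue_refund"]
```

The extra `crm.add_note` call fails `strict` but passes `subsequence`, while a forbidden second refund fails the case even though every expected step is present.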
How it composes
A consequential action flows through the whole layer:
```text
declare (ReversalSpec on ToolSpec)
  → commit (ToolExecutor binds MutationRef, debits IrreversibilityBudget)
  → on failure → CompensationDispatcher (typed verdict → action, token-guarded)
  → on stale assumption → RecoveryLoop (in-window, approval-mode-separated)
  → grant authority for the *next* action (Autonomy, fed by ReversalClass)
  → account for it (Tokenomics: cost per trusted outcome)
  → prove the harness is wired + the path was right (Harness Audit + Trajectory Evals)
```
Every piece is additive and opt-in. The bright lines are deliberate: the reversibility layer
never executes a compensation; the dispatcher never reverses without a token; the
Recovery Loop never auto-reverses a non-AUTO action; the audit never claims a
control it can't probe.
Source map
| Concern | Module(s) |
|---|---|
| Reversal contracts | brain/kernel/contracts.py (ReversalSpec, MutationRef, ReversalClass, BlastRadius) |
| Binding + ledger | brain/kernel/reversibility.py; brain/kernel/run_context.py (IrreversibilityBudget) |
| Compensation dispatch | brain/kernel/compensation.py; brain/kernel/error_classifier.py |
| Recovery loop | brain/kernel/recovery_loop.py; brain/kernel/events.py (AssumptionInvalidated) |
| Earned autonomy | brain/kernel/autonomy.py |
| Tokenomics | brain/kernel/tokenomics.py |
| Audit + trajectory | brain/kernel/harness_audit.py; brain/kernel/trajectory_eval.py |
| Context source registry | brain/kernel/context_sources.py (owner / sensitivity / access mode / TTL) |
| Red-team coverage | brain/kernel/red_team.py (injection / tool-abuse / leakage / jailbreak fixture + checker) |
| Replay packets | brain/kernel/replay.py (pinned inputs/outputs/versions; reproduce with no live effects) |
| Agent charter | brain/kernel/agent_charter.py (purpose/owner/allowed+denied intents; deny-wins permits) |
| Memory read policy | brain/kernel/memory_read_policy.py (consent + freshness gate + query log) |
| Data classification | brain/kernel/data_classification.py (content → Sensitivity via PII/payment/regulatory detectors) |
| Durable execution | brain/kernel/durable_execution.py (WorkflowJournal step checkpoints + resume) |
| Release tuple | brain/kernel/release_tuple.py (prompt/model/policy/tools/context/eval/memory pin + diff) |
| Incident response | brain/kernel/incident_response.py (severity matrix runbook + scoped KillSwitch) |
| Contradiction ledger | brain/kernel/contradiction_ledger.py (conflict records + recency supersession) |
| Business metrics | brain/kernel/business_metrics.py (success/conversion/deflection/CSAT/revenue by intent+version) |
| CLI | sb harness audit; sb agent eval (trajectory block) |
All symbols are re-exported from brain.kernel. See
Policy and Approvals for the prevention layer this
builds on.