The Survival Layer

Purpose

Policy, sandboxing, and approvals (see Policy and Approvals) are prevention — they reduce the chance an agent does the wrong thing. But prevention has a residual failure rate, and some actions are only discovered to be wrong after they were validly approved (a refund issued while an order was unshipped, then the order ships). The Survival Layer is what SecondBrain does after a consequential action commits.

The framing is one formula:

expected_harm ≈ P(unintended action) × blast_radius × recovery_cost

Prevention shrinks the first term and never reaches zero. The Survival Layer bounds and observes the other two — it makes consequences declarable, recoverable, budgeted, and measurable.
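To make the formula concrete, here is a small worked example. The numbers are purely illustrative and are not measurements from SecondBrain; `expected_harm` is a hypothetical helper, not part of `brain.kernel`.

```python
def expected_harm(p_unintended: float, blast_radius: float, recovery_cost: float) -> float:
    """Product form of the framing formula; all inputs are illustrative."""
    return p_unintended * blast_radius * recovery_cost


# Baseline: 2% chance of an unintended action, up to 10 affected calls,
# 500 units of cost to recover each one.
baseline = expected_harm(0.02, blast_radius=10, recovery_cost=500.0)

# Prevention alone: halve the probability, leave the other terms untouched.
prevention_only = expected_harm(0.01, blast_radius=10, recovery_cost=500.0)

# Survival Layer: same probability, but cap the blast radius at one call
# and make recovery cheaper by declaring a compensation path up front.
bounded = expected_harm(0.02, blast_radius=1, recovery_cost=50.0)

print(baseline, prevention_only, bounded)
```

With these made-up inputs the baseline is about 100, halving the probability brings it to about 50, and bounding the other two terms brings it to about 1. The terms multiply, so shrinking blast radius and recovery cost together can matter more than another increment of prevention.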

It is five composable subsystems, all in brain/kernel/, all pure/deterministic and opt-in (nothing changes behaviour until a caller wires it in):

| Subsystem | Module | Answers |
|---|---|---|
| Reversibility | reversibility.py + contracts.py | How can this action be undone? |
| Compensation & Recovery | compensation.py, recovery_loop.py | What do we do when it fails or its assumption breaks? |
| Earned Autonomy | autonomy.py | How much is this agent allowed to do right now? |
| Tokenomics | tokenomics.py | What did a trusted outcome cost? |
| Harness Audit + Trajectory Evals | harness_audit.py, trajectory_eval.py | Is the harness actually wired, and did the run take the right path? |

1. Reversibility — declare how an action is undone

A tool declares how it can be undone, separately from how risky it is. The declaration lives on the ToolSpec:

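from brain.kernel.contracts import ReversalSpec, ReversalClass, BlastRadius, ToolSpec
# SafetyClass and RecoveryApprovalMode are used below; importing them from the
# brain.kernel re-export is an assumption about where they are defined.
from brain.kernel import RecoveryApprovalMode, SafetyClass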

spec = ToolSpec(
    tool_id="payments.issue_refund",
    name="Issue refund",
    safety_class=SafetyClass.NETWORK,
    reversal=ReversalSpec(
        reversal_class=ReversalClass.COMPENSABLE,        # reversible | compensable | irreversible
        compensation_capability="payments.reverse_refund",
        recovery_approval_mode=RecoveryApprovalMode.HUMAN,
        reversal_window_ms=3_600_000,                    # 1h to compensate
        blast_radius=BlastRadius(max_calls_per_run=1, max_value_per_run=5000),
    ),
    metadata={"risk_dimension": "money", "value_arg": "amount"},
)

If a side-effecting tool declares no reversal, ToolSpec.resolved_reversal() derives a conservative default from its safety_class: READ_ONLY → reversible no-op, LOCAL_WRITE → reversible (checkpoint the prior value), NETWORK → compensable, DESTRUCTIVE → irreversible (gate it). Reversal specs are kept out of the v1 tool-contract fingerprint, so adding one never forces re-approval of an already-approved baseline.
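The default derivation described above can be sketched as a lookup table. This is a minimal illustration, not the real resolved_reversal(): the two enums are stand-ins defined locally, and `default_reversal_class` is a hypothetical name.

```python
from enum import Enum
from typing import Optional


class SafetyClass(Enum):  # local stand-in for the real enum
    READ_ONLY = "read_only"
    LOCAL_WRITE = "local_write"
    NETWORK = "network"
    DESTRUCTIVE = "destructive"


class ReversalClass(Enum):  # local stand-in for the real enum
    REVERSIBLE = "reversible"
    COMPENSABLE = "compensable"
    IRREVERSIBLE = "irreversible"


# Conservative defaults, following the mapping stated in the text.
_DEFAULTS = {
    SafetyClass.READ_ONLY: ReversalClass.REVERSIBLE,      # nothing to undo
    SafetyClass.LOCAL_WRITE: ReversalClass.REVERSIBLE,    # checkpoint the prior value
    SafetyClass.NETWORK: ReversalClass.COMPENSABLE,
    SafetyClass.DESTRUCTIVE: ReversalClass.IRREVERSIBLE,  # gate it
}


def default_reversal_class(
    safety_class: SafetyClass, declared: Optional[ReversalClass] = None
) -> ReversalClass:
    """Prefer an explicit declaration; otherwise fall back to the safety class."""
    return declared if declared is not None else _DEFAULTS[safety_class]


print(default_reversal_class(SafetyClass.NETWORK).value)
```

An explicit declaration always wins in this sketch, so the default only applies to tools that say nothing about reversal.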

At commit time the ToolExecutor (when constructed with reversibility="observe" or "enforce") binds a typed MutationRef — an executable recovery pointer, not a log breadcrumb. Reversal args, the conditions that justified the action (assumption_refs), the external transaction handle, and an idempotency key are all captured now so recovery never reconstructs intent from logs. MutationRefs are persisted in a MutationStore (SQLite) for the Recovery Loop to read.

An IrreversibilityBudget on the RunContext tracks un-undoable consequence as a vector across money / comms / commitments / exposure / regulatory. Reversible actions accrue no debt, compensable ones a fraction (0.1×), irreversible ones full value (1.0×). Crossing a ceiling escalates (asks for human judgment) — it does not hard-deny.
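The accrual rule can be sketched as follows. The multipliers come from the paragraph above; everything else (the `BudgetSketch` class, its `debit` method, and treating "crossing" as strictly exceeding the ceiling) is an assumption made for illustration and may not match the real IrreversibilityBudget.

```python
from dataclasses import dataclass, field

# Debt multipliers per reversal class, as stated in the text.
DEBT_FACTOR = {"reversible": 0.0, "compensable": 0.1, "irreversible": 1.0}


@dataclass
class BudgetSketch:
    """Hypothetical stand-in: one ceiling and one running debt per dimension."""

    ceilings: dict[str, float]
    debt: dict[str, float] = field(default_factory=dict)

    def debit(self, dimension: str, value: float, reversal_class: str) -> str:
        """Accrue debt and return 'ok' or 'escalate'; there is no hard deny."""
        accrued = self.debt.get(dimension, 0.0) + value * DEBT_FACTOR[reversal_class]
        self.debt[dimension] = accrued
        ceiling = self.ceilings.get(dimension)
        if ceiling is not None and accrued > ceiling:
            return "escalate"  # ask for human judgment
        return "ok"


budget = BudgetSketch(ceilings={"money": 1000.0})
first = budget.debit("money", 5000, "compensable")    # 5000 * 0.1 of debt
second = budget.debit("money", 800, "irreversible")   # 800 * 1.0 of debt
print(first, second, budget.debt["money"])
```

Here the compensable refund stays under the ceiling, while the later irreversible action pushes the money dimension past it and triggers an escalation.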

The reversibility layer binds, accounts, and persists. It deliberately executes no compensation — that is the Recovery Loop, below.


2. Compensation & Recovery — what to do when things go wrong

There are two distinct failure shapes, and they share one engine.

Synchronous: typed-verdict compensation (compensation.py)

When a tool call fails, "what failed" (a typed ErrorKind) is kept separate from "what to do" (CompensationAction). The PLAYBOOK maps one to the other, and the routings a naive retry loop gets wrong are the whole point:

| Verdict | Compensation | Why |
|---|---|---|
| IDEMPOTENCY_CONFLICT | DEPRECATE_AND_REPLAN | the duplicate already succeeded upstream; retrying is the double-refund bug |
| EVIDENCE_STALE | REFRESH_EVIDENCE | re-retrieve then retry once; a blind retry just fails the same way |
| POLICY_DENIED / CONTENT_FILTER / REFUSAL | ESCALATE_TO_HUMAN | the model can't work around a policy decision |
| TIMEOUT / TRANSIENT_NETWORK / RATE_LIMITED / SERVER | RETRY_WITH_BACKOFF | genuinely transient |

from brain.kernel.compensation import CompensationDispatcher, verdict_for

dispatcher = CompensationDispatcher(mutation_store=store)
outcome = dispatcher.dispatch(ctx, tool_id="payments.issue_refund",
                              call_id=call_id, args=args, result=failed_result)
# outcome.kind ∈ {retry_scheduled, evidence_refreshed, deprecated, escalated, reversed, ...}

verdict_for() unifies the two failure representations SB has — ToolErrorCode on a ToolResult (executor path) and exceptions via the classifier (provider path). The dispatcher is pure and injectable: retries/refreshes/escalations are returned as directives (or run through caller-supplied callables), so the decision logic is unit-testable without a live run.
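To illustrate why keeping "what failed" separate from "what to do" makes the logic testable, here is a sketch of the playbook as a pure lookup. The enum members mirror the table above, but the classes are local stand-ins, and `PLAYBOOK_SKETCH`, `compensation_for`, and the escalate-by-default fallback for unlisted verdicts are assumptions rather than the module's actual behaviour.

```python
from enum import Enum, auto


class ErrorKind(Enum):  # local stand-in
    IDEMPOTENCY_CONFLICT = auto()
    EVIDENCE_STALE = auto()
    POLICY_DENIED = auto()
    CONTENT_FILTER = auto()
    REFUSAL = auto()
    TIMEOUT = auto()
    TRANSIENT_NETWORK = auto()
    RATE_LIMITED = auto()
    SERVER = auto()
    UNKNOWN = auto()  # hypothetical catch-all, not in the table


class CompensationAction(Enum):  # local stand-in
    DEPRECATE_AND_REPLAN = auto()
    REFRESH_EVIDENCE = auto()
    ESCALATE_TO_HUMAN = auto()
    RETRY_WITH_BACKOFF = auto()


PLAYBOOK_SKETCH = {
    ErrorKind.IDEMPOTENCY_CONFLICT: CompensationAction.DEPRECATE_AND_REPLAN,
    ErrorKind.EVIDENCE_STALE: CompensationAction.REFRESH_EVIDENCE,
    ErrorKind.POLICY_DENIED: CompensationAction.ESCALATE_TO_HUMAN,
    ErrorKind.CONTENT_FILTER: CompensationAction.ESCALATE_TO_HUMAN,
    ErrorKind.REFUSAL: CompensationAction.ESCALATE_TO_HUMAN,
    ErrorKind.TIMEOUT: CompensationAction.RETRY_WITH_BACKOFF,
    ErrorKind.TRANSIENT_NETWORK: CompensationAction.RETRY_WITH_BACKOFF,
    ErrorKind.RATE_LIMITED: CompensationAction.RETRY_WITH_BACKOFF,
    ErrorKind.SERVER: CompensationAction.RETRY_WITH_BACKOFF,
}


def compensation_for(kind: ErrorKind) -> CompensationAction:
    """Pure lookup; unlisted verdicts escalate (an assumed conservative default)."""
    return PLAYBOOK_SKETCH.get(kind, CompensationAction.ESCALATE_TO_HUMAN)


print(compensation_for(ErrorKind.IDEMPOTENCY_CONFLICT).name)
```

Because the function returns a directive instead of performing the retry or escalation itself, a unit test can assert on the routing without a live run.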

Post-hoc: the Recovery Loop (recovery_loop.py)

When a successful action's justifying assumption later turns false, an AssumptionInvalidated event on the in-process EventBus drives the RecoveryLoop:

from brain.kernel.recovery_loop import RecoveryLoop, registry_approval_resolver
from brain.kernel.events import AssumptionInvalidated

loop = RecoveryLoop(
    mutation_store=store,
    dispatcher=dispatcher,
    reversal_runner=run_compensation,                 # performs the actual reversal
    approval_resolver=registry_approval_resolver(registry),
    decision_store=decision_store,
)
loop.subscribe(bus)
bus.emit(AssumptionInvalidated(run_id=run_id, assumption_ref="order_not_shipped"))

The loop finds in-window MutationRefs whose assumption_refs contain the invalidated ref and recovers them. Its defining property is approval-mode separation: a reversal does not inherit the original action's approval mode. A refund may have auto-executed, but reversing it can require human review. Only RecoveryApprovalMode.AUTO auto-fires; HUMAN / DUAL_CONTROL escalate and are never auto-run (the default resolver returns HUMAN). Every recovery emits a RuntimeDecisionRecord linked to the original via reverses_mutation_ref + assumption_refs.
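The selection and approval-mode separation can be sketched like this. `MutationRefSketch` and `plan_recovery` are hypothetical names with simplified fields; the real MutationRef carries more (reversal args, transaction handle, idempotency key), and the inclusive window check is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MutationRefSketch:
    """Simplified stand-in for a persisted MutationRef."""

    call_id: str
    assumption_refs: tuple[str, ...]
    committed_at_ms: int
    reversal_window_ms: int
    recovery_approval_mode: str = "human"  # "auto" | "human" | "dual_control"


def plan_recovery(refs, invalidated_ref: str, now_ms: int):
    """Return (call_id, action) pairs for in-window refs that relied on the assumption."""
    plan = []
    for ref in refs:
        in_window = now_ms - ref.committed_at_ms <= ref.reversal_window_ms
        if not in_window or invalidated_ref not in ref.assumption_refs:
            continue
        # The reversal's own approval mode decides, not the original action's.
        action = "auto_reverse" if ref.recovery_approval_mode == "auto" else "escalate"
        plan.append((ref.call_id, action))
    return plan


refs = [
    MutationRefSketch("call-1", ("order_not_shipped",), 0, 3_600_000, "auto"),
    MutationRefSketch("call-2", ("order_not_shipped",), 0, 3_600_000),          # defaults to human
    MutationRefSketch("call-3", ("order_not_shipped",), 0, 1_000, "auto"),      # window expired
    MutationRefSketch("call-4", ("address_verified",), 0, 3_600_000, "auto"),   # other assumption
]
plan = plan_recovery(refs, "order_not_shipped", now_ms=60_000)
print(plan)
```

Only the first two refs are selected: one reverses automatically and one escalates, while the expired and unrelated refs are left alone.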

The reversal-token guard

Both paths route reversals through one guard: a reversal never fires without a paired token. In SecondBrain the token is the reversibility layer's MutationRef. No in-window MutationRef for the call → reversal_refused, always.
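A minimal sketch of the guard, assuming the in-window MutationRefs are available as a mapping keyed by call id. `guarded_reversal` and `fake_runner` are hypothetical; only the reversal_refused outcome name comes from the text.

```python
from typing import Callable, Mapping


def guarded_reversal(
    call_id: str,
    in_window_refs: Mapping[str, dict],
    run_reversal: Callable[[dict], str],
) -> str:
    """Run the reversal only when a paired token exists for this call."""
    token = in_window_refs.get(call_id)
    if token is None:
        return "reversal_refused"  # the runner is never invoked
    return run_reversal(token)


executed = []


def fake_runner(token: dict) -> str:
    """Test double that records which tokens were actually reversed."""
    executed.append(token["call_id"])
    return "reversed"


tokens = {"call-1": {"call_id": "call-1"}}
ok = guarded_reversal("call-1", tokens, fake_runner)
refused = guarded_reversal("call-9", tokens, fake_runner)
print(ok, refused, executed)
```

Putting the check in one function that both paths call means neither the dispatcher nor the Recovery Loop can reach the runner without a token.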


3. Earned Autonomy (autonomy.py)

Autonomy is a budget an agent earns per task, not a static switch. A weighted score over eight signals maps to a five-rung mode ladder, then hard caps clamp it down:

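# AutonomyMode is used below; importing it from the autonomy module is an assumption.
from brain.kernel.autonomy import AutonomySignals, AutonomyCaps, AutonomyMode, compute_autonomy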

decision = compute_autonomy(
    AutonomySignals(
        evidence_confidence=0.9, policy_confidence=1.0, eval_score=0.85,
        tool_reliability=0.9, reversibility=0.5,   # ← fed from the ReversalClass
        task_risk=0.2, user_impact=0.3,
    ),
    AutonomyCaps(policy_ceiling=AutonomyMode.EXECUTE_WITH_APPROVAL),
)
# decision.mode → read | recommend | draft | execute_with_approval | auto_execute
# decision.capped_by names any ceiling that held it down

Signals: five earn autonomy (evidence, policy, and eval at 0.20 each; tool-reliability and reversibility at 0.15 each) and two spend it (task-risk and user-impact at −0.10 each). The lowest cap wins, so a clean eval record can't buy past a policy ceiling. reversibility_score(ReversalClass) (1.0 / 0.5 / 0.0) is the concrete tie from the reversibility layer: an easily-undone action earns more authority.
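The scoring and capping can be sketched as below. The weights are the ones listed in the text and cover only the seven signals it names; the rung cutoffs are hypothetical (the real thresholds are not given here), and `weighted_score` and `decide` are illustrative names, not the compute_autonomy implementation.

```python
LADDER = ["read", "recommend", "draft", "execute_with_approval", "auto_execute"]

# Weights as listed in the text (seven named signals only).
WEIGHTS = {
    "evidence_confidence": 0.20,
    "policy_confidence": 0.20,
    "eval_score": 0.20,
    "tool_reliability": 0.15,
    "reversibility": 0.15,
    "task_risk": -0.10,
    "user_impact": -0.10,
}

# Hypothetical cutoffs: a score at or above CUTOFFS[i] earns rung i + 1.
CUTOFFS = [0.2, 0.4, 0.6, 0.8]


def weighted_score(signals: dict) -> float:
    return sum(weight * signals[name] for name, weight in WEIGHTS.items())


def decide(signals: dict, caps: dict):
    """Map the score to a rung, then clamp to the lowest applicable cap."""
    rung = sum(weighted_score(signals) >= cutoff for cutoff in CUTOFFS)
    capped_by = None
    for cap_name, ceiling in caps.items():
        ceiling_rung = LADDER.index(ceiling)
        if ceiling_rung < rung:
            rung, capped_by = ceiling_rung, cap_name
    return LADDER[rung], capped_by


signals = {
    "evidence_confidence": 0.9, "policy_confidence": 1.0, "eval_score": 0.85,
    "tool_reliability": 0.9, "reversibility": 0.5,
    "task_risk": 0.2, "user_impact": 0.3,
}
score = weighted_score(signals)
mode, capped_by = decide(signals, {"policy_ceiling": "draft"})
print(round(score, 2), mode, capped_by)
```

With these inputs the score is about 0.71, which would earn execute_with_approval under the assumed cutoffs, but the policy ceiling holds the decision at draft and is reported as the binding cap.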

Autonomy also degrades automatically. degrade() drops a rung on a policy violation (and floors to READ on a safety violation); AutonomyState freezes promotion after a failed replay or a cluster of operator corrections until unfreeze() — while frozen, reconsider() may lower the mode but never raise it.
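The degrade and freeze behaviour can be sketched as follows. The function and class here are hypothetical stand-ins that follow the rules stated above; the real degrade() and AutonomyState signatures may differ.

```python
LADDER = ["read", "recommend", "draft", "execute_with_approval", "auto_execute"]


def degrade(mode: str, violation: str) -> str:
    """Drop one rung on a policy violation; floor to read on a safety violation."""
    if violation == "safety":
        return "read"
    if violation == "policy":
        return LADDER[max(0, LADDER.index(mode) - 1)]
    return mode


class AutonomyStateSketch:
    """While frozen, the mode may be lowered but never raised."""

    def __init__(self, mode: str) -> None:
        self.mode = mode
        self.frozen = False

    def freeze(self) -> None:
        self.frozen = True

    def unfreeze(self) -> None:
        self.frozen = False

    def reconsider(self, proposed: str) -> str:
        raises = LADDER.index(proposed) > LADDER.index(self.mode)
        if not (self.frozen and raises):
            self.mode = proposed
        return self.mode


state = AutonomyStateSketch("draft")
state.freeze()                                      # e.g. after a failed replay
blocked = state.reconsider("auto_execute")          # promotion is ignored
lowered = state.reconsider("recommend")             # demotion still applies
state.unfreeze()
restored = state.reconsider("execute_with_approval")
print(blocked, lowered, restored)
```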


4. Tokenomics (tokenomics.py)

"Cost per token is the AI-factory metric; cost per trusted outcome is the enterprise metric." An outcome is trusted only when it is accepted ∧ grounded ∧ policy-compliant. The ledger supplies the missing denominator:

from brain.kernel.tokenomics import TokenomicsLedger, OutcomeRecord, CostBreakdown

ledger = TokenomicsLedger()
ledger.record(OutcomeRecord(
    workflow_id="support.refund",
    cost=CostBreakdown(model_cost_cents=120, eval_tokens=40, retry_tokens=10, ...),
    accepted=True, grounded=True, policy_compliant=True,
))
report = ledger.report(workflow_id="support.refund")
report.cost_per_trusted_outcome_cents   # None (not 0) when zero trusted — honest
report.retry_token_ratio, report.eval_overhead_ratio, report.cache_hit_rate

cost_per_trusted_outcome_cents is None when there are no trusted outcomes: an honest "undefined" rather than a misleading zero.
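The metric itself can be sketched in a few lines. This assumes the numerator is the total cost of all recorded outcomes, trusted or not, which is one reasonable reading; the record shape and the function name are hypothetical and not the TokenomicsLedger API.

```python
from typing import Optional


def cost_per_trusted_outcome_cents(records: list[dict]) -> Optional[float]:
    """Total spend divided by the number of trusted outcomes, or None if there are none."""
    total_cents = sum(r["cost_cents"] for r in records)
    trusted = sum(
        1 for r in records
        if r["accepted"] and r["grounded"] and r["policy_compliant"]
    )
    return total_cents / trusted if trusted else None


records = [
    {"cost_cents": 120, "accepted": True, "grounded": True, "policy_compliant": True},
    {"cost_cents": 80, "accepted": True, "grounded": False, "policy_compliant": True},
    {"cost_cents": 100, "accepted": True, "grounded": True, "policy_compliant": True},
]
result = cost_per_trusted_outcome_cents(records)
print(result)
```

Under that assumption, the ungrounded outcome still costs money but does not count in the denominator, so it raises the cost of the outcomes that are trusted.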


5. Harness Audit + Trajectory Evals

Self-grading audit (harness_audit.py)

sb harness audit grades the runtime against the eight-outcome / forty-control ContextOS harness audit. Each control runs a deterministic probe (is the subsystem wired?) and returns pass / partial / fail with evidence and a P0/P1/P2 severity — an honest structural scorecard, not a green wall.

sb harness audit                 # rich scorecard + by-outcome rollup
sb harness audit --json          # machine-readable
sb harness audit --fail-under 0.8   # CI gate; exits 2 on a P0 blocker, 1 below threshold

It is deterministic (pure import/attribute checks — same answer every run), so it can gate CI. The current self-grade is 100% (40 pass / 0 partial / 0 fail), no P0 blockers — every one of the eight outcomes is fully wired. The follow-on modules below (context-source registry, red-team suite, replay packets) plus the governance/measurement modules (agent charter, memory-read policy, data classification, durable execution, release tuple, incident response, contradiction ledger, business metrics) closed the last structural gaps.
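The CI gate's exit codes can be sketched as below. Only the exit codes (2 on a P0 blocker, 1 below the threshold) come from the CLI comments above; treating a blocker as a failed P0 control, scoring a partial as half a pass, and the function name are all assumptions.

```python
# Hypothetical scoring: the weight of a partial result is an assumption.
POINTS = {"pass": 1.0, "partial": 0.5, "fail": 0.0}


def audit_exit_code(results: list[tuple[str, str]], fail_under: float) -> int:
    """results holds (status, severity) pairs, one per control."""
    if any(status == "fail" and severity == "P0" for status, severity in results):
        return 2  # a P0 blocker overrides the score
    score = sum(POINTS[status] for status, _ in results) / len(results) if results else 0.0
    return 1 if score < fail_under else 0


all_pass = [("pass", "P1")] * 40
one_blocker = [("pass", "P1")] * 39 + [("fail", "P0")]
below_threshold = [("pass", "P2")] * 7 + [("fail", "P2")] * 3

print(
    audit_exit_code(all_pass, 0.8),
    audit_exit_code(one_blocker, 0.8),
    audit_exit_code(below_threshold, 0.8),
)
```

Checking for the blocker before the score means a single failed P0 control fails the build even when the overall score is above the threshold.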

Trajectory evals (trajectory_eval.py)

A final-answer scorecard can't catch a run that lands the right answer the wrong way (a double refund, a skipped eligibility check). Trajectory evals grade the tool path. They run standalone or as part of an eval suite — add a trajectory: block to any sb agent eval case:

cases:
  - id: refund_path
    input: "refund my unshipped order"
    matchers:
      - kind: contains
        text: "refund issued"
    trajectory:
      mode: subsequence              # strict | subsequence | unordered
      forbid: ["payments.issue_refund_again"]
      steps:
        - tool: orders.read_booking
        - tool: payments.issue_refund
          args: {amount: 500}

The case passes only when every matcher and the trajectory pass; the report carries a trajectory_score and, on failure, a [trajectory] score=… (forbidden:…, missing:…) line. Modes: strict (exact ordered), subsequence (in order, extras allowed), unordered (set). forbid names tools that must never appear; steps support optional, require_success, tool_pattern, and args_subset.
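The three matching modes can be sketched over plain lists of tool ids. This ignores args, optional steps, and patterns, and it assumes that unordered mode allows extra tools; `trajectory_matches` is a hypothetical helper, not the trajectory_eval API.

```python
def trajectory_matches(mode: str, expected: list[str], actual: list[str], forbid=()) -> bool:
    """Compare the expected tool steps against the tools a run actually called."""
    if any(tool in forbid for tool in actual):
        return False  # a forbidden tool fails the trajectory in every mode
    if mode == "strict":
        return actual == expected
    if mode == "subsequence":
        remaining = iter(actual)
        # `step in remaining` consumes the iterator, so order is enforced.
        return all(step in remaining for step in expected)
    if mode == "unordered":
        return set(expected) <= set(actual)
    raise ValueError(f"unknown mode: {mode}")


expected = ["orders.read_booking", "payments.issue_refund"]
actual = ["orders.read_booking", "audit.log", "payments.issue_refund"]

strict_ok = trajectory_matches("strict", expected, actual)
subsequence_ok = trajectory_matches("subsequence", expected, actual)
unordered_ok = trajectory_matches("unordered", list(reversed(expected)), actual)
print(strict_ok, subsequence_ok, unordered_ok)
```

The extra audit.log call breaks strict mode but not the other two, which is the practical difference between asserting the exact path and asserting that the required steps happened.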


How it composes

A consequential action flows through the whole layer:

declare (ReversalSpec on ToolSpec)
  → commit (ToolExecutor binds MutationRef, debits IrreversibilityBudget)
    → on failure   → CompensationDispatcher (typed verdict → action, token-guarded)
    → on stale assumption → RecoveryLoop (in-window, approval-mode-separated)
  → grant authority for the *next* action (Autonomy, fed by ReversalClass)
  → account for it (Tokenomics: cost per trusted outcome)
  → prove the harness is wired + the path was right (Harness Audit + Trajectory Evals)

Every piece is additive and opt-in. The bright lines are deliberate: the reversibility layer never executes a compensation; the dispatcher never reverses without a token; the Recovery Loop never auto-reverses a non-AUTO action; the audit never claims a control it can't probe.


Source map

| Concern | Module(s) |
|---|---|
| Reversal contracts | brain/kernel/contracts.py (ReversalSpec, MutationRef, ReversalClass, BlastRadius) |
| Binding + ledger | brain/kernel/reversibility.py; brain/kernel/run_context.py (IrreversibilityBudget) |
| Compensation dispatch | brain/kernel/compensation.py; brain/kernel/error_classifier.py |
| Recovery loop | brain/kernel/recovery_loop.py; brain/kernel/events.py (AssumptionInvalidated) |
| Earned autonomy | brain/kernel/autonomy.py |
| Tokenomics | brain/kernel/tokenomics.py |
| Audit + trajectory | brain/kernel/harness_audit.py; brain/kernel/trajectory_eval.py |
| Context source registry | brain/kernel/context_sources.py (owner / sensitivity / access mode / TTL) |
| Red-team coverage | brain/kernel/red_team.py (injection / tool-abuse / leakage / jailbreak fixture + checker) |
| Replay packets | brain/kernel/replay.py (pinned inputs/outputs/versions; reproduce with no live effects) |
| Agent charter | brain/kernel/agent_charter.py (purpose/owner/allowed+denied intents; deny-wins permits) |
| Memory read policy | brain/kernel/memory_read_policy.py (consent + freshness gate + query log) |
| Data classification | brain/kernel/data_classification.py (content → Sensitivity via PII/payment/regulatory detectors) |
| Durable execution | brain/kernel/durable_execution.py (WorkflowJournal step checkpoints + resume) |
| Release tuple | brain/kernel/release_tuple.py (prompt/model/policy/tools/context/eval/memory pin + diff) |
| Incident response | brain/kernel/incident_response.py (severity matrix runbook + scoped KillSwitch) |
| Contradiction ledger | brain/kernel/contradiction_ledger.py (conflict records + recency supersession) |
| Business metrics | brain/kernel/business_metrics.py (success/conversion/deflection/CSAT/revenue by intent+version) |
| CLI | sb harness audit; sb agent eval (trajectory block) |

All symbols are re-exported from brain.kernel. See Policy and Approvals for the prevention layer this builds on.