Environments

The environment subsystem is SecondBrain's local control plane for replayable agent-task episodes: reset a bounded state, apply structured actions, record observations, score rewards, and persist the trajectory in work.db.

Purpose

brain.environments provides an OpenEnv-shaped contract without moving tool execution or MCP calls into the training loop itself. Environment runners call reset, step, and state; MCP remains the production tool plane, while environment steps preserve reward and trajectory accounting.

The current slice is intentionally local and deterministic:

counter is a pure fixture used to validate lifecycle, rewards, persistence, and CLI behavior.
workspace is backed by SessionEnv, so file actions are root-bounded and suitable for the first coding-task environment shape.
/environments exposes the same local control plane through sb serve without routing trainer control calls through MCP.
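
The reset/step/state contract described above can be illustrated with a toy counter fixture. This is a minimal sketch, not the code in brain.environments: the class names, the increment action type, and the reward values are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class StepResult:
    """Hypothetical stand-in for the real StepResult contract."""
    observation: dict[str, Any]
    reward: float
    terminated: bool


@dataclass
class CounterEnv:
    """Toy counter fixture: the episode ends once value reaches target."""
    target: int = 3
    value: int = 0
    history: list[StepResult] = field(default_factory=list)

    def reset(self) -> dict[str, Any]:
        # Return to a bounded, known starting state.
        self.value = 0
        self.history.clear()
        return self.state()

    def state(self) -> dict[str, Any]:
        return {"value": self.value, "target": self.target}

    def step(self, action: dict[str, Any]) -> StepResult:
        # "increment" is an assumed action type for this sketch.
        if action["type"] == "increment":
            self.value += 1
        done = self.value >= self.target
        result = StepResult(self.state(), 1.0 if done else 0.0, done)
        self.history.append(result)  # trajectory accounting
        return result


def run_episode(env: CounterEnv, max_steps: int = 10) -> list[StepResult]:
    """Runner loop: reset once, then step until the episode terminates."""
    env.reset()
    for _ in range(max_steps):
        if env.step({"type": "increment"}).terminated:
            break
    return list(env.history)


trajectory = run_episode(CounterEnv(target=3))
```

The runner only touches reset, step, and state, which is the property that keeps trainer control calls separate from the MCP tool plane.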

Main Commands

sb env list --json
sb env run counter --target 3 --json
sb env run workspace --target-path answer.txt --answer "ready" --json
sb env run --task-file examples/env-task.yaml --json
sb env episodes --env-id workspace --json
sb env show <episode_id> --json
sb env replay <episode_id> --json
sb env export <episode_id> --format openenv-json
sb env export <episode_id> --format steps-jsonl --output rollouts.jsonl

sb env run persists episodes by default. Use --no-store for a transient demo run. For the workspace environment, omit --workspace-root to use an in-memory workspace, or pass a path to exercise a local SessionEnv root.

A checked-in example run lives at examples/environments/checkin/workspace-verifier-run/. It includes the task manifest, native episode JSON, OpenEnv-shaped rollout, steps JSONL, and replay result for one complete workspace verifier episode.

Task Manifests

sb env run --task-file <path> loads a JSON or YAML manifest and uses the manifest's env_id, reset options, optional task metadata, and optional action plan. The manifest format is intentionally small:

schema_version: secondbrain.environment_task.v1
env_id: workspace
task:
  task_id: workspace.write_answer
  goal: Write the expected answer file.
  success_conditions:
    - answer.txt equals expected_text
reset_options:
  target_path: answer.txt
verifiers:
  - type: file_exists
    name: answer_exists
    path: answer.txt
  - type: file_equals
    name: answer_exact
    path: answer.txt
    expected_text: "ready\n"
actions:
  - type: write_file
    payload:
      path: answer.txt
      content: "ready\n"
  - type: submit
metadata:
  suite: local-fixture

When actions is omitted, the CLI uses the built-in demo action plan for the chosen environment. Manifests do not create a new trust boundary; actions still flow through the same environment lifecycle and persistence path as hand-built actions.

Episodes created through the CLI or sb serve persist their reset_options, so a stored trajectory can be replayed later without guessing how the environment was initialized.
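
The manifest handling described in this section might look roughly like the sketch below. It parses the JSON form only (to stay dependency-free) and falls back to a demo plan when actions is omitted. The load_manifest function and the DEMO_PLANS contents are hypothetical; the real loader lives in brain/environments/manifest.py and may differ.

```python
import json
from typing import Any

SCHEMA_VERSION = "secondbrain.environment_task.v1"

# Hypothetical fallback plans; the real built-in demo plans are not shown here.
DEMO_PLANS: dict[str, list[dict[str, Any]]] = {
    "counter": [{"type": "increment"}],
    "workspace": [{"type": "submit"}],
}


def load_manifest(text: str) -> dict[str, Any]:
    """Parse a JSON task manifest and normalize its optional fields."""
    manifest = json.loads(text)
    version = manifest.get("schema_version")
    if version != SCHEMA_VERSION:
        raise ValueError(f"unsupported schema_version: {version!r}")
    if "env_id" not in manifest:
        raise ValueError("manifest requires env_id")
    env_id = manifest["env_id"]
    # Only an omitted actions key triggers the demo plan.
    if "actions" in manifest:
        actions = manifest["actions"]
    else:
        actions = DEMO_PLANS.get(env_id, [])
    return {
        "env_id": env_id,
        "task": manifest.get("task"),
        "reset_options": manifest.get("reset_options", {}),
        "verifiers": manifest.get("verifiers", []),
        "actions": actions,
        "metadata": manifest.get("metadata", {}),
    }


MANIFEST_TEXT = json.dumps({
    "schema_version": SCHEMA_VERSION,
    "env_id": "workspace",
    "reset_options": {"target_path": "answer.txt"},
})
plan = load_manifest(MANIFEST_TEXT)
```

Keeping reset_options in the normalized result is what allows them to be stored alongside the episode for later replay.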

Workspace Verifiers

Workspace manifests can include declarative, non-command verifiers. submit runs every verifier, emits one reward component per verifier, and normalizes the score as passing weight divided by total weight.

Supported verifier types:

file_exists: requires path.
file_equals: requires path and exact expected_text.
file_contains: requires path and substring.
file_matches_regex: requires path and pattern.
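
As a rough illustration of how submit could score these verifier types, the sketch below evaluates each verifier against an in-memory file map and normalizes the score as passing weight over total weight. The weight field name, its default of 1.0, and the use of re.search for file_matches_regex are assumptions; the real logic lives in brain/environments and may differ.

```python
import re
from typing import Any


def check(verifier: dict[str, Any], files: dict[str, str]) -> bool:
    """Evaluate one declarative verifier against an in-memory file map."""
    content = files.get(verifier["path"])
    if content is None:
        return False  # every verifier type requires the file to exist
    kind = verifier["type"]
    if kind == "file_exists":
        return True
    if kind == "file_equals":
        return content == verifier["expected_text"]
    if kind == "file_contains":
        return verifier["substring"] in content
    if kind == "file_matches_regex":
        # Assumption: the pattern may match anywhere in the file.
        return re.search(verifier["pattern"], content) is not None
    raise ValueError(f"unknown verifier type: {kind}")


def score(verifiers: list[dict[str, Any]], files: dict[str, str]) -> dict[str, Any]:
    """Emit one reward component per verifier plus a normalized score."""
    components = [
        {
            "name": v["name"],
            "passed": check(v, files),
            "weight": float(v.get("weight", 1.0)),  # assumed field and default
        }
        for v in verifiers
    ]
    total = sum(c["weight"] for c in components)
    passing = sum(c["weight"] for c in components if c["passed"])
    return {"score": passing / total if total else 0.0, "components": components}


result = score(
    [
        {"type": "file_exists", "name": "answer_exists", "path": "answer.txt"},
        {"type": "file_equals", "name": "answer_exact", "path": "answer.txt",
         "expected_text": "ready\n", "weight": 3},
    ],
    {"answer.txt": "ready"},  # missing trailing newline, so file_equals fails
)
```

In this example only the file_exists check passes, so the score is 1 / (1 + 3) = 0.25, and the per-verifier components show which check failed.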

Verifier paths are relative to the SessionEnv root and use the same path guard as workspace actions, so absolute paths and .. escapes fail visibly. Command execution remains an explicit run_command action; verifiers do not run shell commands in this slice.
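
The path guard described above can be approximated lexically as follows. This is a simplified sketch: the function name is hypothetical, and SessionEnv's actual guard may apply additional checks (for example symlink resolution) that are not modeled here.

```python
from pathlib import PurePosixPath


def guard_relative_path(relative: str) -> PurePosixPath:
    """Reject absolute paths and any '..' component; return the safe path."""
    path = PurePosixPath(relative)
    if path.is_absolute():
        raise ValueError(f"absolute path rejected: {relative}")
    # pathlib keeps '..' components as written, so a lexical check is enough here.
    if ".." in path.parts:
        raise ValueError(f"parent escape rejected: {relative}")
    return path


safe = guard_relative_path("notes/answer.txt")
```

Raising instead of silently normalizing the path is what makes a bad verifier or action path fail visibly.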

Runtime Model

The core contracts live in brain/environments/models.py:

EnvironmentSpec and TaskSpec describe static environment and task shape.
Action, Observation, EnvironmentState, and StepResult form the reset/step API.
RewardComponentResult and RewardResult keep rewards inspectable.
EpisodeRecord and StepRecord are the persisted trajectory envelope.
VerifierSpec describes declarative workspace submit checks.

BaseEnvironment supplies lifecycle guardrails: callers must reset before stepping, terminal environments reject extra steps, and step indexes must remain consistent with observations.
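
Those guardrails could be enforced along the lines of the sketch below. The class, the LifecycleError exception, and the _apply hook are hypothetical names; the actual BaseEnvironment in brain/environments/base.py may structure this differently.

```python
from typing import Any


class LifecycleError(RuntimeError):
    """Raised when a caller violates the reset/step lifecycle."""


class GuardedEnvironment:
    """Illustrative base class that enforces the three lifecycle rules."""

    def __init__(self) -> None:
        self._started = False
        self._terminal = False
        self._step_index = 0

    def reset(self) -> dict[str, Any]:
        self._started = True
        self._terminal = False
        self._step_index = 0
        return {"step_index": 0}

    def step(self, action: dict[str, Any]) -> dict[str, Any]:
        if not self._started:
            raise LifecycleError("reset() must be called before step()")
        if self._terminal:
            raise LifecycleError("episode is terminal; call reset() first")
        observation, terminal = self._apply(action)
        self._step_index += 1
        # The observation must agree with the environment's own step counter.
        if observation.get("step_index") != self._step_index:
            raise LifecycleError("observation step_index is out of sync")
        self._terminal = terminal
        return observation

    def _apply(self, action: dict[str, Any]) -> tuple[dict[str, Any], bool]:
        # Subclass hook; this default treats "submit" as the terminal action.
        return {"step_index": self._step_index + 1}, action.get("type") == "submit"


env = GuardedEnvironment()
env.reset()
first = env.step({"type": "noop"})
last = env.step({"type": "submit"})
```

Centralizing the checks in the base class means each concrete environment only implements its own transition logic.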

Rollout Export

Persisted episodes can be exported for replay, evals, or trainer ingestion:

episode writes the native EpisodeRecord JSON envelope.
steps-jsonl writes one transition row per step, with action, observation, reward, terminal flags, per-step status, episode status, and step metadata.
openenv-json writes one OpenEnv-shaped rollout JSON object without requiring the OpenEnv package as a runtime dependency.

The OpenEnv-shaped export preserves SecondBrain's source-of-truth episode and step records. It is a data interchange format, not a separate execution path: training and evaluation still consume stored step results rather than calling MCP tools directly.
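
To make the steps-jsonl shape concrete, here is a minimal sketch that writes one transition row per step. The row and episode key names are assumptions inferred from the field list above, not the exact schema emitted by brain/environments/export.py.

```python
import io
import json
from typing import Any, TextIO


def write_steps_jsonl(episode: dict[str, Any], out: TextIO) -> int:
    """Write one JSON line per step and return the number of rows written."""
    count = 0
    for step in episode["steps"]:
        row = {
            "episode_id": episode["episode_id"],
            "episode_status": episode["status"],
            "step_index": step["step_index"],
            "action": step["action"],
            "observation": step["observation"],
            "reward": step["reward"],
            "terminated": step["terminated"],
            "truncated": step["truncated"],
            "status": step["status"],
            "metadata": step.get("metadata", {}),
        }
        out.write(json.dumps(row, sort_keys=True) + "\n")
        count += 1
    return count


# Hypothetical stored episode with a single terminal step.
EPISODE = {
    "episode_id": "ep-1",
    "status": "completed",
    "steps": [
        {"step_index": 1, "action": {"type": "submit"}, "observation": {},
         "reward": 1.0, "terminated": True, "truncated": False, "status": "ok"},
    ],
}
buffer = io.StringIO()
rows_written = write_steps_jsonl(EPISODE, buffer)
```

Because each line is an independent JSON object, a trainer can stream the file without loading the whole episode.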

Replay

sb env replay <episode_id> replays a stored episode into a fresh environment instance and compares reward, terminal state, and stable state values. The command exits with a non-zero status when the replay diverges, which makes it a small regression check for environment changes.

Replay is intentionally separate from export:

export serializes a trajectory for another consumer.
replay re-executes the stored actions against current environment code.

The replay comparator ignores volatile workspace-root values, so a workspace episode can be replayed against a fresh in-memory SessionEnv while still detecting changes in target path, file entries, reward, or terminal status.
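
One way to picture the comparator is the sketch below, which strips volatile keys before comparing state. The workspace_root key name, the result shape, and the function names are assumptions; the real comparison lives in brain/environments/replay.py.

```python
from typing import Any

# Assumed name of the volatile key; the real comparator may ignore others.
VOLATILE_KEYS = frozenset({"workspace_root"})


def stable_view(value: Any) -> Any:
    """Recursively drop volatile keys so only stable state is compared."""
    if isinstance(value, dict):
        return {k: stable_view(v) for k, v in value.items() if k not in VOLATILE_KEYS}
    if isinstance(value, list):
        return [stable_view(v) for v in value]
    return value


def compare_replay(stored: dict[str, Any], replayed: dict[str, Any]) -> dict[str, Any]:
    """Compare reward, terminal flag, and stable state values."""
    mismatches = [f for f in ("reward", "terminated") if stored[f] != replayed[f]]
    if stable_view(stored["state"]) != stable_view(replayed["state"]):
        mismatches.append("state")
    return {"matched": not mismatches, "mismatches": mismatches}


stored = {"reward": 1.0, "terminated": True,
          "state": {"workspace_root": "/tmp/run-a", "target_path": "answer.txt"}}
same = {"reward": 1.0, "terminated": True,
        "state": {"workspace_root": "memory://fresh", "target_path": "answer.txt"}}
drifted = {"reward": 0.0, "terminated": True,
           "state": {"workspace_root": "memory://fresh", "target_path": "other.txt"}}

ok = compare_replay(stored, same)
diverged = compare_replay(stored, drifted)
```

A CLI wrapper would then map a result with matched set to False onto a non-zero exit status.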

Autotune And Antahkarana

Environment episodes feed the existing self-improvement loop as diagnostic pressure. SelfImprovementPlanner scans environment_episodes in work.db for task groups with low normalized reward or incomplete runs and emits environment_score targets. EnvironmentReplayBridge converts those targets into derived IdeaMemory hypotheses for existing autotune lanes such as repl_prompt. The same weak task group also queues a pending BenchmarkCandidate with source_type="environment_episode" and origin="real_failure" so operators can enrich and promote it into recent_failures for measurable regression pressure. Ideas and candidates are deduped by lane plus environment/task tags so repeated scans do not inflate the queue for the same weak task group.
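
The scan-and-dedupe behaviour can be sketched in plain Python as below. The episode field names, the 0.5 threshold, the "completed" status value, and both function names are assumptions for illustration only; they are not the planner's or bridge's real interfaces.

```python
from collections import defaultdict
from typing import Any


def weak_task_groups(episodes: list[dict[str, Any]], threshold: float = 0.5) -> list[dict[str, Any]]:
    """Group episodes by (env_id, task_id) and flag weak or incomplete groups."""
    groups: dict[tuple[str, str], list[dict[str, Any]]] = defaultdict(list)
    for episode in episodes:
        groups[(episode["env_id"], episode["task_id"])].append(episode)

    targets = []
    for env_id, task_id in sorted(groups):
        members = groups[(env_id, task_id)]
        mean_reward = sum(e["normalized_reward"] for e in members) / len(members)
        incomplete = any(e["status"] != "completed" for e in members)
        if mean_reward < threshold or incomplete:
            targets.append({
                "kind": "environment_score",
                "env_id": env_id,
                "task_id": task_id,
                "mean_reward": mean_reward,
                "episode_ids": [e["episode_id"] for e in members],
            })
    return targets


def enqueue_ideas(targets: list[dict[str, Any]], lane: str, seen: set) -> list[tuple[str, str, str]]:
    """Queue one idea per (lane, env_id, task_id); repeat scans add nothing."""
    added = []
    for target in targets:
        key = (lane, target["env_id"], target["task_id"])
        if key not in seen:
            seen.add(key)
            added.append(key)
    return added


EPISODES = [
    {"episode_id": "e1", "env_id": "workspace", "task_id": "workspace.write_answer",
     "normalized_reward": 0.0, "status": "completed"},
    {"episode_id": "e2", "env_id": "workspace", "task_id": "workspace.write_answer",
     "normalized_reward": 0.5, "status": "completed"},
    {"episode_id": "e3", "env_id": "counter", "task_id": "counter.reach_target",
     "normalized_reward": 1.0, "status": "completed"},
    {"episode_id": "e4", "env_id": "counter", "task_id": "counter.stalled",
     "normalized_reward": 1.0, "status": "running"},
]
targets = weak_task_groups(EPISODES)
seen_keys: set = set()
first_scan = enqueue_ideas(targets, "repl_prompt", seen_keys)
second_scan = enqueue_ideas(targets, "repl_prompt", seen_keys)
```

The second scan returns an empty list, which mirrors how repeated scans avoid inflating the queue for the same weak task group.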

SelfImprovementOrchestrator uses those targets in the same Antahkarana-backed cycle as Karma regrets and grounded trajectory gaps: it can seed ideas, register Sankalpa goals, run bounded autotune attempts, and let Chitta record strategy priors from the outcome. This does not add a new environment-specific mutation lane yet; environment failures are routed into established lane contracts until there is a deterministic evaluator and mutation surface for a dedicated lane.

Environment episodes are also projected into the quality control plane. Failed or low-reward task groups appear as environment:* scenario suites in sb quality summary, contribute to sb quality gate --surface environments, and can be promoted into internal replay cases with sb quality promote-replay. That replay promotion also creates or reuses a pending autotune quality_replay_case benchmark candidate when the replay maps to a known lane.

Pending real-failure benchmark candidates are diagnosed as benchmark_pressure targets, so promoted environment replay pressure can select a lane even before another autotune run fails. When those failures drive sb autotune improve, the resulting target execution plan selects the evidence-bound pack (recent_failures, hard, or regression as needed) and carries the source episode id through the run.

The persisted self_improvement_cycles record keeps the target-to-pack mapping, environment episode ids, seeded ideas, seeded benchmark candidate ids, goals, run summaries, and source closure status so a later reviewer can open the exact fixture queue entries created by the cycle.

Serve API

The serve API exposes environment control routes separately from /mcp:

GET  /environments
GET  /environments/episodes
POST /environments/{env_id}/episodes
GET  /environments/{env_id}/episodes/{episode_id}
GET  /environments/{env_id}/episodes/{episode_id}/state
POST /environments/{env_id}/episodes/{episode_id}/replay
POST /environments/{env_id}/episodes/{episode_id}/step

POST /environments/{env_id}/episodes resets an in-process fixture environment and persists the initial episode. POST .../step applies one structured Action, persists the resulting StepRecord, and removes the environment from the active in-process map once it terminates or truncates. POST .../replay creates a fresh fixture instance and returns a secondbrain.environment_replay.v1 comparison result without mutating the stored episode.

Workspace episodes created over HTTP use an in-memory SessionEnv; callers cannot provide an arbitrary local workspace root through the API. Local filesystem roots remain a CLI-only capability through sb env run workspace --workspace-root ....

Example workspace reset payload with verifiers:

{
  "options": {
    "target_path": "answer.txt",
    "verifiers": [
      {
        "type": "file_contains",
        "name": "mentions_ready",
        "path": "answer.txt",
        "substring": "ready"
      }
    ]
  }
}
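
A client could build requests for these routes with the standard library, as in the sketch below. The base URL is an assumption (use whatever address sb serve is bound to), and the helper names are hypothetical. The body of a step request is not specified on this page, so only the reset and state calls are shown.

```python
import json
from typing import Any
from urllib.parse import quote
from urllib.request import Request

BASE_URL = "http://127.0.0.1:8000"  # assumed address of a local sb serve


def reset_request(env_id: str, options: dict[str, Any]) -> Request:
    """Build POST /environments/{env_id}/episodes with a reset payload."""
    body = json.dumps({"options": options}).encode("utf-8")
    return Request(
        f"{BASE_URL}/environments/{quote(env_id, safe='')}/episodes",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def state_request(env_id: str, episode_id: str) -> Request:
    """Build GET /environments/{env_id}/episodes/{episode_id}/state."""
    return Request(
        f"{BASE_URL}/environments/{quote(env_id, safe='')}"
        f"/episodes/{quote(episode_id, safe='')}/state",
        method="GET",
    )


reset = reset_request("workspace", {
    "target_path": "answer.txt",
    "verifiers": [{"type": "file_contains", "name": "mentions_ready",
                   "path": "answer.txt", "substring": "ready"}],
})
state = state_request("workspace", "ep-1")
```

Passing either object to urllib.request.urlopen sends it; that call is left out here so the sketch runs without a live server.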

Persistence

Episodes and steps are stored in work.db through EnvironmentStore.

The work-domain migration is:

brain/db/migrations/work/014_environment_subsystem.sql

Tables:

environment_episodes
environment_steps

The tables are registered in brain/db/topology.py::WORK_TABLES, so split-state checks and table ownership stay explicit.

Workspace Environment

WorkspaceEnvironment uses brain.runtime.SessionEnv as its trust boundary. Actions include:

write_file
read_file
list_dir
run_command
submit

Paths are interpreted relative to the environment root. Absolute paths and .. escapes are rejected by SessionEnv. Without explicit verifiers, submit keeps the backward-compatible default of checking whether the target file exactly matches the expected text.
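
The action set and the default submit check can be pictured with an in-memory file map, as sketched below. This is illustrative only: the function name, the observation keys, and the 1.0/0.0 reward values are assumptions, and run_command is deliberately omitted because it depends on SessionEnv.

```python
from typing import Any


def apply_action(files: dict[str, str], action: dict[str, Any],
                 target_path: str, expected_text: str) -> dict[str, Any]:
    """Apply one structured action to an in-memory workspace."""
    kind = action["type"]
    payload = action.get("payload", {})
    if kind == "write_file":
        files[payload["path"]] = payload["content"]
        return {"ok": True}
    if kind == "read_file":
        path = payload["path"]
        return {"ok": path in files, "content": files.get(path)}
    if kind == "list_dir":
        return {"ok": True, "entries": sorted(files)}
    if kind == "submit":
        # Default check without explicit verifiers: exact match on the target file.
        passed = files.get(target_path) == expected_text
        return {"ok": True, "terminated": True, "reward": 1.0 if passed else 0.0}
    raise ValueError(f"unsupported action type in this sketch: {kind}")


workspace: dict[str, str] = {}
apply_action(workspace, {"type": "write_file",
                         "payload": {"path": "answer.txt", "content": "ready\n"}},
             "answer.txt", "ready\n")
listing = apply_action(workspace, {"type": "list_dir"}, "answer.txt", "ready\n")
outcome = apply_action(workspace, {"type": "submit"}, "answer.txt", "ready\n")
```

Because the default check is an exact match, a missing trailing newline in the written content would drop the reward to 0.0.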

This is not yet the full coding-agent environment. The next larger step should connect task workspaces and background sessions while keeping trainer control calls separate from MCP tool calls.

Important Code Paths

brain/environments/models.py
brain/environments/base.py
brain/environments/rewards.py
brain/environments/export.py
brain/environments/replay.py
brain/environments/manifest.py
brain/environments/fixtures.py
brain/environments/workspace_env.py
brain/environments/store.py
brain/autotune/environment_bridge.py
brain/autotune/self_improve.py
brain/cli/env_cmd.py
brain/serve/routers/environments.py
brain/runtime/session_env.py

Tests

Focused tests live under:

tests/environments/
tests/infra/test_env_cli.py
tests/infra/test_env_serve_api.py

Useful narrow loop:

.venv/bin/python -m pytest -q tests/environments tests/infra/test_env_cli.py
.venv/bin/ruff check brain/environments brain/cli/env_cmd.py tests/environments tests/infra/test_env_cli.py

When the CLI surface changes, regenerate the schema and reference docs:

.venv/bin/python -m brain.cli ui-schema --write-default
.venv/bin/python -m brain.cli docs cli-reference --write

When this page changes, refresh the docs index:

.venv/bin/python -m brain.cli codebase docs-index --write