Environments
The environment subsystem is SecondBrain's local control plane for replayable
agent-task episodes: reset a bounded state, apply structured actions, record
observations, score rewards, and persist the trajectory in work.db.
Purpose
brain.environments provides an OpenEnv-shaped contract without moving tool
execution or MCP calls into the training loop itself. Environment runners call
reset, step, and state; MCP remains the production tool plane, while
environment steps preserve reward and trajectory accounting.
The current slice is intentionally local and deterministic:
- counter is a pure fixture used to validate lifecycle, rewards, persistence, and CLI behavior.
- workspace is backed by SessionEnv, so file actions are root-bounded and suitable for the first coding-task environment shape.
- /environments exposes the same local control plane through sb serve without routing trainer control calls through MCP.
Main Commands
sb env list --json
sb env run counter --target 3 --json
sb env run workspace --target-path answer.txt --answer "ready" --json
sb env run --task-file examples/env-task.yaml --json
sb env episodes --env-id workspace --json
sb env show <episode_id> --json
sb env replay <episode_id> --json
sb env export <episode_id> --format openenv-json
sb env export <episode_id> --format steps-jsonl --output rollouts.jsonl
sb env run persists episodes by default. Use --no-store for a transient demo
run. For the workspace environment, omit --workspace-root to use an in-memory
workspace, or pass a path to exercise a local SessionEnv root.
A checked-in example run lives at
examples/environments/checkin/workspace-verifier-run/. It includes the task
manifest, native episode JSON, OpenEnv-shaped rollout, steps JSONL, and replay
result for one complete workspace verifier episode.
Task Manifests
sb env run --task-file <path> loads a JSON or YAML manifest and uses the
manifest's env_id, reset options, optional task metadata, and optional action
plan. The manifest format is intentionally small:
schema_version: secondbrain.environment_task.v1
env_id: workspace
task:
  task_id: workspace.write_answer
  goal: Write the expected answer file.
  success_conditions:
    - answer.txt equals expected_text
reset_options:
  target_path: answer.txt
  verifiers:
    - type: file_exists
      name: answer_exists
      path: answer.txt
    - type: file_equals
      name: answer_exact
      path: answer.txt
      expected_text: "ready\n"
actions:
  - type: write_file
    payload:
      path: answer.txt
      content: "ready\n"
  - type: submit
metadata:
  suite: local-fixture
When actions is omitted, the CLI uses the built-in demo action plan for the
chosen environment. Manifests do not create a new trust boundary; actions still
flow through the same environment lifecycle and persistence path as hand-built
actions.
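The loading rules above can be sketched as follows. This is a minimal sketch, not the code in brain/environments/manifest.py: it accepts only JSON (so it needs nothing beyond the standard library), and the name load_task_manifest and the contents of DEMO_ACTION_PLANS are hypothetical placeholders for the real loader and the CLI's built-in demo plans.

```python
import json
from typing import Any

EXPECTED_SCHEMA = "secondbrain.environment_task.v1"

# Hypothetical placeholders; the real built-in demo plans live in the CLI.
DEMO_ACTION_PLANS: dict[str, list[dict[str, Any]]] = {
    "counter": [{"type": "increment"}],  # assumed action name, illustration only
    "workspace": [{"type": "submit"}],
}


def load_task_manifest(text: str) -> dict[str, Any]:
    """Parse a JSON task manifest and fill in the optional fields."""
    manifest = json.loads(text)
    if manifest.get("schema_version") != EXPECTED_SCHEMA:
        raise ValueError(f"unsupported schema_version: {manifest.get('schema_version')!r}")
    env_id = manifest.get("env_id")
    if not isinstance(env_id, str) or not env_id:
        raise ValueError("manifest must name an env_id")
    actions = manifest.get("actions")
    if actions is None:
        # No explicit plan: fall back to the environment's demo action plan.
        actions = DEMO_ACTION_PLANS.get(env_id, [])
    return {
        "env_id": env_id,
        "reset_options": manifest.get("reset_options", {}),
        "task": manifest.get("task"),
        "actions": actions,
        "metadata": manifest.get("metadata", {}),
    }
```

The fallback happens only when actions is absent; an explicit empty list is kept as an empty plan, which keeps "no actions" distinguishable from "use the demo plan".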
Newly created CLI and serve episodes persist their reset_options, so a stored
trajectory can be replayed later without guessing how the environment was
initialized.
Workspace Verifiers
Workspace manifests can include declarative, non-command verifiers. submit
runs every verifier, emits one reward component per verifier, and normalizes the
score as passing weight divided by total weight.
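The normalization can be illustrated with a short sketch. It assumes each verifier carries a numeric weight defaulting to 1.0; VerifierOutcome, normalized_score, and the component field names are hypothetical and do not mirror brain/environments/rewards.py.

```python
from dataclasses import dataclass


@dataclass
class VerifierOutcome:
    name: str
    passed: bool
    weight: float = 1.0  # assumed default weight


def normalized_score(outcomes: list[VerifierOutcome]) -> tuple[float, list[dict]]:
    """Return (score, components) where score = passing weight / total weight."""
    # One reward component per verifier, so every check stays inspectable.
    components = [
        {"name": o.name, "passed": o.passed, "weight": o.weight,
         "value": o.weight if o.passed else 0.0}
        for o in outcomes
    ]
    total = sum(o.weight for o in outcomes)
    if total <= 0:
        return 0.0, components  # no weighted checks: score zero (assumption)
    passing = sum(o.weight for o in outcomes if o.passed)
    return passing / total, components


score, parts = normalized_score([
    VerifierOutcome("answer_exists", True, 1.0),
    VerifierOutcome("answer_exact", False, 3.0),
])
# score == 0.25: one unit of weight passed out of four in total
```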
Supported verifier types:
- file_exists: requires path.
- file_equals: requires path and exact expected_text.
- file_contains: requires path and substring.
- file_matches_regex: requires path and pattern.
Verifier paths are relative to the SessionEnv root and use the same path guard
as workspace actions, so absolute paths and .. escapes fail visibly. Command
execution remains an explicit run_command action; verifiers do not run shell
commands in this slice.
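A sketch of how the four verifier types might be evaluated against an in-memory file map. The path guard mimics the behavior described above (absolute paths and .. escapes are rejected) but is a stand-in for SessionEnv, not its implementation. Two choices are assumptions: content checks fail when the file is missing, and file_matches_regex uses re.search (match anywhere) rather than a full match.

```python
import re
from pathlib import PurePosixPath


def guard_path(path: str) -> str:
    """Reject absolute paths and '..' escapes; return a normalized relative path."""
    p = PurePosixPath(path)
    if p.is_absolute() or ".." in p.parts:
        raise ValueError(f"path escapes workspace root: {path!r}")
    return str(p)


def run_verifier(spec: dict, files: dict[str, str]) -> bool:
    """Evaluate one declarative verifier; no shell command is ever executed."""
    path = guard_path(spec["path"])  # fails visibly for escapes
    kind = spec["type"]
    if kind == "file_exists":
        return path in files
    text = files.get(path)
    if text is None:
        return False  # content checks fail when the file is missing
    if kind == "file_equals":
        return text == spec["expected_text"]
    if kind == "file_contains":
        return spec["substring"] in text
    if kind == "file_matches_regex":
        return re.search(spec["pattern"], text) is not None
    raise ValueError(f"unknown verifier type: {kind!r}")
```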
Runtime Model
The core contracts live in brain/environments/models.py:
- EnvironmentSpec and TaskSpec describe static environment and task shape.
- Action, Observation, EnvironmentState, and StepResult form the reset/step API.
- RewardComponentResult and RewardResult keep rewards inspectable.
- EpisodeRecord and StepRecord are the persisted trajectory envelope.
- VerifierSpec describes declarative workspace submit checks.
BaseEnvironment supplies lifecycle guardrails: callers must reset before
stepping, terminal environments reject extra steps, and step indexes must remain
consistent with observations.
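The three guardrails can be sketched as a small base class. GuardedEnvironment, LifecycleError, and the simplified StepResult fields are hypothetical and do not reproduce brain/environments/base.py; CounterEnv uses assumed rules (each step adds one, terminal at the target) purely to exercise the lifecycle.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class StepResult:
    step_index: int
    observation: dict[str, Any]
    reward: float
    terminated: bool = False
    truncated: bool = False


class LifecycleError(RuntimeError):
    """Raised when a caller violates the reset/step contract."""


class GuardedEnvironment:
    """Hypothetical base class enforcing reset-before-step and terminal rules."""

    def __init__(self) -> None:
        self._ready = False
        self._done = False
        self._step_index = 0

    def reset(self, **options: Any) -> dict[str, Any]:
        self._ready, self._done, self._step_index = True, False, 0
        return self._reset(**options)

    def step(self, action: dict[str, Any]) -> StepResult:
        if not self._ready:
            raise LifecycleError("call reset() before step()")
        if self._done:
            raise LifecycleError("episode is terminal; reset() to start again")
        self._step_index += 1
        result = self._step(action, self._step_index)
        if result.step_index != self._step_index:
            raise LifecycleError("step index does not match the observation")
        self._done = result.terminated or result.truncated
        return result

    # Subclasses implement the environment-specific behavior.
    def _reset(self, **options: Any) -> dict[str, Any]:
        raise NotImplementedError

    def _step(self, action: dict[str, Any], index: int) -> StepResult:
        raise NotImplementedError


class CounterEnv(GuardedEnvironment):
    """Toy counter with assumed rules: each step adds one; terminal at target."""

    def _reset(self, **options: Any) -> dict[str, Any]:
        self.target = int(options.get("target", 3))
        self.count = 0
        return {"count": 0, "target": self.target}

    def _step(self, action: dict[str, Any], index: int) -> StepResult:
        self.count += 1
        done = self.count >= self.target
        return StepResult(index, {"count": self.count}, 1.0 if done else 0.0, terminated=done)
```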
Rollout Export
Persisted episodes can be exported for replay, evals, or trainer ingestion:
- episode writes the native EpisodeRecord JSON envelope.
- steps-jsonl writes one transition row per step, with action, observation, reward, terminal flags, per-step status, episode status, and step metadata.
- openenv-json writes one OpenEnv-shaped rollout JSON object without requiring the OpenEnv package as a runtime dependency.
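The steps-jsonl shape can be illustrated with a minimal sketch that emits one JSON object per line for each stored step. The input dicts and row key names are assumptions chosen to match the fields listed above; the real column names in brain/environments/export.py may differ.

```python
import json
from typing import Any, Iterable


def steps_to_jsonl(episode: dict[str, Any], steps: Iterable[dict[str, Any]]) -> str:
    """Render one transition row per stored step as JSON Lines (assumed keys)."""
    lines = []
    for step in steps:
        row = {
            "episode_id": episode["episode_id"],
            "step_index": step["step_index"],
            "action": step["action"],
            "observation": step["observation"],
            "reward": step["reward"],
            "terminated": step.get("terminated", False),
            "truncated": step.get("truncated", False),
            "step_status": step.get("status"),
            "episode_status": episode.get("status"),
            "metadata": step.get("metadata", {}),
        }
        # sort_keys keeps rows byte-stable across exports of the same episode.
        lines.append(json.dumps(row, sort_keys=True))
    return "\n".join(lines) + ("\n" if lines else "")
```

Because each row copies the episode status, a trainer can filter transitions without joining back to the episode record.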
The OpenEnv-shaped export preserves SecondBrain's source-of-truth episode and
step records. It is a data interchange format, not a separate execution path:
training and evaluation still consume stored step results rather than calling
MCP tools directly.
Replay
sb env replay <episode_id> replays a stored episode into a fresh environment
instance and compares reward, terminal state, and stable state values. The
command exits with a non-zero status when the replay diverges, which makes it a
small regression check for environment changes.
Replay is intentionally separate from export:
- export serializes a trajectory for another consumer.
- replay re-executes the stored actions against current environment code.
The replay comparator ignores volatile workspace-root values, so a workspace
episode can be replayed against a fresh in-memory SessionEnv while still
detecting changes in target path, file entries, reward, or terminal status.
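A sketch of the comparison step, assuming the stored and replayed results are plain dicts with reward, terminated, and state keys, and that workspace_root is the only volatile state key. compare_replay, replay_exit_code, and VOLATILE_STATE_KEYS are hypothetical names, not the API of brain/environments/replay.py.

```python
from typing import Any

# Assumed volatile keys: values that legitimately differ between runs.
VOLATILE_STATE_KEYS = frozenset({"workspace_root"})


def stable_state(state: dict[str, Any]) -> dict[str, Any]:
    """Drop volatile entries so only comparable state remains."""
    return {k: v for k, v in state.items() if k not in VOLATILE_STATE_KEYS}


def compare_replay(stored: dict[str, Any], replayed: dict[str, Any]) -> list[str]:
    """Return divergence descriptions; an empty list means the replay matches."""
    diffs = []
    for key in ("reward", "terminated"):
        if stored.get(key) != replayed.get(key):
            diffs.append(f"{key}: {stored.get(key)!r} != {replayed.get(key)!r}")
    a = stable_state(stored.get("state", {}))
    b = stable_state(replayed.get("state", {}))
    for key in sorted(a.keys() | b.keys()):
        if a.get(key) != b.get(key):
            diffs.append(f"state.{key}: {a.get(key)!r} != {b.get(key)!r}")
    return diffs


def replay_exit_code(diffs: list[str]) -> int:
    """Non-zero on divergence, so the check can gate CI."""
    return 1 if diffs else 0
```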
Autotune And Antahkarana
Environment episodes now feed the existing self-improvement loop as diagnostic
pressure. SelfImprovementPlanner scans environment_episodes in work.db for
task groups with low normalized reward or incomplete runs and emits
environment_score targets. EnvironmentReplayBridge converts those targets
into derived IdeaMemory hypotheses for existing autotune lanes such as
repl_prompt. The same weak task group also queues a pending
BenchmarkCandidate with source_type="environment_episode" and
origin="real_failure" so operators can enrich and promote it into
recent_failures for measurable regression pressure. Ideas and candidates are
deduped by lane plus environment/task tags so repeated scans do not inflate the
queue for the same weak task group.
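The grouping and dedupe rules can be sketched as below. Everything here is an assumption for illustration: the episode dict keys, the 0.5 threshold, treating a missing reward as 0.0, and the tag format of the dedupe key. SelfImprovementPlanner and EnvironmentReplayBridge may use different criteria and storage.

```python
from collections import defaultdict
from statistics import mean
from typing import Any

LOW_REWARD_THRESHOLD = 0.5  # hypothetical cut-off for "low normalized reward"


def weak_task_groups(episodes: list[dict[str, Any]]) -> list[tuple[str, str]]:
    """Group by (env_id, task_id); flag low mean reward or any incomplete run."""
    groups: dict[tuple[str, str], list[dict[str, Any]]] = defaultdict(list)
    for ep in episodes:
        groups[(ep["env_id"], ep["task_id"])].append(ep)
    weak = []
    for key, eps in sorted(groups.items()):
        low = mean(e.get("normalized_reward") or 0.0 for e in eps) < LOW_REWARD_THRESHOLD
        incomplete = any(e["status"] != "completed" for e in eps)
        if low or incomplete:
            weak.append(key)
    return weak


def dedupe_key(lane: str, env_id: str, task_id: str) -> tuple[str, str, str]:
    """Lane plus environment/task tags identify one queue entry (assumed format)."""
    return (lane, f"env:{env_id}", f"task:{task_id}")


def queue_ideas(queue: dict, lane: str, groups: list[tuple[str, str]]) -> int:
    """Add one pending idea per weak group; repeated scans add nothing new."""
    added = 0
    for env_id, task_id in groups:
        key = dedupe_key(lane, env_id, task_id)
        if key not in queue:
            queue[key] = {"lane": lane, "env_id": env_id, "task_id": task_id, "status": "pending"}
            added += 1
    return added
```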
SelfImprovementOrchestrator uses those targets in the same Antahkarana-backed
cycle as Karma regrets and grounded trajectory gaps: it can seed ideas, register
Sankalpa goals, run bounded autotune attempts, and let Chitta record strategy
priors from the outcome. This does not add a new environment-specific mutation
lane yet; environment failures are routed into established lane contracts until
there is a deterministic evaluator and mutation surface for a dedicated lane.
Environment episodes are also projected into the quality control plane. Failed
or low-reward task groups appear as environment:* scenario suites in
sb quality summary, contribute to sb quality gate --surface environments,
and can be promoted into internal replay cases with sb quality promote-replay.
That replay promotion also creates or reuses a pending autotune
quality_replay_case benchmark candidate when the replay maps to a known lane.
Pending real-failure benchmark candidates are diagnosed as benchmark_pressure
targets, so promoted environment replay pressure can select a lane even before
another autotune run fails.
When those failures drive sb autotune improve, the resulting
target execution plan selects the evidence-bound pack (recent_failures,
hard, or regression as needed) and carries the source episode id through the
run. The persisted self_improvement_cycles record keeps the target-to-pack
mapping, environment episode ids, seeded ideas, seeded benchmark candidate ids,
goals, run summaries, and source closure status so a later reviewer can open the
exact fixture queue entries created by the cycle.
Serve API
The serve API exposes environment control routes separately from /mcp:
GET /environments
GET /environments/episodes
POST /environments/{env_id}/episodes
GET /environments/{env_id}/episodes/{episode_id}
GET /environments/{env_id}/episodes/{episode_id}/state
POST /environments/{env_id}/episodes/{episode_id}/replay
POST /environments/{env_id}/episodes/{episode_id}/step
POST /environments/{env_id}/episodes resets an in-process fixture environment
and persists the initial episode. POST .../step applies one structured
Action, persists the resulting StepRecord, and removes the environment from
the active in-process map once it terminates or truncates. POST .../replay
creates a fresh fixture instance and returns a secondbrain.environment_replay.v1
comparison result without mutating the stored episode.
Workspace episodes created over HTTP use an in-memory SessionEnv; callers
cannot provide an arbitrary local workspace root through the API. Local
filesystem roots remain a CLI-only capability through sb env run workspace
--workspace-root ....
Example workspace reset payload with verifiers:
{
"options": {
"target_path": "answer.txt",
"verifiers": [
{
"type": "file_contains",
"name": "mentions_ready",
"path": "answer.txt",
"substring": "ready"
}
]
}
}
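A client could send that payload with the standard library as sketched here. The base URL is an assumed local sb serve address, and the sketch only builds the request; passing it to urllib.request.urlopen would actually send it. The response body shape is not documented on this page, so the sketch does not parse it.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8000"  # assumed local sb serve address


def build_reset_request(env_id: str, options: dict, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build POST /environments/{env_id}/episodes with a JSON options body."""
    body = json.dumps({"options": options}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/environments/{env_id}/episodes",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_reset_request("workspace", {
    "target_path": "answer.txt",
    "verifiers": [
        {"type": "file_contains", "name": "mentions_ready",
         "path": "answer.txt", "substring": "ready"},
    ],
})
# urllib.request.urlopen(req) would send it to a running sb serve instance.
```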
Persistence
Episodes and steps are stored in work.db through EnvironmentStore.
The work-domain migration is:
Tables:
- environment_episodes
- environment_steps
The tables are registered in brain/db/topology.py::WORK_TABLES, so split-state
checks and table ownership stay explicit.
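The two-table layout can be illustrated with an in-memory SQLite sketch. The column sets are hypothetical and much smaller than the real migration; the point is the relationship: steps belong to an episode and are unique per (episode_id, step_index).

```python
import json
import sqlite3

# Hypothetical, reduced schema; the real work-domain migration defines the columns.
SCHEMA = """
CREATE TABLE IF NOT EXISTS environment_episodes (
    episode_id TEXT PRIMARY KEY,
    env_id TEXT NOT NULL,
    status TEXT NOT NULL,
    reset_options TEXT NOT NULL DEFAULT '{}'
);
CREATE TABLE IF NOT EXISTS environment_steps (
    episode_id TEXT NOT NULL REFERENCES environment_episodes(episode_id),
    step_index INTEGER NOT NULL,
    action TEXT NOT NULL,
    observation TEXT NOT NULL,
    reward REAL NOT NULL,
    PRIMARY KEY (episode_id, step_index)
);
"""


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FKs off by default
    conn.executescript(SCHEMA)
    return conn


def record_episode(conn: sqlite3.Connection, episode_id: str, env_id: str, reset_options: dict) -> None:
    with conn:  # commit on success, roll back on error
        conn.execute(
            "INSERT INTO environment_episodes (episode_id, env_id, status, reset_options) "
            "VALUES (?, ?, 'running', ?)",
            (episode_id, env_id, json.dumps(reset_options)),
        )


def record_step(conn: sqlite3.Connection, episode_id: str, step_index: int,
                action: dict, observation: dict, reward: float) -> None:
    with conn:
        conn.execute(
            "INSERT INTO environment_steps VALUES (?, ?, ?, ?, ?)",
            (episode_id, step_index, json.dumps(action), json.dumps(observation), reward),
        )
```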
Workspace Environment
WorkspaceEnvironment uses brain.runtime.SessionEnv as its trust boundary.
Actions include:
- write_file
- read_file
- list_dir
- run_command
- submit
Paths are interpreted relative to the environment root. Absolute paths and
.. escapes are rejected by SessionEnv. Without explicit verifiers, submit
keeps the backward-compatible default of checking whether the target file
exactly matches the expected text.
This is not yet the full coding-agent environment. The next larger step should connect task workspaces and background sessions while keeping trainer control calls separate from MCP tool calls.
Important Code Paths
- brain/environments/models.py
- brain/environments/base.py
- brain/environments/rewards.py
- brain/environments/export.py
- brain/environments/replay.py
- brain/environments/manifest.py
- brain/environments/fixtures.py
- brain/environments/workspace_env.py
- brain/environments/store.py
- brain/autotune/environment_bridge.py
- brain/autotune/self_improve.py
- brain/cli/env_cmd.py
- brain/serve/routers/environments.py
- brain/runtime/session_env.py
Tests
Focused tests live under:
Useful narrow loop:
.venv/bin/python -m pytest -q tests/environments tests/infra/test_env_cli.py
.venv/bin/ruff check brain/environments brain/cli/env_cmd.py tests/environments tests/infra/test_env_cli.py
When the CLI surface changes, regenerate the schema and reference docs:
.venv/bin/python -m brain.cli ui-schema --write-default
.venv/bin/python -m brain.cli docs cli-reference --write
When this page changes, refresh the docs index: