Local Agent Stack¶
SecondBrain can route private work to local model profiles and run generated code or shell commands under a deterministic execution-isolation policy. The local-agent stack has two separable responsibilities:
- Model fit: choose an explainable local model profile from detected RAM, accelerator memory, context requirements, quantization, and native tool-call support.
- Execution isolation: preflight commands, enforce allowlists and denylists, restrict network access, and require Docker/E2B isolation for write-like actions unless unsafe host execution is explicitly enabled.
Model Profiles¶
Local model profiles live in brain/providers/model_profiles.py and are merged
with models.local_profiles from configuration. Each profile records the
provider, model name, backend, quantization, weight format, context policy,
memory envelope, and tool-calling support.
Use the provider diagnostics command to see the current host fit:
sb providers local-profiles --context 4096 --requires-tools
sb providers local-profiles --context 32768 --requires-tools --json
The output includes:
- detected hardware and effective accelerator memory
- every profile decision as fit, degraded, or unavailable
- score, effective context window, reasons, and warnings
- the preferred profile ordering from model_routing.preferred_local_profile
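To make the three decision labels concrete, the sketch below classifies one profile against detected memory. It is an illustration under stated assumptions, not the logic in brain/providers/model_profiles.py: the `Profile` dataclass, the `classify_fit` function, and the rule that meeting the minimums but missing a recommended value yields `degraded` are all hypothetical. Only the field names come from the documented configuration keys.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Profile:
    # Field names mirror the documented config keys; the class itself is hypothetical.
    profile_id: str
    min_ram_gib: float
    recommended_ram_gib: float
    min_vram_gib: float
    recommended_vram_gib: float
    supports_native_tool_calling: bool


def classify_fit(
    profile: Profile,
    ram_gib: float,
    vram_gib: Optional[float],
    requires_tools: bool = False,
) -> Tuple[str, List[str]]:
    """Return (decision, reasons) using an assumed rule set, not SecondBrain's scoring."""
    if requires_tools and not profile.supports_native_tool_calling:
        return "unavailable", ["native tool calling is required but not supported"]
    if ram_gib < profile.min_ram_gib:
        return "unavailable", ["RAM is below the profile minimum"]

    reasons: List[str] = []
    vram = 0.0 if vram_gib is None else vram_gib  # unknown accelerator memory counts as none
    if ram_gib < profile.recommended_ram_gib:
        reasons.append("RAM is below the recommended amount")
    if vram < profile.min_vram_gib:
        reasons.append("accelerator memory is below the minimum; expect CPU fallback")
    elif vram < profile.recommended_vram_gib:
        reasons.append("accelerator memory is below the recommended amount")
    return ("degraded" if reasons else "fit"), reasons
```

A real implementation would also produce the score, warnings, and effective context window listed above; this sketch covers only the decision label and its reasons.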
Configured profiles override built-ins by profile_id:
models:
  local_profiles:
    - profile_id: local-qwen-custom
      provider: ollama
      model: qwen2.5:7b-instruct-q4_K_M
      backend: ollama
      locality: local
      parameter_count_b: 7
      quantization: q4_K_M
      weight_format: gguf
      context_tokens: 32768
      min_ram_gib: 10
      recommended_ram_gib: 16
      min_vram_gib: 6
      recommended_vram_gib: 10
      supports_native_tool_calling: true
model_routing:
  preferred_local_profile: local-qwen-custom
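The override-by-profile_id rule can be sketched as a keyed merge. This is a minimal illustration that assumes a configured entry replaces the whole built-in entry rather than merging field by field; the `merge_profiles` helper and the built-in entry in the usage note are hypothetical.

```python
from typing import Any, Dict, Iterable, List


def merge_profiles(
    builtin: Iterable[Dict[str, Any]],
    configured: Iterable[Dict[str, Any]],
) -> List[Dict[str, Any]]:
    """Merge profile dicts; a configured entry replaces a built-in with the same profile_id."""
    merged: Dict[str, Dict[str, Any]] = {p["profile_id"]: p for p in builtin}
    for profile in configured:
        # Assumption: whole-entry replacement, not a per-field merge.
        merged[profile["profile_id"]] = profile
    return list(merged.values())
```

For example, merging a hypothetical built-in `{"profile_id": "local-qwen-custom", "context_tokens": 4096}` with the configured entry above would keep only the configured 32768-token version, while built-ins with other IDs pass through unchanged.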
Hardware Detection¶
Hardware detection is offline-safe and dependency-light. It uses platform data
for CPU/RAM, nvidia-smi when available for CUDA memory, and Apple Silicon
platform signals for unified-memory MPS fit checks.
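As a rough illustration of the nvidia-smi step, the sketch below queries total GPU memory and returns None whenever the tool is missing or fails, so the caller can treat accelerator memory as unknown. The function names are hypothetical, and reporting the largest single GPU (rather than a sum across GPUs) is an assumption; the nvidia-smi query flags themselves are standard.

```python
import shutil
import subprocess
from typing import List, Optional


def parse_nvidia_smi_mib(output: str) -> List[int]:
    """Parse memory.total values (MiB, one GPU per line) from nvidia-smi CSV output."""
    values = []
    for line in output.splitlines():
        line = line.strip()
        if line.isdigit():  # skips blank lines and placeholders such as "[N/A]"
            values.append(int(line))
    return values


def detect_cuda_memory_gib() -> Optional[float]:
    """Return the largest single-GPU memory in GiB, or None when it cannot be determined."""
    if shutil.which("nvidia-smi") is None:
        return None
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
            capture_output=True,
            text=True,
            timeout=5,
            check=True,
        )
    except (OSError, subprocess.SubprocessError):
        return None  # stay offline-safe: any failure means "unknown"
    values = parse_nvidia_smi_mib(result.stdout)
    return max(values) / 1024 if values else None
```

Keeping the parser separate from the subprocess call makes the detection logic testable on machines without a GPU.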
Ollama-managed GGUF profiles use a VRAM-aware context policy:
| Effective accelerator memory | Default effective context |
|---|---|
| Below 24 GiB or unknown | 4,096 tokens |
| 24 GiB to below 48 GiB | 32,768 tokens |
| 48 GiB or more | 256,000 tokens |
Operators can override that policy with OLLAMA_CONTEXT_LENGTH or
SB_OLLAMA_CONTEXT_LENGTH.
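The table and the override variables can be expressed as one small function. This is a sketch, not the shipped policy code: the function name is hypothetical, and the choice to let SB_OLLAMA_CONTEXT_LENGTH win when both variables are set is an assumption, since the precedence is not stated here.

```python
import os
from typing import Mapping, Optional


def effective_context_tokens(
    vram_gib: Optional[float],
    env: Optional[Mapping[str, str]] = None,
) -> int:
    """Return the default effective context for an Ollama-managed GGUF profile."""
    env = os.environ if env is None else env
    # Assumption: the SB_-prefixed variable takes precedence when both are set.
    for name in ("SB_OLLAMA_CONTEXT_LENGTH", "OLLAMA_CONTEXT_LENGTH"):
        raw = env.get(name, "").strip()
        if raw.isdigit() and int(raw) > 0:
            return int(raw)
    # Thresholds follow the table above; unknown memory uses the lowest tier.
    if vram_gib is None or vram_gib < 24:
        return 4096
    if vram_gib < 48:
        return 32768
    return 256000
```

Passing `env` explicitly keeps the function deterministic in tests; omitting it reads the process environment.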
Execution Isolation¶
The central policy lives in brain/sandbox/policy.py. It is used by
shell.exec and by the sb policy preflight diagnostics command.
Recommended private-agent posture:
sandbox:
  backend: docker
  fail_if_unavailable: true
shell_exec:
  enabled: true
  isolation_backend: docker
  network_mode: none
  require_hard_sandbox_for_writes: true
  allow_unsafe_local: false
  writable_roots:
    - /path/to/approved/workspace
Preflight a command before giving it to an agent surface:
sb policy preflight "python3 -c \"print('ok')\"" --json
sb policy preflight "touch output.txt" --backend docker --network none --json
Policy decisions include parse status, command risk, write intent, backend, network mode, working directory containment, and any approval requirement.
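The shape of such a decision can be sketched as follows. Everything here is hypothetical: the `preflight` function, the result keys, and the small command sets are illustrative stand-ins, not the contents of brain/sandbox/policy.py, and the write detection is deliberately naive.

```python
import shlex
from pathlib import Path

# Illustrative sets only; the real allowlists and denylists are not shown in this document.
WRITE_COMMANDS = {"touch", "rm", "mv", "cp", "mkdir", "tee", "chmod"}
DENIED_COMMANDS = {"sudo", "mkfs", "dd"}
HARD_SANDBOXES = {"docker", "e2b"}


def preflight(command, cwd, writable_roots, backend="docker", network_mode="none"):
    """Return a policy decision dict for one command (hypothetical field names)."""
    try:
        tokens = shlex.split(command)
    except ValueError as exc:  # for example, an unterminated quote
        return {"parse_status": "error", "allowed": False, "reason": str(exc)}
    if not tokens:
        return {"parse_status": "error", "allowed": False, "reason": "empty command"}

    program = Path(tokens[0]).name
    # Naive write detection: a known write-like program or a redirection token.
    write_intent = program in WRITE_COMMANDS or any(t.startswith(">") for t in tokens[1:])
    if program in DENIED_COMMANDS:
        risk = "high"
    elif write_intent:
        risk = "medium"
    else:
        risk = "low"

    cwd_path = Path(cwd).resolve()
    roots = [Path(root).resolve() for root in writable_roots]
    cwd_contained = any(root == cwd_path or root in cwd_path.parents for root in roots)

    # Write-like commands outside a hard sandbox need explicit approval.
    requires_approval = write_intent and backend not in HARD_SANDBOXES
    return {
        "parse_status": "ok",
        "risk": risk,
        "write_intent": write_intent,
        "backend": backend,
        "network_mode": network_mode,
        "cwd_contained": cwd_contained,
        "requires_approval": requires_approval,
        "allowed": risk != "high" and cwd_contained and not requires_approval,
    }
```

Because the function depends only on its arguments, the same input always yields the same decision, which is the property a deterministic preflight needs.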
Docker Shell Execution¶
When shell_exec.isolation_backend is docker, shell.exec mounts the approved
working directory at /workspace and runs the command in a hardened ephemeral
container:
- no network by default when network_mode: none
- read-only container filesystem
- writable /tmp tmpfs with noexec and nosuid
- dropped Linux capabilities
- no-new-privileges
- bounded process count
- host UID/GID for mounted workspace writes
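The hardening options listed above map onto standard `docker run` flags. The sketch below builds such an argument list; it is not the command shell.exec actually issues. The `build_docker_argv` helper, the default image, and the process limit of 128 are assumptions, and `os.getuid`/`os.getgid` are available on POSIX hosts only.

```python
import os
from typing import List


def build_docker_argv(
    command: str,
    workdir: str,
    image: str = "python:3.12-slim",  # assumed image, not from the source
    network_mode: str = "none",
    pids_limit: int = 128,  # assumed bound on process count
) -> List[str]:
    """Build a hardened, ephemeral `docker run` invocation for one shell command."""
    return [
        "docker", "run", "--rm",                      # ephemeral container
        "--network", network_mode,                    # no network when "none"
        "--read-only",                                # read-only root filesystem
        "--tmpfs", "/tmp:rw,noexec,nosuid",           # writable scratch space only
        "--cap-drop", "ALL",                          # drop Linux capabilities
        "--security-opt", "no-new-privileges",        # block privilege escalation
        "--pids-limit", str(pids_limit),              # bound the process count
        "--user", f"{os.getuid()}:{os.getgid()}",     # host UID/GID for workspace writes
        "--volume", f"{workdir}:/workspace",          # approved working directory
        "--workdir", "/workspace",
        image, "sh", "-c", command,
    ]
```

Returning a list rather than a shell string lets the caller pass it straight to `subprocess.run` without a host shell interpreting the command.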
If Docker is not reachable, shell.exec fails closed instead of falling back to
host execution. E2B remains supported for generated Python code paths, but
arbitrary shell execution requires Docker or explicit unsafe local execution.
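The fail-closed rule can be sketched as a backend check that raises instead of downgrading. The `select_shell_backend` function, the `SandboxUnavailable` exception, and the `"local"` backend name are hypothetical; only the behavior (no silent fallback to the host, E2B not used for arbitrary shell) comes from the description above.

```python
class SandboxUnavailable(RuntimeError):
    """Raised when no permitted execution backend can run the command."""


def select_shell_backend(
    isolation_backend: str,
    docker_reachable: bool,
    allow_unsafe_local: bool = False,
) -> str:
    """Return the backend to use for arbitrary shell execution, or raise (fail closed)."""
    if isolation_backend == "docker":
        if not docker_reachable:
            # Never fall back to host execution when Docker was requested.
            raise SandboxUnavailable("Docker is not reachable")
        return "docker"
    if isolation_backend == "local":  # assumed name for unsafe host execution
        if not allow_unsafe_local:
            raise SandboxUnavailable("unsafe local execution is not enabled")
        return "local"
    # E2B covers generated Python code paths, not arbitrary shell commands.
    raise SandboxUnavailable(f"backend {isolation_backend!r} cannot run shell commands")
```

Note that `allow_unsafe_local` does not rescue a Docker failure in this sketch: host execution happens only when it is both selected and explicitly enabled.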
Diagnostics API¶
sb serve exposes the authenticated local-agent stack diagnostics endpoint:
The response includes a single score and status plus detailed sections:
- hardware
- model_profiles
- execution_isolation
- recommendations
Use this endpoint for operator UI surfaces that need to show whether the current machine is ready for private local-agent work.