
Local Agent Stack

SecondBrain can route private work to local model profiles and run generated code or shell commands under a deterministic execution-isolation policy. The local-agent stack has two separable responsibilities:

  • Model fit: choose an explainable local model profile from detected RAM, accelerator memory, context requirements, quantization, and native tool-call support.
  • Execution isolation: preflight commands, enforce allowlists and denylists, restrict network access, and require Docker/E2B isolation for write-like actions unless unsafe host execution is explicitly enabled.

Model Profiles

Local model profiles live in brain/providers/model_profiles.py and are merged with models.local_profiles from configuration. Each profile records the provider, model name, backend, quantization, weight format, context policy, memory envelope, and tool-calling support.

Use the provider diagnostics command to see the current host fit:

sb providers local-profiles --context 4096 --requires-tools
sb providers local-profiles --context 32768 --requires-tools --json

The output includes:

  • detected hardware and effective accelerator memory
  • every profile decision as fit, degraded, or unavailable
  • score, effective context window, reasons, and warnings
  • the preferred profile ordering from model_routing.preferred_local_profile
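
As a rough illustration of how a three-way decision like this could be derived, the sketch below classifies one profile from detected RAM and the tool-calling requirement. It is not the logic in brain/providers/model_profiles.py: the FitDecision class, the classify function, and the threshold rules are all hypothetical, and only the field names (min_ram_gib, recommended_ram_gib, supports_native_tool_calling) come from the profile configuration format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FitDecision:
    """Hypothetical result record: one decision per profile."""
    profile_id: str
    status: str  # "fit", "degraded", or "unavailable"
    reasons: List[str] = field(default_factory=list)
    warnings: List[str] = field(default_factory=list)


def classify(profile: dict, ram_gib: float, requires_tools: bool) -> FitDecision:
    """Assumed rules: hard requirements make a profile unavailable,
    falling between the minimum and recommended RAM makes it degraded."""
    pid = profile["profile_id"]
    if requires_tools and not profile.get("supports_native_tool_calling", False):
        return FitDecision(pid, "unavailable",
                           reasons=["native tool calling is required"])
    if ram_gib < profile["min_ram_gib"]:
        return FitDecision(pid, "unavailable",
                           reasons=[f"needs at least {profile['min_ram_gib']} GiB RAM"])
    if ram_gib < profile["recommended_ram_gib"]:
        return FitDecision(pid, "degraded",
                           warnings=[f"below recommended {profile['recommended_ram_gib']} GiB RAM"])
    return FitDecision(pid, "fit", reasons=["meets recommended RAM"])
```

A real implementation would also weigh accelerator memory, quantization, and the requested context window, which this sketch leaves out.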

Configured profiles override built-ins by profile_id:

models:
  local_profiles:
    - profile_id: local-qwen-custom
      provider: ollama
      model: qwen2.5:7b-instruct-q4_K_M
      backend: ollama
      locality: local
      parameter_count_b: 7
      quantization: q4_K_M
      weight_format: gguf
      context_tokens: 32768
      min_ram_gib: 10
      recommended_ram_gib: 16
      min_vram_gib: 6
      recommended_vram_gib: 10
      supports_native_tool_calling: true

model_routing:
  preferred_local_profile: local-qwen-custom
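
The override rule can be pictured as a dictionary merge keyed on profile_id. The following is a minimal sketch, not the project's merge code: the function name, the built-in profile IDs, and the choice to replace a whole profile (rather than merge field by field) are assumptions.

```python
from typing import Any, Dict, List

Profile = Dict[str, Any]

# Hypothetical built-ins; the real ones live in brain/providers/model_profiles.py.
BUILTIN_PROFILES: List[Profile] = [
    {"profile_id": "builtin-a", "model": "example-a", "context_tokens": 4096},
    {"profile_id": "builtin-b", "model": "example-b", "context_tokens": 8192},
]


def merge_profiles(builtins: List[Profile], configured: List[Profile]) -> List[Profile]:
    """Return built-ins with configured profiles layered on top.

    Assumption: a configured profile replaces the built-in with the same
    profile_id entirely; unknown IDs are appended as new profiles.
    """
    merged: Dict[str, Profile] = {p["profile_id"]: dict(p) for p in builtins}
    for profile in configured:
        merged[profile["profile_id"]] = dict(profile)
    return list(merged.values())
```

Because Python dictionaries preserve insertion order, an overridden built-in keeps its original position and new profiles appear after the built-ins.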

Hardware Detection

Hardware detection is offline-safe and dependency-light. It uses platform data for CPU/RAM, nvidia-smi when available for CUDA memory, and Apple Silicon platform signals for unified-memory MPS fit checks.
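
For the CUDA path, a dependency-light probe might look like the sketch below. It assumes nothing about SecondBrain's detection code: the function names are hypothetical, and taking the largest single GPU as the reported figure is an illustrative choice. The nvidia-smi query flags themselves are standard.

```python
import shutil
import subprocess
from typing import List, Optional


def parse_nvidia_smi_mib(output: str) -> List[int]:
    """Parse one memory.total value (MiB) per line, ignoring blank or odd lines."""
    return [int(line.strip()) for line in output.splitlines() if line.strip().isdigit()]


def detect_cuda_memory_gib() -> Optional[float]:
    """Return total memory of the largest GPU in GiB, or None if undetectable."""
    if shutil.which("nvidia-smi") is None:
        return None  # no NVIDIA tooling on PATH: treat as unknown
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
            capture_output=True, text=True, timeout=5, check=True,
        )
    except (OSError, subprocess.SubprocessError):
        return None  # fail soft: detection must never block startup
    values = parse_nvidia_smi_mib(result.stdout)
    return max(values) / 1024 if values else None
```

Returning None rather than raising keeps the probe safe on machines without a GPU or driver.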

Ollama-managed GGUF profiles use a VRAM-aware context policy:

Effective accelerator memory    Default effective context
Below 24 GiB or unknown         4,096 tokens
24 GiB to below 48 GiB          32,768 tokens
48 GiB or more                  256,000 tokens

Operators can override that policy with OLLAMA_CONTEXT_LENGTH or SB_OLLAMA_CONTEXT_LENGTH.
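
The tiers and the override can be expressed in a few lines. This sketch mirrors the thresholds stated above; the function names are hypothetical, and both the precedence between the two environment variables and the handling of non-numeric values are assumptions, since neither is specified here.

```python
import os
from typing import Mapping, Optional


def default_context_tokens(effective_vram_gib: Optional[float]) -> int:
    """Map effective accelerator memory (GiB) to the default context window."""
    if effective_vram_gib is None or effective_vram_gib < 24:
        return 4096
    if effective_vram_gib < 48:
        return 32768
    return 256000


def effective_context_tokens(
    effective_vram_gib: Optional[float],
    env: Optional[Mapping[str, str]] = None,
) -> int:
    """Apply an operator override if one is set, else use the tiered default."""
    env = os.environ if env is None else env
    # Assumed precedence: OLLAMA_CONTEXT_LENGTH is checked first.
    for name in ("OLLAMA_CONTEXT_LENGTH", "SB_OLLAMA_CONTEXT_LENGTH"):
        raw = env.get(name, "").strip()
        if raw.isdigit() and int(raw) > 0:
            return int(raw)
    return default_context_tokens(effective_vram_gib)
```

For example, a host with 8 GiB of accelerator memory would get 4,096 tokens by default, or whatever positive integer the operator sets in either variable.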

Execution Isolation

The central policy lives in brain/sandbox/policy.py. It is used by shell.exec and by the sb policy preflight diagnostics command.

Recommended private-agent posture:

sandbox:
  backend: docker
  fail_if_unavailable: true

shell_exec:
  enabled: true
  isolation_backend: docker
  network_mode: none
  require_hard_sandbox_for_writes: true
  allow_unsafe_local: false
  writable_roots:
    - /path/to/approved/workspace

Preflight a command before giving it to an agent surface:

sb policy preflight "python3 -c \"print('ok')\"" --json
sb policy preflight "touch output.txt" --backend docker --network none --json

Policy decisions include parse status, command risk, write intent, backend, network mode, working directory containment, and any approval requirement.
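
To make the shape of such a decision concrete, here is a deliberately crude preflight sketch. It is not the policy in brain/sandbox/policy.py: the function name, the WRITE_LIKE set, the returned keys, and the rules are hypothetical, and the write detection is far weaker than a real policy would need.

```python
import shlex
from pathlib import Path
from typing import Dict, List

# Hypothetical set of commands treated as write-like.
WRITE_LIKE = {"touch", "rm", "mv", "cp", "mkdir", "tee"}


def _is_within(path: Path, root: Path) -> bool:
    """True if path equals root or sits underneath it."""
    return path == root or root in path.parents


def preflight(command: str, cwd: str, writable_roots: List[str],
              backend: str = "docker", network: str = "none") -> Dict[str, object]:
    """Return a decision dict without executing anything."""
    try:
        argv = shlex.split(command)
    except ValueError as exc:  # e.g. an unterminated quote
        return {"parse_status": "error", "error": str(exc), "allowed": False}
    if not argv:
        return {"parse_status": "empty", "allowed": False}

    # Crude write detection: known command names or a redirection token.
    write_intent = argv[0] in WRITE_LIKE or any(tok.startswith(">") for tok in argv)
    cwd_path = Path(cwd).resolve()
    contained = any(_is_within(cwd_path, Path(r).resolve()) for r in writable_roots)
    hard_sandbox = backend in {"docker", "e2b"}

    return {
        "parse_status": "ok",
        "write_intent": write_intent,
        "backend": backend,
        "network_mode": network,
        "cwd_contained": contained,
        # Writes need a hard sandbox; everything needs a contained working directory.
        "allowed": contained and (hard_sandbox or not write_intent),
    }
```

The useful property is that every outcome is computed from the command text and configuration alone, so the same input always yields the same decision.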

Docker Shell Execution

When shell_exec.isolation_backend is docker, shell.exec mounts the approved working directory at /workspace and runs the command in a hardened ephemeral container:

  • no network access when network_mode: none
  • read-only container filesystem
  • writable /tmp tmpfs with noexec and nosuid
  • dropped Linux capabilities
  • no-new-privileges
  • bounded process count
  • host UID/GID for mounted workspace writes

If Docker is not reachable, shell.exec fails closed instead of falling back to host execution. E2B remains supported for generated Python code paths, but arbitrary shell execution requires Docker or explicit unsafe local execution.
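
The hardening properties listed above map onto standard docker run flags. The sketch below builds such an argument list; it is an illustration rather than the command shell.exec actually issues. The function name, the image, and the process limit of 128 are assumptions, and os.getuid/os.getgid are available on Unix hosts only.

```python
import os
from typing import List


def docker_argv(command: str, workdir: str, image: str = "python:3.12-slim",
                network_mode: str = "none", pids_limit: int = 128) -> List[str]:
    """Build a docker run argv for one ephemeral, hardened container."""
    return [
        "docker", "run", "--rm",                   # ephemeral: removed on exit
        "--network", network_mode,                 # "none" disables networking
        "--read-only",                             # read-only root filesystem
        "--tmpfs", "/tmp:rw,noexec,nosuid",        # writable scratch space only
        "--cap-drop", "ALL",                       # drop Linux capabilities
        "--security-opt", "no-new-privileges",
        "--pids-limit", str(pids_limit),           # bounded process count
        "--user", f"{os.getuid()}:{os.getgid()}",  # host UID/GID for workspace writes
        "--volume", f"{workdir}:/workspace",
        "--workdir", "/workspace",
        image, "sh", "-c", command,
    ]
```

Passing the list to subprocess.run without shell=True keeps the command string from being interpreted by the host shell; it is only interpreted by sh inside the container.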

Diagnostics API

sb serve exposes the authenticated local-agent stack diagnostics endpoint:

GET /diagnostics/local-agent-stack?context=4096&requires_tools=true

The response includes a single score and status plus detailed sections:

  • hardware
  • model_profiles
  • execution_isolation
  • recommendations

Use this endpoint for operator UI surfaces that need to show whether the current machine is ready for private local-agent work.
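
A client for this endpoint could be as small as the sketch below, which uses only the standard library. The base URL, the bearer-token header, and the function names are assumptions; check how your sb serve deployment authenticates before relying on them.

```python
import json
import urllib.parse
import urllib.request


def build_request(base_url: str, token: str, context: int = 4096,
                  requires_tools: bool = True) -> urllib.request.Request:
    """Build the GET request for the local-agent stack diagnostics endpoint."""
    query = urllib.parse.urlencode({
        "context": context,
        "requires_tools": str(requires_tools).lower(),
    })
    url = f"{base_url.rstrip('/')}/diagnostics/local-agent-stack?{query}"
    # Assumed auth scheme: a bearer token in the Authorization header.
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


def fetch_diagnostics(base_url: str, token: str, **params) -> dict:
    """Send the request and decode the JSON body."""
    request = build_request(base_url, token, **params)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)
```

An operator UI would typically read the top-level score and status first, then drill into the hardware, model_profiles, execution_isolation, and recommendations sections.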