Evals

SecondBrain already has several eval and quality layers.

Main Locations

  • brain/eval/
  • brain/evals/
  • brain/evals/fixtures/ for small committed fixture datasets
  • brain/quality/
  • tests under tests/eval/, tests/evals/, and related feature directories

There is intentionally no root-level evals/ directory. The repo does not claim that every external benchmark or private evaluation suite is checked in; committed eval inputs live beside the harnesses that use them.

Practical Eval Types In The Repo

  • deterministic unit/contract tests
  • CLI smoke tests
  • pattern goldens
  • quality scorecards and report surfaces
  • memory benchmark suites run via sb memory benchmark; results are persisted as memory.* eval runs, covering synthetic fixtures and local fixtures shaped like public benchmarks

Public 0.3.0 Guidance

For new work, prefer:

  1. a focused unit or contract test
  2. one runnable smoke path for the user-facing workflow
  3. only then a heavier eval harness if the feature really needs it
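
As an illustration of step 1, a focused contract test can be only a few lines long. The sketch below assumes Python and a pytest-style layout; normalize_tag is a hypothetical stand-in for the feature under test and is not part of SecondBrain.

```python
# Hypothetical example: normalize_tag is NOT a SecondBrain API. It stands in
# for whatever small unit of behavior the new feature introduces.


def normalize_tag(raw: str) -> str:
    """Lowercase the input and join whitespace-separated words with hyphens."""
    return "-".join(raw.strip().lower().split())


def test_lowercases_and_joins_words():
    # Contract: surrounding whitespace is dropped, words are hyphen-joined.
    assert normalize_tag("  Project Notes ") == "project-notes"


def test_is_idempotent():
    # Contract: normalizing an already-normalized tag changes nothing.
    once = normalize_tag("Deep  Work")
    assert normalize_tag(once) == once
```

A test of this shape is deterministic and needs no fixtures, which is why it comes before a smoke path or a heavier harness in the order above.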

Adding More

  • keep eval inputs synthetic and safe
  • avoid provider/network dependence by default
  • add evals or fixtures near the subsystem they validate
  • document the manual command used to run the eval
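
The points above can be sketched as a small eval that reads a synthetic fixture and scores a deterministic function, with no provider or network call. This is a minimal sketch under stated assumptions: the fixture schema (query, expected), the function names, and the result keys are all hypothetical and do not describe the repo's actual harness format.

```python
import json

# Synthetic inline fixture in JSONL form. Hypothetical schema: query, expected.
# The last case is deliberately wrong so the score is not trivially 1.0.
FIXTURE_JSONL = """\
{"query": "alpha beta", "expected": "ALPHA BETA"}
{"query": "gamma", "expected": "GAMMA"}
{"query": "delta", "expected": "delta"}
"""


def load_cases(text):
    """Parse one JSON object per non-empty line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def system_under_test(query):
    """Hypothetical deterministic stand-in; makes no provider or network call."""
    return query.upper()


def run_eval(cases, fn):
    """Count exact-match passes and report a pass rate."""
    passed = sum(1 for case in cases if fn(case["query"]) == case["expected"])
    total = len(cases)
    return {
        "total": total,
        "passed": passed,
        "pass_rate": passed / total if total else 0.0,
    }


result = run_eval(load_cases(FIXTURE_JSONL), system_under_test)
print(result)
```

In a real eval, the fixture would more likely be a small committed file next to the subsystem it validates, and the command used to run it by hand would be noted alongside it.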