Evals

SecondBrain already has several eval and quality layers.

Main Locations

brain/eval/
brain/evals/
brain/evals/fixtures/ for small committed fixture datasets
brain/quality/
tests under tests/eval/, tests/evals/, and related feature directories

There is intentionally no root-level evals/ directory. The repo does not claim to contain every external benchmark or private evaluation suite; committed eval inputs live beside the harnesses that use them.

Practical Eval Types In The Repo

deterministic unit/contract tests
CLI smoke tests
pattern goldens
quality scorecards and report surfaces
memory benchmark suites run via `sb memory benchmark`, persisted as `memory.*` eval runs, covering synthetic fixtures and local fixtures shaped like public benchmarks
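As an illustration of the golden style listed above, here is a minimal sketch of a golden comparison. It assumes "golden" means a stored expected output compared against freshly produced output; `render_summary`, `GOLDEN`, and `check_against_golden` are hypothetical names for this example and are not taken from the SecondBrain codebase.

```python
import difflib


def render_summary(items):
    """Hypothetical function under test: renders a stable text summary."""
    # Sorting keeps the output deterministic regardless of dict order.
    lines = [f"- {name}: {count}" for name, count in sorted(items.items())]
    return "\n".join(lines) + "\n"


# In a real layout the golden would be a small committed file next to the
# harness; it is inlined here so the sketch stays self-contained.
GOLDEN = "- notes: 3\n- tasks: 1\n"


def check_against_golden(actual, golden):
    """Return an empty string on a match, otherwise a unified diff."""
    if actual == golden:
        return ""
    diff = difflib.unified_diff(
        golden.splitlines(keepends=True),
        actual.splitlines(keepends=True),
        fromfile="golden",
        tofile="actual",
    )
    return "".join(diff)
```

Returning a diff rather than a bare boolean makes a failing golden easier to review before deciding whether to update the stored output.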

Public `0.3.0` Guidance

For new work, prefer, in this order:

a focused unit or contract test
one runnable smoke path for the user-facing workflow
only then a heavier eval harness if the feature really needs it
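The first preference above, a focused unit or contract test, might look like the following sketch. It uses only the standard-library `unittest` module; `normalize_tags` is a hypothetical helper invented for this example, not a SecondBrain function.

```python
import unittest


def normalize_tags(tags):
    """Hypothetical helper: strip, lowercase, de-duplicate, and sort tags."""
    cleaned = {t.strip().lower() for t in tags if t.strip()}
    return sorted(cleaned)


class NormalizeTagsContract(unittest.TestCase):
    """Pins down the behavior callers are allowed to rely on."""

    def test_output_is_sorted_and_unique(self):
        self.assertEqual(
            normalize_tags([" Eval", "eval", "Memory "]),
            ["eval", "memory"],
        )

    def test_blank_tags_are_dropped(self):
        self.assertEqual(normalize_tags(["", "   "]), [])


def run_contract_tests():
    """Run the contract tests in-process and report overall success."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(NormalizeTagsContract)
    result = unittest.TextTestRunner(verbosity=0).run(suite)
    return result.wasSuccessful()
```

A test like this is deterministic and needs no network or provider access, which is why it comes before any heavier harness.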

Adding More

keep eval inputs synthetic and safe
avoid provider/network dependence by default
add evals or fixtures near the subsystem they validate
document the manual command used to run the eval
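The guidelines above can be sketched as one small offline eval. This is an illustration under stated assumptions, not the repo's harness: `system_under_test`, `load_cases`, `run_eval`, and the fixture rows are hypothetical, and the path in the docstring is a placeholder.

```python
"""Sketch of a small offline eval.

Manual run command (placeholder path): python path/to/example_eval.py
"""
import json

# Synthetic, safe fixture rows. In a real layout these would sit in a small
# committed file near the subsystem they validate; inlined here for brevity.
SYNTHETIC_FIXTURE = """\
{"query": "alpha", "expected": "ALPHA"}
{"query": "beta", "expected": "BETA"}
{"query": "gamma", "expected": "GAMMA"}
"""


def system_under_test(query):
    """Hypothetical stand-in for the evaluated subsystem (no network calls)."""
    return query.upper()


def load_cases(text):
    """Parse one JSON object per non-empty line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def run_eval(cases):
    """Score every case and return a small, deterministic summary."""
    passed = sum(
        1 for case in cases if system_under_test(case["query"]) == case["expected"]
    )
    accuracy = passed / len(cases) if cases else 0.0
    return {"total": len(cases), "passed": passed, "accuracy": accuracy}


if __name__ == "__main__":
    print(json.dumps(run_eval(load_cases(SYNTHETIC_FIXTURE)), sort_keys=True))
```

Keeping the run command in the module docstring is one way to document it next to the code it runs.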
