Evals
SecondBrain has multiple eval and quality layers already.
Main Locations
- brain/eval/
- brain/evals/
- brain/evals/fixtures/ for small committed fixture datasets
- brain/quality/
- tests under tests/eval/, tests/evals/, and related feature directories
There is intentionally no root-level evals/ directory. The repo does not
claim to include every external benchmark or private evaluation suite;
committed eval inputs live beside the harnesses that use them.
Practical Eval Types In The Repo
- deterministic unit/contract tests
- CLI smoke tests
- pattern goldens
- quality scorecards and report surfaces
- memory benchmark suites via sb memory benchmark, persisted as memory.* eval runs for synthetic and local public-benchmark-shaped fixtures
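As a rough illustration of the golden style of check listed above, the sketch below compares current output against a stored expected output and reports a diff on drift. It assumes "golden" means a committed expected output. `render_summary`, `GOLDEN`, and `diff_against_golden` are hypothetical names for this example, not SecondBrain APIs.

```python
import difflib


def render_summary(items: list[str]) -> str:
    """Hypothetical function under test: renders a sorted bulleted summary."""
    return "\n".join(f"- {item}" for item in sorted(items)) + "\n"


# Assumption: a real golden would likely be a small committed file near the
# harness; it is inlined here so the sketch is self-contained.
GOLDEN = "- alpha\n- beta\n- gamma\n"


def diff_against_golden(actual: str, golden: str) -> str:
    """Return a unified diff, or an empty string when the output matches."""
    lines = difflib.unified_diff(
        golden.splitlines(keepends=True),
        actual.splitlines(keepends=True),
        fromfile="golden",
        tofile="actual",
    )
    return "".join(lines)


def test_summary_matches_golden() -> None:
    diff = diff_against_golden(render_summary(["gamma", "alpha", "beta"]), GOLDEN)
    assert diff == "", f"output drifted from golden:\n{diff}"
```

Returning the diff rather than a bare boolean means a failing run shows exactly which lines changed, which helps when deciding whether to fix the code or update the golden.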
Public 0.3.0 Guidance
For new work, prefer, in order:
- a focused unit or contract test
- one runnable smoke path for the user-facing workflow
- only then a heavier eval harness if the feature really needs it
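A focused contract test, the first preference above, can be very small. The sketch below is written in a pytest-compatible style. `normalize_tags` is a hypothetical stand-in for a subsystem function and is not part of SecondBrain.

```python
# Hypothetical subsystem function, used only to illustrate the test shape.
def normalize_tags(raw: list[str]) -> list[str]:
    """Lowercase, strip, de-duplicate, and sort tags; drop blank entries."""
    cleaned = {tag.strip().lower() for tag in raw if tag.strip()}
    return sorted(cleaned)


def test_normalize_tags_contract() -> None:
    # Contract: output is sorted, lowercase, and free of duplicates or blanks.
    result = normalize_tags(["  Notes", "notes", "", "Ideas "])
    assert result == ["ideas", "notes"]


def test_normalize_tags_is_idempotent() -> None:
    # Applying the function to its own output should change nothing.
    once = normalize_tags(["B", "a"])
    assert normalize_tags(once) == once
```

Tests like these pin down behavior deterministically and run in milliseconds, which is why they come before a smoke path or a heavier harness.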
Adding More
- keep eval inputs synthetic and safe
- avoid provider/network dependence by default
- add evals or fixtures near the subsystem they validate
- document the manual command used to run the eval
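The guidelines above might look like the following in a minimal offline eval: synthetic inputs, no provider or network calls, and the run command recorded in the module docstring. Everything here is an assumption made for illustration, including the file name `example_eval.py`, the fixture rows, and the `fake_answer` stand-in. None of it is taken from the repo.

```python
"""Tiny synthetic eval sketch.

Manual command (hypothetical file name): python example_eval.py
"""
import json

# Synthetic, safe fixture rows, inlined as JSON Lines for the sketch.
FIXTURE_JSONL = """
{"query": "capital of france", "expected": "paris"}
{"query": "two plus two", "expected": "4"}
{"query": "color of the sky", "expected": "blue"}
"""


def load_cases(text: str) -> list[dict]:
    """Parse one JSON object per non-empty line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def fake_answer(query: str) -> str:
    """Stand-in for the system under test: a fixed lookup, no network."""
    canned = {"capital of france": "paris", "two plus two": "4"}
    return canned.get(query, "unknown")


def exact_match_accuracy(cases: list[dict], answer_fn) -> float:
    """Fraction of cases where the answer equals the expected string."""
    hits = sum(1 for case in cases if answer_fn(case["query"]) == case["expected"])
    return hits / len(cases)


if __name__ == "__main__":
    cases = load_cases(FIXTURE_JSONL)
    score = exact_match_accuracy(cases, fake_answer)
    print(f"exact-match accuracy: {score:.2f}")
```

Passing the answer function in as a parameter keeps the scorer deterministic in tests, and a real provider-backed implementation could be swapped in behind an explicit opt-in.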