# AutoData — Agentic Self-Instruct
Re-implementation of Meta AI's AutoData (Kulikov et al., 2026) inside SecondBrain. It generates grounded QA datasets through a five-agent loop, filters candidates through the AutoData paper's acceptance gates, and exposes its tuning surface as a first-class autotune lane plus an MCP tool.
## When to use it
| Use it for | Don't use it for |
|---|---|
| Building a synthetic QA training set from a corpus you trust | One-off Q&A — use sb ask |
| Stress-testing a new strong model against your weak baseline | Inference at runtime |
| Producing eval data with verifiable quality gates and gap discrimination | Anything time-critical (real-LLM mode burns tokens) |
| Tuning the data-quality thresholds via the autotune lane | Free-form text generation |
## The five sub-agents
```text
                   ┌────────────────┐
source.md ────────▶│   Challenger   │── proposes QA + rubric
                   └───────┬────────┘
                           │
                           ▼
                   ┌────────────────┐   rejects on context-leak,
                   │ QualityVerifier│   rubric coverage, generic Q
                   └───────┬────────┘
                           │ pass
                           ▼
           ┌───────────────────────────────┐
           │    N × WeakSolver + Judge     │── weak_avg, weak_max
           └───────────────────────────────┘
           ┌───────────────────────────────┐
           │   N × StrongSolver + Judge    │── strong_avg, strong_min
           └───────────────┬───────────────┘
                           │
                           ▼
            ┌──────────────────────────────┐
            │       Acceptance gates       │
            │ • weak_avg ≤ 0.65            │
            │ • strong_avg ∈ [0.60, 0.95]  │
            │ • gap ≥ 0.20                 │
            └──────┬───────────────┬───────┘
                   │               │
                accept           reject ──▶ categorized feedback
                   │                        (next refinement round)
                   ▼
              result.json
```
The thresholds are the AutoData paper's CS-research defaults; they live in
brain/autodata/contracts.py::AcceptanceCriteria and are configurable via
brain/autodata/tuning.yaml.
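The gate check fits in a few lines of Python. This is a hypothetical sketch, not the code in contracts.py. `Gates` and `check_gates` are illustrative names, and their fields mirror the tuning.yaml keys. Two details are assumptions:

- gap is taken as strong_avg minus weak_avg, which matches the result.json example on this page (0.80 − 0.40 = 0.40);
- weak_max_cap is read as an upper bound on the best single weak-solver score.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Gates:
    """Hypothetical mirror of the acceptance thresholds (paper defaults)."""
    weak_avg_max: float = 0.65
    weak_max_cap: float = 0.75
    strong_avg_min: float = 0.60
    strong_avg_max: float = 0.95
    gap_min: float = 0.20


def check_gates(weak: list[float], strong: list[float], g: Gates = Gates()) -> list[str]:
    """Return the failed-gate reasons; an empty list means the item is accepted."""
    weak_avg, weak_max = mean(weak), max(weak)
    strong_avg = mean(strong)
    gap = strong_avg - weak_avg  # assumption: gap = strong_avg - weak_avg
    reasons: list[str] = []
    if weak_avg > g.weak_avg_max:  # too easy for the weak solver on average
        reasons.append(f"weak_avg {weak_avg:.2f} > {g.weak_avg_max}")
    if weak_max > g.weak_max_cap:  # assumption: caps the best single weak run
        reasons.append(f"weak_max {weak_max:.2f} > {g.weak_max_cap}")
    if not g.strong_avg_min <= strong_avg <= g.strong_avg_max:
        reasons.append(
            f"strong_avg {strong_avg:.2f} outside [{g.strong_avg_min}, {g.strong_avg_max}]"
        )
    if gap < g.gap_min:  # the weak/strong split must be discriminative
        reasons.append(f"gap {gap:.2f} < {g.gap_min}")
    return reasons


print(check_gates([0.40, 0.35, 0.45], [0.80, 0.85, 0.75]))  # → []
print(check_gates([0.70, 0.60, 0.80], [0.90, 0.85, 0.95]))  # fails weak_avg and weak_max
```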
## Layers
```text
brain/autodata/
├── contracts.py    Pydantic schemas (Rubric, QAItem, AcceptanceCriteria,
│                   SolverScores, RoundRecord, AutoDataResult)
├── prompts.py      System prompts for the five sub-agents + refinement
│                   feedback templating
├── agents.py       Challenger, QualityVerifier, Weak/StrongSolver, Judge;
│                   each wraps an LLMProvider
├── judge.py        score_solver_outputs(): rubric-graded judgment with
│                   parser fallbacks
├── loop.py         AutoDataLoop: inner loop (refinement) + middle loop
│                   (sweep over sources). Emits autodata.{round,accepted,
│                   qv_failed} events.
├── meta.py         MetaOptimizer: Boltzmann (T=0.1) evolutionary search
│                   over HarnessVariant tunings
├── tuning.py       load/apply/coerce/render of brain/autodata/tuning.yaml
├── store.py        AutoDataStore: JSONL persistence under
│                   state/autodata/<run_id>/
├── karma_feed.py   record_round(): KarmaEntry per round; rejects → regrets
├── fixtures.py     StubProvider + fixture router for offline evaluation
├── proposer.py     Bridge: meta-optimizer proposes; autotune lane validates
└── tuning.yaml     THE ONLY MUTABLE FILE in brain/autodata/. The autotune
                    `autodata` lane is allowed to mutate this and nothing else.
```
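The inner refinement loop in loop.py can be pictured roughly as follows. Everything here is hypothetical and is not AutoDataLoop's API:

- `refine` and `Round` are illustrative names;
- the three callables stand in for the Challenger, the QualityVerifier, and the solver/judge/gate stage;
- joining reject reasons with "; " to form the feedback is an assumption.

The sketch only shows the control flow: stop at the first accepted round or after max_rounds (8, as in the `--max-rounds` example).

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Round:
    refinement_round: int
    accepted: bool
    reject_reasons: list[str] = field(default_factory=list)


def refine(
    propose: Callable[[str], str],       # Challenger: feedback -> QA item
    verify: Callable[[str], list[str]],  # QualityVerifier: issues (empty = pass)
    gate: Callable[[str], list[str]],    # solvers + judge + acceptance gates
    max_rounds: int = 8,
) -> list[Round]:
    """Hypothetical inner loop: stop at the first accepted round or after max_rounds."""
    rounds: list[Round] = []
    feedback = ""
    for i in range(1, max_rounds + 1):
        item = propose(feedback)
        # Solvers only run when the QualityVerifier passes the item.
        reasons = verify(item) or gate(item)
        rounds.append(Round(i, accepted=not reasons, reject_reasons=reasons))
        if not reasons:
            break
        feedback = "; ".join(reasons)  # categorized feedback for the next round
    return rounds


attempts = iter(["easy", "leaky", "good"])
history = refine(
    propose=lambda fb: next(attempts),
    verify=lambda q: ["context-leak"] if q == "leaky" else [],
    gate=lambda q: [] if q == "good" else ["weak_avg too high"],
)
print([(r.refinement_round, r.accepted) for r in history])
# → [(1, False), (2, False), (3, True)]
```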
## CLI surface
```shell
# Generate QA items from grounded source files (real LLMs)
sb autodata generate vault/01_projects/paper-*.md \
    [--max-rounds 8] [--weak-n 3 --strong-n 3] \
    [--feed-karma] [--json]

# Search the tuning surface stochastically against the autotune fixture
sb autodata meta-optimize \
    --train fixtures/a.md --validate fixtures/b.md \
    [--iterations 30] [--temperature 0.1] [--seed 0]

# Meta-opt proposes; autotune lane judges; commit if accepted
sb autodata propose-validate --pack core --iterations 30 --seed 0 [--apply]

# Bootstrap an autotune fixture from real vault markdown
sb autodata fixture-init "vault/01_projects/*.md" --out my_fixture.yaml \
    [--weak-score 0.40 --strong-score 0.85]

# Dataset-level analysis of a generated run
sb autodata diversity state/autodata/<run-id> [--dedup-threshold 0.85] [--json]

# Export accepted items as a flat dataset (CSV/JSONL/JSON)
sb autodata to-dataset state/autodata/<run-id> --out dataset.jsonl [--dedup]

# Profile an export via brain.datasets.profile_dataset
sb autodata profile state/autodata/<run-id> [--dedup]

# Show summary for a generated run
sb autodata status state/autodata/<run-id>
```
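The `diversity` command and the `--dedup` flags rely on token-Jaccard near-duplicate detection. The following is a minimal sketch of that idea, not the module's implementation. It assumes lower-cased `\w+` tokens and greedy keep-first filtering at a threshold such as 0.85. `tokens`, `jaccard`, and `dedup` are hypothetical helpers, and the real clustering may differ.

```python
import re


def tokens(text: str) -> set[str]:
    """Lower-cased word tokens (assumed tokenizer)."""
    return set(re.findall(r"\w+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """|A ∩ B| / |A ∪ B|, defined as 1.0 for two empty sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def dedup(questions: list[str], threshold: float = 0.85) -> list[str]:
    """Greedy filter: keep a question unless it is at least `threshold`
    similar to a question that was already kept."""
    kept: list[tuple[str, set[str]]] = []
    for q in questions:
        t = tokens(q)
        if all(jaccard(t, kt) < threshold for _, kt in kept):
            kept.append((q, t))
    return [q for q, _ in kept]


qs = [
    "What loss does the paper use for the reward model?",
    "What loss does the paper use for the reward model ?",
    "Which dataset is used to evaluate the retriever?",
]
print(dedup(qs))  # keeps the first and third questions
```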
## MCP surface
brain/mcp/cc_server.py exposes one AutoData tool today:
```python
def secondbrain_autodata_propose_validate(
    pack: str = "core",        # autotune fixture pack name
    iterations: int = 30,      # meta-optimizer iterations
    seed: int = 0,
    temperature: float = 0.1,  # Boltzmann temperature
) -> str:  # JSON: {"proposal": {...}, "validation": {...}}
    ...
```
Read-only — never writes tuning.yaml or commits. Use it from Claude Code,
the API, or any MCP client.
## Autotune lane integration
| Aspect | Value |
|---|---|
| Lane spec | brain/autotune/specs/autodata.yaml |
| Evaluator | autodata.synth → brain/autotune/evaluators/autodata.py::AutoDataEvaluator |
| Mutable path | brain/autodata/tuning.yaml (only) |
| Mutation strategy | AutoDataTuningMutationStrategy (±0.05 grid + prompt-patch toggle) |
| Primary metric | autodata_acceptance_rate (accepted / total sources) |
| Guards | schema_valid, avg_gap, qv_pass_rate, avg_strong_score, p95_latency_ms |
| Fixtures | brain/autotune/fixtures/autodata.{smoke,core}.yaml |
| Karma mapping | autodata_round → autodata lane (regrets surface as autotune ideas) |
Run the lane the same way as any other:
```shell
sb autotune run autodata --pack core --attempts 12             # actual commits land
sb autotune run autodata --pack core --dry-run --attempts 12   # preview only
```
The lane uses a stub provider plus canned fixture responses, so it runs
offline with no LLM calls. Strategy dispatch lives in
brain/autotune/runner.py::_mutation_strategy(spec): it matches on the lane
name first (autodata → AutoDataTuningMutationStrategy) and falls back to the
spec's kind second.
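The dispatch order can be sketched as follows. Only two things come from this page: the order (lane name first, kind second) and the autodata → AutoDataTuningMutationStrategy mapping. `LaneSpec`, the two registries, the kind names, the "retrieval" lane, and the fallback strings are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LaneSpec:
    """Hypothetical subset of a lane spec such as brain/autotune/specs/autodata.yaml."""
    name: str
    kind: str


# Lane-specific strategies win; kind-level strategies are the fallback.
BY_LANE = {"autodata": "AutoDataTuningMutationStrategy"}
BY_KIND = {"config": "config-kind fallback", "prompt": "prompt-kind fallback"}  # placeholders


def mutation_strategy(spec: LaneSpec) -> str:
    """Return the strategy name for a lane: lane-name match first, kind second."""
    if spec.name in BY_LANE:
        return BY_LANE[spec.name]
    if spec.kind in BY_KIND:
        return BY_KIND[spec.kind]
    raise ValueError(f"no mutation strategy for lane {spec.name!r} (kind {spec.kind!r})")


print(mutation_strategy(LaneSpec("autodata", "config")))   # → AutoDataTuningMutationStrategy
print(mutation_strategy(LaneSpec("retrieval", "config")))  # → config-kind fallback
```

Checking the lane name first lets a specialized strategy override the generic one for its kind without touching other lanes.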
## Tuning surface
```yaml
# brain/autodata/tuning.yaml (bounded, lane-mutable)
acceptance:
  weak_avg_max: 0.65      # paper default; bounds [0.05, 0.95]
  weak_max_cap: 0.75      # bounds [0.05, 0.99]
  strong_avg_min: 0.60    # bounds [0.05, 0.99]
  strong_avg_max: 0.95    # bounds [0.10, 0.99]
  gap_min: 0.20           # bounds [0.0, 0.80]
prompt_patches:
  paper_specific_insight: false
  source_unique_knowledge: false
  criterion_non_redundancy: false
```
Each prompt patch is a canned addition to the Challenger system prompt;
the patch text lives in brain/autodata/tuning.py::PROMPT_PATCH_TEXT so
diffs are reviewable in source control.
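Combining the bounds in the YAML comments with the lane's "±0.05 grid + prompt-patch toggle" mutation strategy gives the neighborhood sketched below. This is an illustration under assumptions, not tuning.py or AutoDataTuningMutationStrategy:

- `BOUNDS`, `coerce_threshold`, and `neighbors` are hypothetical;
- values are clamped to their bounds and rounded to two decimals;
- each variant changes exactly one knob;
- cross-field consistency (for example strong_avg_min ≤ strong_avg_max) is not checked.

```python
from typing import Iterator

# Bounds copied from the comments in tuning.yaml.
BOUNDS = {
    "weak_avg_max": (0.05, 0.95),
    "weak_max_cap": (0.05, 0.99),
    "strong_avg_min": (0.05, 0.99),
    "strong_avg_max": (0.10, 0.99),
    "gap_min": (0.0, 0.80),
}
STEP = 0.05  # grid step of the lane's mutation strategy


def coerce_threshold(key: str, value: float) -> float:
    """Clamp into the key's bounds and round away float noise (hypothetical)."""
    lo, hi = BOUNDS[key]
    return round(min(max(value, lo), hi), 2)


def neighbors(
    acceptance: dict[str, float], patches: dict[str, bool]
) -> Iterator[tuple[dict[str, float], dict[str, bool]]]:
    """Yield one-knob variants: a threshold moved by ±STEP, or one patch flipped."""
    for key, value in acceptance.items():
        for delta in (-STEP, STEP):
            new = coerce_threshold(key, value + delta)
            if new != value:  # a move clamped back onto the bound is a no-op
                yield {**acceptance, key: new}, patches
    for name, enabled in patches.items():
        yield acceptance, {**patches, name: not enabled}


acceptance = {"weak_avg_max": 0.65, "weak_max_cap": 0.75, "strong_avg_min": 0.60,
              "strong_avg_max": 0.95, "gap_min": 0.20}
patches = {"paper_specific_insight": False, "source_unique_knowledge": False,
           "criterion_non_redundancy": False}
variants = list(neighbors(acceptance, patches))
print(len(variants))  # → 13 (10 threshold moves + 3 patch toggles)
```

Note that strong_avg_max + 0.05 is clamped to 0.99, so that move is only 0.04.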
## Karma feed
Every refinement round emits an autodata_round KarmaEntry:
| Round outcome | KarmaEntry shape |
|---|---|
| Accepted (gates pass) | outcome="success", regret=False, no lesson |
| Rejected (gates fail) | outcome="failure", regret=True, lesson=joined gate-failure reasons |
Regrets flow through brain/autotune/karma_bridge.py::propose_idea_from_karma
and surface as autotune ideas on the autodata lane. Turn the feed on by
passing --feed-karma to sb autodata generate. The next sb autotune run
autodata cycle then picks up the lessons as research direction.
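The mapping in the table above, from round outcome to KarmaEntry, can be sketched as a small function. The `KarmaEntry` dataclass here is a hypothetical stand-in for the real model. `record_round_sketch` does not reproduce karma_feed.record_round's signature, and the "; " separator for the lesson is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class KarmaEntry:
    """Hypothetical stand-in for the real KarmaEntry model."""
    kind: str
    outcome: str
    regret: bool
    lesson: Optional[str] = None


def record_round_sketch(accepted: bool, reject_reasons: list[str]) -> KarmaEntry:
    """Map one refinement round to a KarmaEntry, following the table above."""
    if accepted:
        return KarmaEntry("autodata_round", outcome="success", regret=False)
    return KarmaEntry(
        "autodata_round",
        outcome="failure",
        regret=True,  # regrets are what the karma bridge turns into ideas
        lesson="; ".join(reject_reasons),  # assumption: "; " separator
    )


entry = record_round_sketch(False, ["weak_avg 0.72 > 0.65", "gap 0.10 < 0.2"])
print(entry.outcome, entry.regret, entry.lesson)
# → failure True weak_avg 0.72 > 0.65; gap 0.10 < 0.2
```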
## Propose-validate flow
The meta-optimizer is fast (~0.03 s for 30 iterations) but ungoverned. The
autotune lane has the full judge ensemble, complexity tax, and regression
detection, but it searches a deterministic grid whose steps don't compound
within a single invocation. sb autodata propose-validate combines the two:
the meta-optimizer finds the best variant in-process, then the same evaluator
and ensemble that the autotune lane runs score baseline against candidate and
decide whether to accept. With --apply, a single autotune-style commit
replaces the chain of incremental --resume invocations.
```text
Meta-optimizer (30 iters):
  best score: 1.000
  accepted: 4 / rejected: 26
  proposed: weak=0.70, strong=0.55, gap=0.15

Autotune lane validation:
  baseline:   0.571
  candidate:  1.000
  raw_gain:   +0.429
  ensemble:   ACCEPTED
    metric:     pass  score=+0.429  required
    contract:   pass  score=+1.000
    regression: pass  score=+1.000
    pairwise:   pass  score=+1.000
```
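The Boltzmann search (T = 0.1) that produces the proposal favors high-scoring variants while still sampling others. Below is a generic sketch of Boltzmann (softmax) selection, not MetaOptimizer's code. `boltzmann_pick` is hypothetical and the population scores are invented for illustration. At T = 0.1, a 0.3-point score difference becomes a weight ratio of e³ ≈ 20.

```python
import math
import random


def boltzmann_pick(scores: list[float], temperature: float, rng: random.Random) -> int:
    """Sample an index with probability proportional to exp(score / temperature)."""
    top = max(scores)  # shifting by the max avoids overflow and leaves ratios unchanged
    weights = [math.exp((s - top) / temperature) for s in scores]
    return rng.choices(range(len(scores)), weights=weights, k=1)[0]


rng = random.Random(0)  # seeded, like --seed, for reproducible runs
scores = [0.5, 0.7, 1.0]  # invented variant scores
picks = [boltzmann_pick(scores, 0.1, rng) for _ in range(1000)]
print(picks.count(2) / len(picks))  # about 0.95: low T concentrates on the best variant
```

Raising the temperature flattens the weights and makes the search more exploratory. Lowering it approaches greedy hill-climbing.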
## Fixture format
```yaml
# brain/autotune/fixtures/autodata.<pack>.yaml
name: autodata.<pack>
sources:
  - id: paper-foo
    title: "Short title"
    text: |
      Source markdown / text the challenger grounds on.
    responses:
      challenger: |   # canned strict-JSON QA + rubric
        {"question": "...", "context": "...",
         "reference_answer": "...",
         "rubric": {"criteria": [{"name": "...", "description": "...", "weight": 5}]}}
      qv: '{"passed": true, "issues": [], "feedback": "ok"}'
    weak_score: 0.40     # judge score for weak-solver responses on this source
    strong_score: 0.85   # judge score for strong-solver responses
```
The fixture router in brain/autodata/fixtures.py latches the active
source on each challenger call (since solver/judge calls don't carry the
source id), so per-source scores are honored across the inner loop.
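The latching behavior can be sketched as a small router. The sketch is hypothetical: `FixtureRouter`, its methods, and the flat per-source table are illustrative and do not reproduce brain/autodata/fixtures.py or the fixture YAML layout. The point is that the challenger call records the source id, so later solver and judge calls, which carry no id, resolve to that source's scores.

```python
from typing import Optional


class FixtureRouter:
    """Hypothetical router: latch the source on challenger calls, reuse it afterwards."""

    def __init__(self, table: dict[str, dict]) -> None:
        self.table = table  # source id -> {"challenger": str, "weak": float, "strong": float}
        self.active: Optional[str] = None

    def challenger(self, source_id: str) -> str:
        self.active = source_id  # latch: solver/judge calls won't carry the id
        return self.table[source_id]["challenger"]

    def judge_score(self, solver: str) -> float:
        if self.active is None:
            raise RuntimeError("judge called before any challenger call")
        return self.table[self.active]["weak" if solver == "weak" else "strong"]


router = FixtureRouter({
    "paper-foo": {"challenger": '{"question": "..."}', "weak": 0.40, "strong": 0.85},
    "paper-bar": {"challenger": '{"question": "..."}', "weak": 0.30, "strong": 0.90},
})
router.challenger("paper-bar")
print(router.judge_score("weak"), router.judge_score("strong"))  # → 0.3 0.9
```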
## Result schema
Mirrors AutoData's result.json:
```json
{
  "source_id": "paper-foo",
  "source_title": "...",
  "rounds": [{
    "refinement_round": 1,
    "question": "...", "context": "...", "reference_answer": "...",
    "rubric": [{"name": "...", "weight": 5, ...}, ...],
    "accepted": true,
    "quality_verifier_passed": true,
    "weak_solver_avg": 0.40,
    "strong_solver_avg": 0.80,
    "gap": 0.40,
    "reject_reasons": [],
    "eval_report": "..."
  }, ...],
  "final_accepted_round": 1,
  "total_rounds": 1
}
```
Persisted as JSONL under state/autodata/<run_id>/sources.jsonl; accepted
items also append to accepted.jsonl and a run-level summary lands in
index.json.
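A run can be inspected programmatically with the standard library. Below is a minimal sketch that assumes each line of accepted.jsonl is a JSON object carrying a "gap" field like the round records above. `summarize_run` is a hypothetical helper, not part of AutoDataStore, and the demo records are invented.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional


def summarize_run(run_dir: Path) -> dict:
    """Count accepted items and average their gap (assumes a "gap" field per line)."""
    text = (run_dir / "accepted.jsonl").read_text(encoding="utf-8")
    items = [json.loads(line) for line in text.splitlines() if line.strip()]
    gaps = [item["gap"] for item in items if "gap" in item]
    mean_gap: Optional[float] = sum(gaps) / len(gaps) if gaps else None
    return {"accepted": len(items), "mean_gap": mean_gap}


# Self-contained demo on a throwaway directory with two invented records.
with tempfile.TemporaryDirectory() as tmp:
    run = Path(tmp)
    (run / "accepted.jsonl").write_text(
        '{"source_id": "paper-foo", "gap": 0.40}\n'
        '{"source_id": "paper-bar", "gap": 0.20}\n',
        encoding="utf-8",
    )
    summary = summarize_run(run)
print(summary["accepted"], round(summary["mean_gap"], 2))  # → 2 0.3
```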
## Tests
| File | Coverage (test count) |
|---|---|
| tests/autodata/test_autodata.py | Contracts, gates, refinement, judge fallbacks, inner loop, persistence, CLI registration (26) |
| tests/autodata/test_autodata_autotune.py | Tuning load/apply, lane spec resolution, evaluator end-to-end, MetaOptimizer over the new tuning surface (11) |
| tests/autodata/test_autodata_karma.py | record_round shapes, make_karma_feed filter, loop integration, karma_bridge mapping (8) |
| tests/autodata/test_autodata_proposer.py | propose/validate/apply flow, no-op handling, seed determinism (9) |
| tests/autodata/test_autodata_mcp.py | MCP tool registration, return shapes, error handling, determinism (5) |
| tests/autodata/test_autodata_diversity.py | Token-Jaccard math, near-dup clustering, accepted.jsonl loading, full report (16) |
| tests/autodata/test_autodata_fixture_init.py | Glob discovery, H1 title extraction, id collisions, score-bound validation, evaluator absolute-path support (14) |
| tests/autodata/test_autodata_dataset_export.py | flatten_item shape, CSV/JSONL/JSON formats, dedup integration, profile_run round-trip via brain.datasets (15) |
123 tests in total, all offline (deterministic stub provider); the full suite runs in about 4.7 s.
## References
- Kulikov et al., "Autodata: an automatic data scientist to create high quality data", Meta AI RAM, 2026 — https://facebookresearch.github.io/RAM/blogs/autodata/
- Source repo: https://github.com/facebookresearch/RAM/tree/main/projects/autodata