[Forge] Nightly mock-data sentry - fail loudly when synthesis cheats
Status: done

A nightly walker scans analysis, evidence, and synthesis payloads for mock fingerprints and feeds the /senate/data-fidelity dashboard.

Completion Notes

Auto-completed by supervisor after successful deploy to main

Git Commits (2)

Squash merge: orchestra/task/a4c450f7-biomni-analysis-parity-port-15-use-cases (12 commits) (#623), 2026-04-27
[Forge] Data fidelity sentry — detect mock/placeholder data in outputs [task:ce6a52a7-658b-4acc-bb7c-68dea26d9f17] (#618), 2026-04-27

Spec File

Goal

Build a nightly verifier that walks every analysis run, hypothesis evidence
block, and synthesizer score from the prior 24 h and asserts that none of them
silently relied on mock, placeholder, or random-seeded data. Produce a
/senate/data-fidelity dashboard and an alert that pages the watchdog when
the mock rate rises above 1%.

Why this matters

Quest q-555b6bea3848 already has 7 completed tasks, but a recurring theme is
"audit could not verify the work actually landed on main." A mock-data sentry
turns the one-off audit into a continuous guarantee and makes regressions
visible the morning after they ship instead of weeks later.

Acceptance Criteria

☐ New module scidex/senate/data_fidelity.py defines a sentinel set
of mock fingerprints: np.random.seed(, mock_, placeholder_,
"source": "fallback", "synthetic": true, hard-coded gene lists
["BRCA1","TP53","APP","MAPT"] known from prior fakes.
☐ Walker iterates analyses, hypothesis_evidence_blocks, and the
most recent 200 synthesis_session payload_jsons; flags any row that
matches.
☐ Each flag carries (table, row_id, fingerprint, sample_excerpt,
severity) and is upserted into data_fidelity_findings.
☐ /senate/data-fidelity renders a 30-day trend chart and a top-20
offender list with click-through to the offending row.
☐ Threshold mock_rate > 0.01 triggers a watchdog notification
(reuses the fleet-health watchdog plumbing; see journalctl -u
orchestra-fleet-watchdog.service).
☐ Sentry runs as scidex-data-fidelity.timer (daily 03:30 UTC) and
writes a JSONL run log under logs/data_fidelity/.
☐ CI hook: PR build fails if a new fingerprint is introduced under
synthesis_engine.py / tools.py (regex grep gate).
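
The fingerprint, finding, and threshold criteria above can be sketched roughly as follows. This is a minimal illustration, not the contents of scidex/senate/data_fidelity.py: the names FINGERPRINTS, Finding, scan_text, and should_alert are hypothetical, the severity labels are invented for the example, and mock_rate is assumed to mean flagged rows divided by scanned rows, which the spec does not define.

```python
import re
from dataclasses import dataclass

# Hypothetical fingerprint table: name -> (compiled regex, severity).
# Patterns mirror the sentinel set in the acceptance criteria; the
# severity labels are assumptions made for this sketch.
FINGERPRINTS = {
    "np_random_seed": (re.compile(r"np\.random\.seed\("), "high"),
    "mock_prefix": (re.compile(r"mock_"), "medium"),
    "placeholder_prefix": (re.compile(r"placeholder_"), "medium"),
    "fallback_source": (re.compile(r'"source"\s*:\s*"fallback"'), "high"),
    "synthetic_flag": (re.compile(r'"synthetic"\s*:\s*true'), "high"),
    "known_fake_genes": (
        re.compile(r'\[\s*"BRCA1"\s*,\s*"TP53"\s*,\s*"APP"\s*,\s*"MAPT"\s*\]'),
        "high",
    ),
}

MOCK_RATE_THRESHOLD = 0.01  # from the spec: alert when mock_rate > 0.01


@dataclass(frozen=True)
class Finding:
    """One flag, carrying the five fields named in the acceptance criteria."""
    table: str
    row_id: int
    fingerprint: str
    sample_excerpt: str
    severity: str


def scan_text(table, row_id, text, context=30):
    """Return one Finding per fingerprint that matches `text`."""
    findings = []
    for name, (pattern, severity) in FINGERPRINTS.items():
        match = pattern.search(text)
        if match is None:
            continue
        # Keep a short window around the match as the excerpt.
        start = max(0, match.start() - context)
        end = min(len(text), match.end() + context)
        findings.append(Finding(table, row_id, name, text[start:end], severity))
    return findings


def should_alert(flagged_rows, scanned_rows):
    """Assumed definition: mock_rate = flagged rows / scanned rows."""
    if scanned_rows == 0:
        return False
    return flagged_rows / scanned_rows > MOCK_RATE_THRESHOLD
```

Because the comparison is strict, a run with exactly 1 flagged row in 100 does not alert under this definition; 2 in 100 does.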

Approach

  • Fingerprint library lives in code, easy to extend.
  • Walker uses chunked SELECTs (LIMIT 1000) so it stays under 60 s on
    the prod PG instance.
  • Reuse epistemic_health.py for the dashboard scaffolding.
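
One way to read the chunked-SELECT bullet is keyset pagination: fetch up to 1000 rows at a time, ordered by primary key, and resume after the last id seen. The sketch below assumes each scanned table has a positive integer id column and uses the standard-library sqlite3 module as a stand-in for the production Postgres connection. The helper name iter_rows and the payload_json column name are hypothetical, and the spec does not say whether the real walker uses keyset or OFFSET paging.

```python
import sqlite3

CHUNK_SIZE = 1000  # matches the LIMIT 1000 named in the approach


def iter_rows(conn, table, text_column, chunk_size=CHUNK_SIZE):
    """Yield (id, text) pairs from `table` in id order, one chunk per query.

    Assumes a positive integer `id` primary key. Table and column names are
    interpolated into the SQL, so they must come from a fixed allow-list in
    the caller and never from user input.
    """
    last_id = 0
    while True:
        rows = conn.execute(
            f"SELECT id, {text_column} FROM {table} "
            "WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, chunk_size),
        ).fetchall()
        if not rows:
            return
        yield from rows
        # Resume strictly after the last id seen, so no row is read twice.
        last_id = rows[-1][0]
```

Paging on the key keeps each query bounded no matter how deep into the table the walker is, which is one plausible way to stay inside the 60 s budget.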

Dependencies

  • Quest q-555b6bea3848.
  • Fleet watchdog (reference_fleet_health_watchdog.md).

Work Log

  • 2026-04-27: Created scidex/senate/data_fidelity.py, a mock fingerprint scanner with sentinel patterns (np.random.seed, mock_, placeholder_, "source":"fallback", "synthetic":true, hardcoded gene lists [BRCA1,TP53,APP,MAPT]). Walks the analyses, evidence_entries, and artifact_syntheses tables in 1000-row chunks. Upserts findings to the data_fidelity_findings table. Triggers a watchdog notification when mock_rate > 0.01. Writes JSONL logs to logs/data_fidelity/.
  • 2026-04-27: Added /api/senate/data-fidelity dashboard route and /api/senate/data-fidelity/run trigger to api_routes/senate.py. Dashboard returns 30-day trend and top-20 offenders.
  • 2026-04-27: Created scidex-data-fidelity.service (oneshot) and scidex-data-fidelity.timer (daily 03:30 UTC) systemd units.
  • 2026-04-27: Created scripts/ci_check_mock_fingerprints.py, a CI grep gate for synthesis_engine.py and tools.py. Fails the build if new mock patterns are detected.
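
The CI grep gate in the last entry could look something like the sketch below. It is not the contents of scripts/ci_check_mock_fingerprints.py. It assumes that "new" means lines added in a unified diff, checks only a subset of the sentinel patterns, and uses the hypothetical names GATED_FILES, MOCK_PATTERN, and new_fingerprints.

```python
import re

# Files named in the acceptance criteria; matched on basename only.
GATED_FILES = ("synthesis_engine.py", "tools.py")

# Subset of the sentinel patterns, combined into one regex for the gate.
MOCK_PATTERN = re.compile(
    r'np\.random\.seed\(|mock_|placeholder_'
    r'|"source"\s*:\s*"fallback"|"synthetic"\s*:\s*true'
)


def new_fingerprints(diff_text):
    """Return (path, line) pairs for added lines in gated files that match."""
    hits = []
    current = None  # path of the gated file being read, else None
    for line in diff_text.splitlines():
        if line.startswith("+++ "):
            # File header of a unified diff, e.g. "+++ b/pkg/tools.py".
            path = line[4:].strip()
            if path.startswith("b/"):
                path = path[2:]
            basename = path.rsplit("/", 1)[-1]
            current = path if basename in GATED_FILES else None
            continue
        # Only added lines count; context and removed lines are ignored.
        if current and line.startswith("+") and MOCK_PATTERN.search(line[1:]):
            hits.append((current, line[1:].strip()))
    return hits
```

A CI wrapper could feed this the output of git diff against the target branch and exit non-zero when the returned list is non-empty. Fingerprints that already exist in the gated files are not re-flagged under this reading, since only added lines are inspected.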
