Goal
Build a nightly verifier that walks every analysis run, hypothesis evidence
block, and synthesizer score from the prior 24 h and asserts none of them
silently relied on mock / placeholder / random-seeded data. Produce a
/senate/data-fidelity dashboard and an alert that pages the watchdog when
the mock-rate creeps above 1 %.
Why this matters
Quest q-555b6bea3848 already has 7 done tasks but a recurring theme is
"audit could not verify the work actually landed on main." A mock-data sentry
turns the once-off audit into a continuous guarantee and makes regressions
visible the morning after they ship instead of weeks later.
Acceptance Criteria
☐ New module scidex/senate/data_fidelity.py defines a sentinel set of mock
fingerprints: np.random.seed(, mock_, placeholder_, "source": "fallback",
"synthetic": true, and hard-coded gene lists (["BRCA1","TP53","APP","MAPT"])
known from prior fakes.
☐ Walker iterates analyses, hypothesis_evidence_blocks, and the
most recent 200 synthesis_session payload_jsons; flags any row that
matches.
☐ Each flag carries (table, row_id, fingerprint, sample_excerpt, severity)
and is upserted into data_fidelity_findings.
☐ /senate/data-fidelity renders a 30-day trend chart and a top-20
offender list with click-through to the offending row.
☐ Threshold mock_rate > 0.01 triggers a watchdog notification (reuses the
fleet-health watchdog plumbing:
journalctl -u orchestra-fleet-watchdog.service).
☐ Sentry runs as scidex-data-fidelity.timer (daily 03:30 UTC) and writes a
JSONL run log under logs/data_fidelity/.
☐ CI hook: PR build fails if a new fingerprint is introduced under
synthesis_engine.py / tools.py (regex grep gate).
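As an illustration of the fingerprint and finding criteria above, here is a minimal sketch, not the contents of scidex/senate/data_fidelity.py. The names FINGERPRINTS, Finding, scan_row, and mock_rate, the severity labels, and the definition of mock_rate as flagged rows divided by scanned rows are all assumptions made for this example.

```python
import re
from dataclasses import dataclass
from typing import Iterator

# Hypothetical fingerprint table: name -> (compiled pattern, severity).
# Patterns mirror the sentinel list above; severity labels are assumptions.
FINGERPRINTS = {
    "np_random_seed": (re.compile(r"np\.random\.seed\("), "high"),
    "mock_prefix": (re.compile(r"mock_"), "medium"),
    "placeholder_prefix": (re.compile(r"placeholder_"), "medium"),
    "source_fallback": (re.compile(r'"source"\s*:\s*"fallback"'), "high"),
    "synthetic_true": (re.compile(r'"synthetic"\s*:\s*true'), "high"),
    "known_gene_list": (
        re.compile(r'\[\s*"BRCA1"\s*,\s*"TP53"\s*,\s*"APP"\s*,\s*"MAPT"\s*\]'),
        "high",
    ),
}


@dataclass(frozen=True)
class Finding:
    """One flag, with the fields named in the acceptance criteria."""
    table: str
    row_id: str
    fingerprint: str
    sample_excerpt: str
    severity: str


def scan_row(table: str, row_id: str, text: str, context: int = 30) -> Iterator[Finding]:
    """Yield one Finding per fingerprint that matches the row's text."""
    for name, (pattern, severity) in FINGERPRINTS.items():
        match = pattern.search(text)
        if match is None:
            continue
        # Keep a short excerpt around the first match for the dashboard.
        start = max(match.start() - context, 0)
        end = min(match.end() + context, len(text))
        yield Finding(table, row_id, name, text[start:end], severity)


def mock_rate(flagged_rows: int, scanned_rows: int) -> float:
    """Assumed definition: fraction of scanned rows with at least one finding."""
    return flagged_rows / scanned_rows if scanned_rows else 0.0


# Two made-up rows: the first trips two fingerprints, the second none.
rows = [
    ("analyses", "a1", '{"source": "fallback", "genes": ["BRCA1","TP53","APP","MAPT"]}'),
    ("analyses", "a2", '{"source": "pubmed", "genes": ["SNCA"]}'),
]
findings = [f for table, row_id, text in rows for f in scan_row(table, row_id, text)]
flagged = {(f.table, f.row_id) for f in findings}
rate = mock_rate(len(flagged), len(rows))
should_alert = rate > 0.01  # threshold from the acceptance criteria
```

Counting distinct flagged rows rather than individual findings keeps a single row that trips several fingerprints from inflating the rate. Whether the real module counts it that way is not stated here.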
Approach
Fingerprint library lives in code, so it is easy to extend.
Walker uses chunked SELECTs (LIMIT 1000) so it stays under 60 s on
the prod PG instance.
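The chunked walk could look roughly like the sketch below. It uses an in-memory SQLite database only so the example is self-contained; the production walker targets Postgres. The iter_chunks helper, the id and payload column names, and the choice of keyset pagination (WHERE id > last seen) instead of OFFSET are assumptions for illustration.

```python
import sqlite3
from typing import Iterator, List, Tuple

CHUNK_SIZE = 1000  # matches the LIMIT 1000 mentioned above


def iter_chunks(
    conn: sqlite3.Connection, table: str, chunk_size: int = CHUNK_SIZE
) -> Iterator[List[Tuple[int, str]]]:
    """Yield lists of (id, payload) rows, at most chunk_size rows per SELECT."""
    last_id = -1  # assumes ids are non-negative integers
    while True:
        # The table name is interpolated, so it must come from a trusted
        # allow-list and never from user input.
        rows = conn.execute(
            f"SELECT id, payload FROM {table} WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, chunk_size),
        ).fetchall()
        if not rows:
            return
        yield rows
        last_id = rows[-1][0]  # resume after the last row seen


# Demo with a hypothetical schema: 2500 rows should arrive as three chunks.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE analyses (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany(
    "INSERT INTO analyses (id, payload) VALUES (?, ?)",
    [(i, f"row {i}") for i in range(2500)],
)
chunk_sizes = [len(chunk) for chunk in iter_chunks(conn, "analyses")]
```

Keyset pagination keeps each query's cost roughly constant as the walk advances, whereas OFFSET has to skip over all earlier rows on every call.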
Reuse epistemic_health.py for the dashboard scaffolding.
Dependencies
- Quest q-555b6bea3848.
- Fleet watchdog (reference_fleet_health_watchdog.md).
Work Log
- 2026-04-27: Created scidex/senate/data_fidelity.py, a mock fingerprint scanner with sentinel patterns (np.random.seed, mock_, placeholder_, "source":"fallback", "synthetic":true, hardcoded gene lists [BRCA1,TP53,APP,MAPT]). Walks analyses, evidence_entries, artifact_syntheses tables in 1000-row chunks. Upserts findings to data_fidelity_findings table. Triggers watchdog notification when mock_rate > 0.01. Writes JSONL logs to logs/data_fidelity/.
- 2026-04-27: Added /api/senate/data-fidelity dashboard route and /api/senate/data-fidelity/run trigger to api_routes/senate.py. Dashboard returns 30-day trend and top-20 offenders.
- 2026-04-27: Created scidex-data-fidelity.service (oneshot) and scidex-data-fidelity.timer (daily 03:30 UTC) systemd units.
- 2026-04-27: Created scripts/ci_check_mock_fingerprints.py, a CI grep gate for synthesis_engine.py and tools.py. Fails build if new mock patterns detected.
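For readers unfamiliar with a grep gate, the idea can be sketched as below. This is not the contents of scripts/ci_check_mock_fingerprints.py. It assumes "new" means lines added in a unified diff, and the function name find_new_fingerprints, the combined pattern, and the sample paths (scidex/tools.py, docs/notes.md) are hypothetical.

```python
import re
from typing import List, Tuple

# Files the gate applies to, matched by base name (assumption).
GATED_FILES = ("synthesis_engine.py", "tools.py")

# One combined pattern covering a subset of the sentinel fingerprints.
PATTERN = re.compile(
    r'np\.random\.seed\(|mock_|placeholder_'
    r'|"source"\s*:\s*"fallback"|"synthetic"\s*:\s*true'
)


def find_new_fingerprints(diff_text: str) -> List[Tuple[str, str]]:
    """Return (file, added_line) pairs where an added line matches a fingerprint."""
    violations = []
    current_file = ""
    for line in diff_text.splitlines():
        if line.startswith("+++ "):
            # Unified diff header for the new file, e.g. "+++ b/scidex/tools.py".
            current_file = line[4:].strip()
            continue
        if not line.startswith("+"):
            continue  # context lines, removals, and hunk headers are ignored
        base_name = current_file.rsplit("/", 1)[-1]
        if base_name in GATED_FILES and PATTERN.search(line[1:]):
            violations.append((current_file, line[1:].strip()))
    return violations


# Made-up diff: one added seed call in a gated file, one match in an ungated file.
sample_diff = """\
--- a/scidex/tools.py
+++ b/scidex/tools.py
@@ -1,2 +1,3 @@
 import numpy as np
+np.random.seed(42)
-old = 1
--- a/docs/notes.md
+++ b/docs/notes.md
@@ -1 +1,2 @@
+mock_example = True
"""
violations = find_new_fingerprints(sample_diff)
```

In CI, a wrapper would feed the PR diff into this function and exit non-zero when the returned list is non-empty. Scanning only added lines lets existing matches stay until they are cleaned up, while blocking new ones.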