[Senate/feat] Reproducibility check job — daily walk_provenance_to_leaves over top-N artifacts; alert on broken chains

Goal

Build the watchdog that turns provenance from data into an SLO. Every 24 h, walk
the artifact_provenance table for the top 200 artifacts ranked by usage_score + citation_count. For each artifact, follow its provenance chain
back to leaf sources (papers, datasets, external origins). Flag any chain that
exceeds max_hops=50 or references a missing source_artifact_id. Attempt
auto-repair where possible and write action-typed comments on broken artifacts
so the Percolation Engine can pick them up. Emit a snapshot dashboard
artifact with the health report and per-week trend.

Acceptance Criteria

☑ Recurring job: every-24h systemd timer drives economics_drivers/provenance_health_driver.py

☑ Walks artifact_provenance for top 200 artifacts by usage_score + citation_count

☑ Flags broken chains: walk exceeds max_hops=50 OR source_artifact_id not in artifacts

☑ Auto-repair: if missing source can be matched by content hash or slug, writes a repair provenance row and links to the renamed artifact

☑ Alerts: writes artifact_comments(comment_type='action') on each broken artifact

☑ Health report: upserts a dashboard artifact with report summary and per-week trend

☑ Per-week trend: % of top-200 with intact chains (target ≥ 99%)

☑ Spec file at docs/planning/specs/reproducibility_check_job.md

Approach

Query top 200 artifacts by COALESCE(usage_score,0) + COALESCE(citation_count,0) DESC
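The ranking step can be sketched against an in-memory SQLite database. The `usage_score` and `citation_count` columns come from this spec; the use of SQLite, the `top_artifacts` helper, the `id` tie-breaker, and the sample rows are assumptions made for illustration only.

```python
import sqlite3


def top_artifacts(conn, limit=200):
    """Return artifact IDs ranked by usage_score + citation_count (NULL counts as 0)."""
    rows = conn.execute(
        """
        SELECT id
        FROM artifacts
        ORDER BY COALESCE(usage_score, 0) + COALESCE(citation_count, 0) DESC,
                 id  -- assumed tie-breaker so repeated runs rank ties the same way
        LIMIT ?
        """,
        (limit,),
    ).fetchall()
    return [row[0] for row in rows]


# Hypothetical sample data: NULL scores must not drop an artifact from the ranking.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE artifacts (id TEXT PRIMARY KEY, usage_score REAL, citation_count INTEGER)"
)
conn.executemany(
    "INSERT INTO artifacts VALUES (?, ?, ?)",
    [("a", 5.0, None), ("b", None, 9), ("c", 1.0, 1)],
)
ranked = top_artifacts(conn, limit=2)
```

Without the `COALESCE` calls, a NULL in either column would make the sum NULL, and the artifact's position in the ranking would no longer reflect its other score.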

For each artifact, BFS-walk artifact_provenance following source_artifact_id links, stopping at leaf types ('paper', 'dataset'), at artifacts with origin_type='external', or when no further provenance rows exist

Track: hop count, missing sources, cycle detection via visited set
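The walk and its bookkeeping can be sketched as follows. This is a minimal in-memory version, not the driver's actual code: the `walk_to_leaves` name, the dict-based inputs, and the result keys are assumptions. Note that a visited set guarantees termination on cycles but cannot by itself tell a true cycle from a diamond-shaped chain, so the sketch only skips revisits.

```python
from collections import deque

LEAF_TYPES = {"paper", "dataset"}


def walk_to_leaves(root, artifacts, edges, max_hops=50):
    """BFS from root along source_artifact_id links.

    artifacts: dict of id -> {"artifact_type": ..., "origin_type": ...}
    edges:     dict of target id -> list of source ids (hypothetical in-memory
               stand-in for the artifact_provenance table)
    """
    visited = {root}
    queue = deque([(root, 0)])
    missing, leaves = [], []
    max_depth, exceeded = 0, False

    while queue:
        node, hops = queue.popleft()
        max_depth = max(max_depth, hops)
        info = artifacts[node]
        sources = edges.get(node, [])

        # Leaf rule from the spec: leaf type, external origin, or no outgoing edges.
        if (info.get("artifact_type") in LEAF_TYPES
                or info.get("origin_type") == "external"
                or not sources):
            leaves.append(node)
            continue

        if hops >= max_hops:
            exceeded = True  # following these sources would exceed max_hops
            continue

        for src in sources:
            if src not in artifacts:
                missing.append((node, src))  # dangling source_artifact_id
            elif src not in visited:
                visited.add(src)
                queue.append((src, hops + 1))

    return {
        "traced": bool(edges.get(root)),
        "max_depth": max_depth,
        "leaves": leaves,
        "missing": missing,
        "exceeded_max_hops": exceeded,
        "broken": exceeded or bool(missing),
    }


# Hypothetical example: a figure derived from a notebook that cites one paper
# and one source that no longer exists.
artifacts = {
    "fig": {"artifact_type": "figure"},
    "nb": {"artifact_type": "notebook"},
    "p1": {"artifact_type": "paper"},
}
edges = {"fig": ["nb"], "nb": ["p1", "gone"]}
result = walk_to_leaves("fig", artifacts, edges)
```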

On missing source: check artifacts for same content hash (suffix match) or slug-style ID similarity → if found, write artifact_provenance(action_kind='repair') row

On broken (any issue): insert artifact_comments(comment_type='action', author_id='system')

After processing all 200: compute intact_pct = intact_count / total * 100

Load the last 6 daily snapshots from the health report artifact's metadata, append today's, and keep a rolling 7-day window → compute the weekly trend
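The intact percentage and rolling window can be sketched as below. The spec does not define what the "trend" number is, so the `week_avg_pct` and `delta_pct` fields, the snapshot dict shape, the function names, and the zero-total behaviour are all assumptions; only the 7-day window and the 99% target come from this document.

```python
TARGET_PCT = 99.0  # target from the acceptance criteria


def intact_pct(intact_count, total):
    """Percentage of checked artifacts with intact chains."""
    # Assumption: an empty run reports 100.0 rather than dividing by zero.
    return 100.0 * intact_count / total if total else 100.0


def update_trend(previous, today, window=7):
    """Append today's snapshot and keep only the most recent `window` entries."""
    snapshots = (previous + [today])[-window:]
    pcts = [s["intact_pct"] for s in snapshots]
    return {
        "snapshots": snapshots,
        "week_avg_pct": sum(pcts) / len(pcts),  # hypothetical trend metric
        "delta_pct": pcts[-1] - pcts[0],        # hypothetical trend metric
        "meets_target": pcts[-1] >= TARGET_PCT,
    }


# Hypothetical data: six earlier days at 100%, today 198 of 200 intact.
previous = [{"date": f"2026-04-{20 + i}", "intact_pct": 100.0} for i in range(6)]
today = {"date": "2026-04-26", "intact_pct": intact_pct(198, 200)}
report = update_trend(previous, today)
```

Slicing after the append means the window stays at 7 entries even if the stored metadata already holds more than 6 snapshots.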

Upsert the dashboard artifact provenance-health-daily with full report in metadata

Key Design Decisions

Leaf detection: artifact_type in {'paper','dataset'} OR origin_type='external' OR no outgoing provenance edges → treat as source leaf, stop walk

Empty chains are not broken: an artifact with zero provenance rows has an empty (untraced) chain, not a broken one. We report traced=False but don't flag it broken unless it explicitly references a missing source
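The distinction between untraced and broken chains can be expressed as a small classifier. The function name, its arguments, and the status strings are hypothetical; only the rule itself (empty is not broken) is taken from the decision above.

```python
def classify_chain(provenance_rows, missing_sources, exceeded_max_hops):
    """Return (traced, status) for one artifact's walk result."""
    traced = provenance_rows > 0
    if missing_sources or exceeded_max_hops:
        return traced, "broken"    # explicit problem found during the walk
    if not traced:
        return traced, "untraced"  # empty chain: reported, but not flagged
    return traced, "intact"


untraced = classify_chain(0, [], False)
broken = classify_chain(3, ["gone"], False)
intact = classify_chain(2, [], False)
```

The work log's "empty-but-intact" wording suggests untraced chains count toward the intact percentage, though this sketch leaves that aggregation to the caller.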

Stable report artifact ID: provenance-health-daily → same artifact updated each run so there's one canonical health dashboard, not a new artifact per run

Idempotency: the repair provenance row uses ON CONFLICT DO NOTHING keyed on (target_artifact_id, source_artifact_id, action_kind='repair')
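One way to get this behaviour is a partial unique index over repair rows, sketched here with SQLite. The production database engine, the index name `uq_repair`, the `write_repair` helper, and the reduced three-column table are assumptions; the real artifact_provenance schema is not shown in this spec.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE artifact_provenance (
        target_artifact_id TEXT NOT NULL,
        source_artifact_id TEXT NOT NULL,
        action_kind        TEXT NOT NULL
    )
    """
)
# Uniqueness is enforced only for repair rows; other action kinds may repeat.
conn.execute(
    """
    CREATE UNIQUE INDEX uq_repair
    ON artifact_provenance (target_artifact_id, source_artifact_id)
    WHERE action_kind = 'repair'
    """
)


def write_repair(conn, target_id, source_id):
    """Insert a repair row; return True only if a new row was written."""
    cur = conn.execute(
        """
        INSERT INTO artifact_provenance
            (target_artifact_id, source_artifact_id, action_kind)
        VALUES (?, ?, 'repair')
        ON CONFLICT DO NOTHING
        """,
        (target_id, source_id),
    )
    return cur.rowcount == 1


first = write_repair(conn, "fig-1", "nb-renamed")   # new row
second = write_repair(conn, "fig-1", "nb-renamed")  # same run repeated: no-op
repair_rows = conn.execute(
    "SELECT COUNT(*) FROM artifact_provenance WHERE action_kind = 'repair'"
).fetchone()[0]
```

Because the second call is a no-op, the daily job can re-run after a partial failure without duplicating repair rows.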

Auto-repair heuristic: match on same artifact ID prefix (base ID without UUID suffix), or on content_hash. If multiple candidates, pick most recently created one
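The candidate selection can be sketched as below. The ID format is an assumption: the sketch treats a trailing hyphen plus 8 hex characters as the suffix to strip, and it assumes the missing source's content hash may be known from the provenance row. The helper names and the `created_at` field are likewise hypothetical.

```python
import re
from datetime import datetime

# Assumed suffix shape: "-" followed by 8 lowercase hex characters.
ID_SUFFIX = re.compile(r"-[0-9a-f]{8}$")


def base_id(artifact_id):
    """Strip the assumed hex suffix to get the slug-style base ID."""
    return ID_SUFFIX.sub("", artifact_id)


def find_repair_candidate(missing_id, missing_hash, artifacts):
    """Return the ID of the best replacement for a missing source, or None.

    artifacts: list of dicts with "id", "content_hash", "created_at".
    """
    candidates = [
        a for a in artifacts
        if a["id"] != missing_id and (
            (missing_hash is not None and a.get("content_hash") == missing_hash)
            or base_id(a["id"]) == base_id(missing_id)
        )
    ]
    if not candidates:
        return None
    # Tie-break from the spec: prefer the most recently created candidate.
    return max(candidates, key=lambda a: a["created_at"])["id"]


# Hypothetical artifacts table contents.
artifacts = [
    {"id": "gene-report-1a2b3c4d", "content_hash": "h1", "created_at": datetime(2026, 4, 1)},
    {"id": "gene-report-9f8e7d6c", "content_hash": "h2", "created_at": datetime(2026, 4, 20)},
    {"id": "other-00000000", "content_hash": "h3", "created_at": datetime(2026, 4, 10)},
]
by_prefix = find_repair_candidate("gene-report-deadbeef", None, artifacts)
by_hash = find_repair_candidate("zzz-11111111", "h3", artifacts)
no_match = find_repair_candidate("zzz-11111111", None, artifacts)
```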

Dependencies

c7f07845 — Provenance schema (artifact_provenance table) shipped
f80549eb — Percolation Engine quest (uses action-typed comments as triggers)

Dependents

Percolation Engine reads artifact_comments(comment_type='action') on broken artifacts
Future reproducibility SLO dashboard can read provenance-health-daily artifact

Work Log

2026-04-26 — Task agent (1881cb73)

Verified artifact_provenance table exists (0 rows at time of writing: schema shipped but no writes yet)

Verified artifact_comments trigger allows comment_type='action'
Implemented economics_drivers/provenance_health_driver.py with full walk, repair, and reporting logic

Created systemd service + timer: scidex-provenance-health.{service,timer}
Result: Driver runs successfully; 200 artifacts checked, all chains empty-but-intact since artifact_provenance has no rows yet. Job is live and ready to catch issues as provenance writes accumulate.

Verification — 2026-04-28 07:10:00Z

Result: PARTIAL
Verified by: minimax:78 via task 010c79f0-feb9-4333-9304-2e513659f383

Audit scope

10 completed analyses sampled from the 197 total. Query:

SELECT id, title, notebook_path, artifact_path, report_url, metadata
FROM analyses
WHERE status='completed'
  AND notebook_path IS NOT NULL
  AND notebook_path != 'completed'
LIMIT 10;

Tests run

Analysis ID	Notebook Path	Exists?	Executed?	Report HTML	Notes
SDA-BIOMNI-GENE_REG-785b71fe	site/notebooks/SDA-BIOMNI-GENE_REG-785b71fe.ipynb	NO	—	MISSING	Flag: re-execute
sda-2026-04-01-gap-v2-bc5f270e	site/notebooks/sda-2026-04-01-gap-v2-bc5f270e.ipynb	NO	—	EXISTS (13KB)	Flag: re-execute
sda-2026-04-01-gap-v2-89432b95	site/notebooks/sda-2026-04-01-gap-v2-89432b95.ipynb	NO	—	EXISTS (14KB)	Flag: re-execute
SDA-BIOMNI-BIOMARKE-34ec007c	site/notebooks/SDA-BIOMNI-BIOMARKE-34ec007c.ipynb	NO	—	MISSING	Flag: re-execute
SDA-2026-04-04-analysis_sea_ad_001	site/notebooks/SDA-2026-04-04-analysis_sea_ad_001.ipynb	NO	—	EXISTS (22KB)	Flag: re-execute
SDA-BIOMNI-MICROBIO-337ee37a	site/notebooks/SDA-BIOMNI-MICROBIO-337ee37a.ipynb	NO	—	MISSING	Flag: re-execute
sda-2026-04-01-gap-014	site/notebooks/nb-sda-2026-04-01-gap-014.ipynb	YES	YES (6 cells, 12 outputs)	EXISTS (14KB)	PASS
sda-2026-04-01-gap-012	site/notebooks/sda-2026-04-01-gap-012.ipynb	YES	YES (15 cells, 30 outputs)	EXISTS (17KB)	PASS
SDA-2026-04-01-gap-20260401-225149	site/notebooks/SDA-2026-04-01-gap-20260401-225149.ipynb	YES	YES (12 cells, 24 outputs)	EXISTS (22KB)	PASS
SDA-2026-01-gap-9137255b	site/notebooks/nb-SDA-2026-01-gap-9137255b.ipynb	YES	YES (6 cells, 12 outputs)	EXISTS (22KB)	PASS

Additional finding — BIOMNI parity template notebooks (5 analyses)

5 completed analyses point to artifacts/biomni_parity// notebooks (gene_regulatory_network, microbiome_analysis, biomarker_panel, cas13_primer_design, clinical_trial_landscaping). All 5 exist on disk at artifacts/biomni_parity//*.ipynb but have 0 executed cell outputs — each contains exactly 8 code cells, all with empty outputs and null execution_count. All 5 share task_id: a4c450f7-df61-405c-9e95-16d08119c5be in metadata, indicating they are placeholder/template notebooks from a single pipeline run that never executed. These also need re-execution.
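A check for zero-output placeholder notebooks can be sketched with the standard library alone, relying on the nbformat v4 JSON layout (`cells`, `cell_type`, `outputs`, `execution_count`). The function names and the `is_placeholder` rule are assumptions, not the audit's actual tooling.

```python
import json


def notebook_execution_summary(nb):
    """Summarise execution state of a parsed .ipynb document (nbformat v4 dict)."""
    code_cells = [c for c in nb.get("cells", []) if c.get("cell_type") == "code"]
    executed = [
        c for c in code_cells
        if c.get("execution_count") is not None or c.get("outputs")
    ]
    return {
        "code_cells": len(code_cells),
        "executed_cells": len(executed),
        "outputs": sum(len(c.get("outputs", [])) for c in code_cells),
        # Placeholder: has code cells, but none was ever run.
        "is_placeholder": bool(code_cells) and not executed,
    }


def summarize_file(path):
    """Load a notebook from disk and summarise it."""
    with open(path, encoding="utf-8") as fh:
        return notebook_execution_summary(json.load(fh))


# Hypothetical template notebook: 8 code cells, none executed.
template = {
    "cells": [
        {"cell_type": "code", "source": [], "outputs": [], "execution_count": None}
        for _ in range(8)
    ]
}
summary = notebook_execution_summary(template)
```

Checking `execution_count` as well as `outputs` avoids misclassifying an executed cell that simply printed nothing.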

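Total flagged for re-execution: up to 11 analyses (6 from the sample + 5 BIOMNI parity). Three of the sampled IDs carry a BIOMNI prefix, so some BIOMNI parity analyses may overlap with the sample; de-duplicate by analysis ID before filing the follow-up.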

Check 3 — result_summary

The analyses table has no result_summary column. The closest proxies are report_url (path to the HTML report) and metadata (JSON). The checks above use the presence of report_url as the proxy for a non-empty result_summary. Note: notebook_path holds the literal string 'completed' in 20 rows; those rows were excluded from this audit because the field appears to have been backfilled incorrectly.

Summary

4/10 sampled analyses: fully reproducible (notebook exists + executed + report exists)
6/10 sampled analyses: notebook missing on disk — requires re-execution
5 additional BIOMNI parity analyses confirmed to have zero-output placeholder notebooks

Recommended follow-up

Create a bug-fix task to re-execute the 11 analyses (6 from sample + 5 BIOMNI parity) whose notebooks are either missing or contain zero executed outputs.

File: reproducibility_check_job.md

Modified: 2026-05-01 20:13

Size: 7.9 KB