[Senate/feat] Reproducibility check job — daily walk_provenance_to_leaves over top-N artifacts; alert on broken chains


Goal

Build the watchdog that turns provenance from data into an SLO. Every 24 h, walk
the artifact_provenance table for the top 200 artifacts ranked by usage_score + citation_count. For each artifact, follow its provenance chain
back to leaf sources (papers, datasets, external origins). Flag any chain that
exceeds max_hops=50 or references a missing source_artifact_id. Attempt
auto-repair where possible and write action-typed comments on broken artifacts
so the Percolation Engine can pick them up. Emit a snapshot dashboard
artifact with the health report and per-week trend.

Acceptance Criteria

☑ Recurring job: every-24h systemd timer drives economics_drivers/provenance_health_driver.py
☑ Walks artifact_provenance for top 200 artifacts by usage_score + citation_count
☑ Flags broken chains: walk exceeds max_hops=50 OR source_artifact_id not in artifacts
☑ Auto-repair: if missing source can be matched by content hash or slug, writes a repair provenance row and links to the renamed artifact
☑ Alerts: writes artifact_comments(comment_type='action') on each broken artifact
☑ Health report: upserts a dashboard artifact with report summary and per-week trend
☑ Per-week trend: % of top-200 with intact chains (target ≥ 99%)
☑ Spec file at docs/planning/specs/reproducibility_check_job.md
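
The intact-chain percentage and the rolling per-week trend from the criteria above could be computed roughly as follows. This is a minimal sketch, not the driver's code: the snapshot shape ({"date", "intact_pct"}), the function names, the use of a plain mean over the window, and the rule that a rerun replaces the same day's snapshot are all assumptions made for illustration.

```python
TARGET_PCT = 99.0   # target from the acceptance criteria
WINDOW_DAYS = 7     # rolling window: last 6 snapshots plus today


def intact_pct(intact_count: int, total: int) -> float:
    """Percentage of checked artifacts with intact chains (0.0 if none were checked)."""
    return round(intact_count / total * 100, 2) if total else 0.0


def update_trend(history: list, today: dict) -> dict:
    """Append today's snapshot, keep the last WINDOW_DAYS days, summarise the week.

    Each snapshot is assumed to look like {"date": "YYYY-MM-DD", "intact_pct": float}.
    """
    # Assumption: a rerun on the same day replaces that day's snapshot.
    snapshots = [s for s in history if s["date"] != today["date"]] + [today]
    # ISO dates sort correctly as strings.
    snapshots = sorted(snapshots, key=lambda s: s["date"])[-WINDOW_DAYS:]
    week_avg = round(sum(s["intact_pct"] for s in snapshots) / len(snapshots), 2)
    return {
        "snapshots": snapshots,
        "week_avg_pct": week_avg,
        "meets_target": week_avg >= TARGET_PCT,
    }
```

The returned dict is the kind of structure that could be stored in the dashboard artifact's metadata, so that the next run can load the snapshots back as its history.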

Approach

  • Query top 200 artifacts by COALESCE(usage_score,0) + COALESCE(citation_count,0) DESC
  • For each artifact, BFS-walk artifact_provenance following source_artifact_id links,
    stopping at leaf types ('paper','dataset') or artifacts with origin_type='external',
    or when no further provenance rows exist
  • Track: hop count, missing sources, cycle detection via visited set
  • On missing source: check artifacts for same content hash (suffix match) or slug-style
    ID similarity → if found, write artifact_provenance(action_kind='repair') row
  • On broken (any issue): insert artifact_comments(comment_type='action', author_id='system')
  • After processing all 200: compute intact_pct = intact_count / total * 100
  • Load last 6 daily snapshots from the health report artifact metadata, append today's,
    keep 7 days rolling → compute week trend
  • Upsert the dashboard artifact provenance-health-daily with full report in metadata
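
The walk described in the steps above can be sketched as below. This is an illustration under stated assumptions rather than the shipped driver: the two in-memory dicts stand in for the artifacts and artifact_provenance tables, and the report keys are hypothetical. Note that with a breadth-first walk, hitting an already-visited node can mean either a cycle or a shared ancestor, so the sketch records it as "revisited" and does not treat it as broken.

```python
from collections import deque

MAX_HOPS = 50                      # limit from the spec
LEAF_TYPES = {"paper", "dataset"}  # leaf artifact types from the spec


def walk_provenance_to_leaves(root_id, artifacts, edges, max_hops=MAX_HOPS):
    """Breadth-first walk from root_id along source_artifact_id links.

    artifacts: {artifact_id: {"artifact_type": str, "origin_type": str}}  (assumed shape)
    edges:     {target_artifact_id: [source_artifact_id, ...]}            (assumed shape)
    """
    report = {
        "traced": bool(edges.get(root_id)),  # False for an empty (untraced) chain
        "max_depth": 0,
        "leaves": [],
        "missing": [],
        "revisited": False,
        "exceeded_max_hops": False,
    }
    visited = {root_id}
    queue = deque([(root_id, 0)])
    while queue:
        node, depth = queue.popleft()
        report["max_depth"] = max(report["max_depth"], depth)
        meta = artifacts[node]
        sources = edges.get(node, [])
        # Leaf: paper/dataset, external origin, or no outgoing provenance edges.
        if (meta.get("artifact_type") in LEAF_TYPES
                or meta.get("origin_type") == "external"
                or not sources):
            report["leaves"].append(node)
            continue
        # Going one hop further from here would exceed the limit.
        if depth >= max_hops:
            report["exceeded_max_hops"] = True
            continue
        for src in sources:
            if src not in artifacts:
                report["missing"].append(src)   # dangling source_artifact_id
            elif src in visited:
                report["revisited"] = True      # cycle or shared ancestor
            else:
                visited.add(src)
                queue.append((src, depth + 1))
    # Broken only on the two conditions the spec names.
    report["broken"] = bool(report["missing"]) or report["exceeded_max_hops"]
    return report
```

An artifact with no provenance rows comes back with traced=False and broken=False, matching the rule that empty chains are untraced rather than broken.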

Key Design Decisions

    • Leaf detection: artifact_type in {'paper','dataset'} OR origin_type='external' OR
      no outgoing provenance edges → treat as source leaf, stop walk
    • Empty chains are not broken: an artifact with zero provenance rows has an empty
      (untraced) chain, not a broken one. We report traced=False but don't flag it broken
      unless it explicitly references a missing source
    • Stable report artifact ID: provenance-health-daily → same artifact updated each run
      so there's one canonical health dashboard, not a new artifact per run
    • Idempotency: the repair provenance row uses ON CONFLICT DO NOTHING keyed on
      (target_artifact_id, source_artifact_id, action_kind='repair')
    • Auto-repair heuristic: match on same artifact ID prefix (base ID without UUID suffix),
      or on content_hash. If multiple candidates, pick most recently created one
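
The auto-repair heuristic and the idempotent repair write can be sketched together as below. Everything here is a hedged illustration: the production database engine is not stated in this spec, so an in-memory SQLite schema stands in for it; the column set (content_hash, created_at), the helper names, and the rule that the base ID is everything before the last hyphen are assumptions. A plain UNIQUE constraint over the three key columns is used instead of a repair-only partial index.

```python
import sqlite3


def pick_repair_candidate(conn, missing_id, content_hash=None):
    """Return the newest artifact matching by content hash or base-ID prefix, else None."""
    # Assumption: the base ID is everything before the last hyphen-separated segment.
    prefix = missing_id.rsplit("-", 1)[0] + "-"
    row = conn.execute(
        """
        SELECT id FROM artifacts
        WHERE content_hash = :hash
           OR substr(id, 1, length(:prefix)) = :prefix
        ORDER BY created_at DESC
        LIMIT 1
        """,
        {"hash": content_hash, "prefix": prefix},
    ).fetchone()
    return row[0] if row else None


def write_repair_row(conn, target_id, source_id):
    """Insert a repair edge; re-running with the same pair is a no-op."""
    conn.execute(
        """
        INSERT INTO artifact_provenance (target_artifact_id, source_artifact_id, action_kind)
        VALUES (?, ?, 'repair')
        ON CONFLICT (target_artifact_id, source_artifact_id, action_kind) DO NOTHING
        """,
        (target_id, source_id),
    )


def make_demo_db():
    """In-memory stand-in schema (hypothetical columns) for trying the two helpers."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE artifacts (id TEXT PRIMARY KEY, content_hash TEXT, created_at TEXT);
        CREATE TABLE artifact_provenance (
            target_artifact_id TEXT,
            source_artifact_id TEXT,
            action_kind TEXT,
            UNIQUE (target_artifact_id, source_artifact_id, action_kind)
        );
        """
    )
    return conn
```

The prefix comparison uses substr rather than LIKE because artifact IDs may contain underscores, which LIKE would treat as wildcards.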

    Dependencies

    • c7f07845 — Provenance schema (artifact_provenance table) shipped
    • f80549eb — Percolation Engine quest (uses action-typed comments as triggers)

    Dependents

    • Percolation Engine reads artifact_comments(comment_type='action') on broken artifacts
    • Future reproducibility SLO dashboard can read provenance-health-daily artifact

    Work Log

    2026-04-26 — Task agent (1881cb73)

    • Verified artifact_provenance table exists (0 rows at time of writing; schema shipped
      but no writes yet)
    • Verified artifact_comments trigger allows comment_type='action'
    • Implemented economics_drivers/provenance_health_driver.py with full walk, repair, and
      reporting logic
    • Created systemd service + timer: scidex-provenance-health.{service,timer}
    • Result: Driver runs successfully; 200 artifacts checked, all chains empty-but-intact
      since artifact_provenance has no rows yet. Job is live and ready to catch issues as
      provenance writes accumulate.

    Verification — 2026-04-28 07:10:00Z

    Result: PARTIAL
    Verified by: minimax:78 via task 010c79f0-feb9-4333-9304-2e513659f383

    Audit scope

    10 completed analyses sampled from the 197 total. Query:

    SELECT id, title, notebook_path, artifact_path, report_url, metadata
    FROM analyses
    WHERE status='completed'
      AND notebook_path IS NOT NULL
      AND notebook_path != 'completed'
    LIMIT 10;

    Tests run

    | Analysis ID | Notebook Path | Exists? | Executed? | Report HTML | Notes |
    |---|---|---|---|---|---|
    | SDA-BIOMNI-GENE_REG-785b71fe | site/notebooks/SDA-BIOMNI-GENE_REG-785b71fe.ipynb | NO | - | MISSING | Flag: re-execute |
    | sda-2026-04-01-gap-v2-bc5f270e | site/notebooks/sda-2026-04-01-gap-v2-bc5f270e.ipynb | NO | - | EXISTS (13KB) | Flag: re-execute |
    | sda-2026-04-01-gap-v2-89432b95 | site/notebooks/sda-2026-04-01-gap-v2-89432b95.ipynb | NO | - | EXISTS (14KB) | Flag: re-execute |
    | SDA-BIOMNI-BIOMARKE-34ec007c | site/notebooks/SDA-BIOMNI-BIOMARKE-34ec007c.ipynb | NO | - | MISSING | Flag: re-execute |
    | SDA-2026-04-04-analysis_sea_ad_001 | site/notebooks/SDA-2026-04-04-analysis_sea_ad_001.ipynb | NO | - | EXISTS (22KB) | Flag: re-execute |
    | SDA-BIOMNI-MICROBIO-337ee37a | site/notebooks/SDA-BIOMNI-MICROBIO-337ee37a.ipynb | NO | - | MISSING | Flag: re-execute |
    | sda-2026-04-01-gap-014 | site/notebooks/nb-sda-2026-04-01-gap-014.ipynb | YES | YES (6 cells, 12 outputs) | EXISTS (14KB) | PASS |
    | sda-2026-04-01-gap-012 | site/notebooks/sda-2026-04-01-gap-012.ipynb | YES | YES (15 cells, 30 outputs) | EXISTS (17KB) | PASS |
    | SDA-2026-04-01-gap-20260401-225149 | site/notebooks/SDA-2026-04-01-gap-20260401-225149.ipynb | YES | YES (12 cells, 24 outputs) | EXISTS (22KB) | PASS |
    | SDA-2026-01-gap-9137255b | site/notebooks/nb-SDA-2026-01-gap-9137255b.ipynb | YES | YES (6 cells, 12 outputs) | EXISTS (22KB) | PASS |

    Additional finding — BIOMNI parity template notebooks (5 analyses)

    5 completed analyses point to artifacts/biomni_parity// notebooks (gene_regulatory_network, microbiome_analysis, biomarker_panel, cas13_primer_design, clinical_trial_landscaping). All 5 exist on disk at artifacts/biomni_parity//*.ipynb but have 0 executed cell outputs — each contains exactly 8 code cells, all with empty outputs and null execution_count. All 5 share task_id: a4c450f7-df61-405c-9e95-16d08119c5be in metadata, indicating they are placeholder/template notebooks from a single pipeline run that never executed. These also need re-execution.
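
The "executed?" check used in this audit (code cells with a non-null execution_count and at least one output) could be automated along these lines. This is a sketch under assumptions, not the script the audit actually ran: it assumes notebooks are stored in nbformat 4 JSON, where cells live in a top-level "cells" list, and the function names are illustrative.

```python
import json


def notebook_execution_summary(nb: dict) -> dict:
    """Count code cells, executed cells, and outputs in a parsed nbformat-4 notebook."""
    code_cells = [c for c in nb.get("cells", []) if c.get("cell_type") == "code"]
    executed_cells = [c for c in code_cells if c.get("execution_count") is not None]
    outputs = sum(len(c.get("outputs", [])) for c in code_cells)
    return {
        "code_cells": len(code_cells),
        "executed_cells": len(executed_cells),
        "outputs": outputs,
        # Treated as executed only if at least one cell ran and produced output.
        "executed": bool(executed_cells) and outputs > 0,
    }


def summarize_notebook_file(path: str) -> dict:
    """Load an .ipynb file from disk and summarise its execution state."""
    with open(path, encoding="utf-8") as fh:
        return notebook_execution_summary(json.load(fh))
```

A placeholder notebook like the ones described above (8 code cells, empty outputs, null execution_count) would come back with executed=False.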

    Total flagged for re-execution: 6 analyses from the sample plus 5 BIOMNI parity analyses. The BIOMNI parity analyses overlap with some of the sampled ones, so the number of unique analyses is at most 11.

    Check 3 — result_summary

    The analyses table has no result_summary column. The closest proxies are report_url (path to the HTML report) and metadata (JSON); the checks above use the presence of report_url as the stand-in for a non-empty result_summary. Note: notebook_path holds the literal string 'completed' in 20 rows. Those rows were excluded from this audit because the field appears to have been backfilled incorrectly.

    Summary

    • 4/10 sampled analyses: fully reproducible (notebook exists + executed + report exists)
    • 6/10 sampled analyses: notebook missing on disk — requires re-execution
    • 5 additional BIOMNI parity analyses confirmed to have zero-output placeholder notebooks

    Recommended follow-up

    Create a bug-fix task to re-execute the analyses whose notebooks are either missing or contain zero executed outputs: 6 from the sample plus 5 BIOMNI parity, which is up to 11 once the overlap between the two groups is de-duplicated.

    File: reproducibility_check_job.md
    Modified: 2026-05-01 20:13
    Size: 7.9 KB