Build the watchdog that turns provenance from data into an SLO. Every 24 h, walk
the artifact_provenance table for the top 200 artifacts ranked by
usage_score + citation_count. For each artifact, follow its provenance chain
back to leaf sources (papers, datasets, external origins). Flag any chain that
exceeds max_hops=50 or references a missing source_artifact_id. Attempt
auto-repair where possible and write action-typed comments on broken artifacts
so the Percolation Engine can pick them up. Emit a snapshot dashboard
artifact with the health report and per-week trend.
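The walk-and-flag step described above can be sketched as follows. This is a minimal illustration, not the shipped driver: `ChainResult`, `walk_chain`, and `intact_pct` are hypothetical names, and the `sources` dict and `known_ids` set are assumed in-memory stand-ins for rows of `artifact_provenance` and ids in `artifacts`. Only `max_hops=50`, the missing `source_artifact_id` check, and the `intact_count / total * 100` formula come from the spec.

```python
from dataclasses import dataclass, field

MAX_HOPS = 50  # limit named in the spec


@dataclass
class ChainResult:
    """Outcome of walking one artifact's provenance chain (hypothetical shape)."""
    artifact_id: str
    hops: int = 0
    broken: bool = False
    reasons: list[str] = field(default_factory=list)


def walk_chain(artifact_id, sources, known_ids, max_hops=MAX_HOPS):
    """Follow source links level by level, starting from artifact_id.

    sources:   dict mapping an artifact id to its list of source_artifact_id
               values (assumed stand-in for artifact_provenance rows)
    known_ids: set of ids that exist in the artifacts table
    """
    result = ChainResult(artifact_id)
    seen = {artifact_id}          # guards against cycles in the chain
    frontier = [artifact_id]
    while frontier:
        next_frontier = []
        for node in frontier:
            for src in sources.get(node, []):
                if src not in known_ids:
                    # Dangling reference: the source row points at nothing.
                    result.reasons.append(f"missing source {src}")
                    continue
                if src not in seen:
                    seen.add(src)
                    next_frontier.append(src)
        if next_frontier:
            result.hops += 1
            if result.hops > max_hops:
                result.reasons.append(f"chain exceeds max_hops={max_hops}")
                break
        frontier = next_frontier
    result.broken = bool(result.reasons)
    return result


def intact_pct(results):
    """intact_pct = intact_count / total * 100, as given in the spec."""
    total = len(results)
    if total == 0:
        return None  # assumption: no percentage is reported for an empty run
    intact_count = sum(1 for r in results if not r.broken)
    return intact_count / total * 100
```

Walking level by level keeps `hops` equal to the depth of the chain rather than the number of rows visited, so a wide but shallow chain is not flagged. Whether the real driver measures depth or row count is not stated in the spec.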
economics_drivers/provenance_health_driver.py
artifact_provenance for top 200 artifacts by usage_score + citation_count
max_hops=50 OR source_artifact_id not in artifacts
repair provenance row and links to the renamed artifact
artifact_comments(comment_type='action') on each broken artifact
dashboard artifact with report summary and per-week trend
docs/planning/specs/reproducibility_check_job.md
COALESCE(usage_score,0) + COALESCE(citation_count,0) DESC
artifact_provenance following source_artifact_id links,
origin_type='external',
artifacts for same content hash (suffix match) or slug-style
artifact_provenance(action_kind='repair') row
artifact_comments(comment_type='action', author_id='system')
intact_pct = intact_count / total * 100
dashboard artifact provenance-health-daily with full report in metadata
traced=False but don't flag it broken
provenance-health-daily → same artifact updated each run
c7f07845: Provenance schema (artifact_provenance table) shipped
f80549eb: Percolation Engine quest (uses action-typed comments as triggers)
artifact_comments(comment_type='action') on broken artifacts
provenance-health-daily artifact
artifact_provenance table exists (0 rows at time of writing, schema shipped
artifact_comments trigger allows comment_type='action'
economics_drivers/provenance_health_driver.py with full walk, repair, and
scidex-provenance-health.{service,timer}
Result: PARTIAL
Verified by: minimax:78 via task 010c79f0-feb9-4333-9304-2e513659f383
10 completed analyses sampled from the 197 total. Query:
SELECT id, title, notebook_path, artifact_path, report_url, metadata
FROM analyses
WHERE status='completed'
AND notebook_path IS NOT NULL
AND notebook_path != 'completed'
LIMIT 10;

5 completed analyses point to artifacts/biomni_parity// notebooks (gene_regulatory_network, microbiome_analysis, biomarker_panel, cas13_primer_design, clinical_trial_landscaping). All 5 exist on disk at artifacts/biomni_parity//*.ipynb but have 0 executed cell outputs: each contains exactly 8 code cells, all with empty outputs and null execution_count. All 5 share task_id: a4c450f7-df61-405c-9e95-16d08119c5be in metadata, indicating they are placeholder/template notebooks from a single pipeline run that never executed. These also need re-execution.
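The "zero executed outputs" check used in this audit could be reproduced with a sketch like the one below. It assumes notebooks follow the standard Jupyter JSON layout (a top-level `cells` list whose code cells carry `outputs` and `execution_count`). The function names `count_executed` and `needs_reexecution` are hypothetical, and treating a notebook with no executed code cells as needing re-execution is an assumption drawn from the criteria above.

```python
import json
from pathlib import Path


def count_executed(nb):
    """Return (code_cell_count, executed_cell_count) for a parsed notebook dict.

    A code cell counts as executed if it has any outputs or a non-null
    execution_count.
    """
    code_cells = [c for c in nb.get("cells", []) if c.get("cell_type") == "code"]
    executed = [
        c for c in code_cells
        if c.get("outputs") or c.get("execution_count") is not None
    ]
    return len(code_cells), len(executed)


def needs_reexecution(notebook_path):
    """True if the notebook file is missing or none of its code cells ran."""
    path = Path(notebook_path)
    if not path.is_file():
        return True  # missing notebook: flagged, as in the audit
    nb = json.loads(path.read_text(encoding="utf-8"))
    _, executed = count_executed(nb)
    return executed == 0
```

Applied to the five BIOMNI parity notebooks described above, `count_executed` would be expected to report 8 code cells and 0 executed cells for each, given the audit's findings.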
Total flagged for re-execution: 6 unique analyses across the sample of 10 and the 5 BIOMNI parity analyses; the BIOMNI ones overlap with some of the primary 10.
The analyses table has no result_summary column. The closest proxy is report_url (path to HTML report) and metadata (JSON). Both checks were performed above using report_url presence as the non-empty result_summary proxy. Note: notebook_path has the literal string 'completed' in 20 rows — those were excluded from this audit as the field appears to have been backfilled incorrectly.
Create a bug-fix task to re-execute the 11 analyses (6 from sample + 5 BIOMNI parity) whose notebooks are either missing or contain zero executed outputs.