Extend the provenance system to capture not just parent-child artifact relationships but
the processing steps between them. When an experiment is extracted from a paper, the
provenance should record: "Paper 12345 was processed by extraction-agent using
llm_structured_extraction method with schema v2, producing experiment artifact X."
This creates a full audit trail of how every artifact was constructed.
- artifact_links captures derives_from, cites, extends relationships
- provenance_chain JSON in artifacts captures parent artifacts
- processing_steps table or extended artifact_links metadata:
  - source_artifact_id: input artifact
  - target_artifact_id: output artifact
  - step_type: extraction, analysis, aggregation, transformation, validation, debate
  - agent_id: which agent performed the step
  - method: what method/tool was used
  - parameters: JSON of method parameters
  - started_at, completed_at: timing (populated in _upsert_processing_step)
  - input_hash, output_hash: for reproducibility verification (auto-derived from artifact content_hashes)
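
The field list above could map onto a table roughly as follows. This is a minimal sketch using Python's built-in sqlite3 module. The DDL, the column types, the CHECK constraint, and the sample ids (`paper-12345`, `experiment-X`) are illustrative assumptions, not the schema that ships in scidex.

```python
import json
import sqlite3

# Hypothetical DDL: column names follow the field list above; types are assumptions.
DDL = """
CREATE TABLE IF NOT EXISTS processing_steps (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    source_artifact_id TEXT NOT NULL,
    target_artifact_id TEXT NOT NULL,
    step_type TEXT NOT NULL CHECK (step_type IN (
        'extraction', 'analysis', 'aggregation',
        'transformation', 'validation', 'debate')),
    agent_id TEXT,
    method TEXT,
    parameters TEXT,      -- JSON-encoded method parameters
    started_at TEXT,
    completed_at TEXT,
    input_hash TEXT,
    output_hash TEXT
)
"""


def demo_insert() -> dict:
    """Create the table in memory, insert the example step, and return it."""
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row
    conn.execute(DDL)
    conn.execute(
        "INSERT INTO processing_steps "
        "(source_artifact_id, target_artifact_id, step_type, agent_id, method, parameters) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (
            "paper-12345",
            "experiment-X",
            "extraction",
            "extraction-agent",
            "llm_structured_extraction",
            json.dumps({"schema": "v2"}),
        ),
    )
    row = dict(conn.execute("SELECT * FROM processing_steps").fetchone())
    conn.close()
    return row
```

In this sketch the hash and timing columns stay NULL on insert. The spec has _upsert_processing_step populate them.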
- record_processing_step() function in scidex/atlas/artifact_registry.py: public wrapper around _upsert_processing_step
- get_provenance_graph enriches edges with processing step metadata via LEFT JOIN
- check_reproducibility(source_id, method, params) in artifact_registry; exposed via GET /api/artifact/{id}/reproducibility-check
- GET /api/artifact/{id}/processing-history: delegates to get_processing_lineage
- a17-24-REPR0001: Reproducible analysis chains use processing steps
- d16-24-PROV0001: Provenance demo showcases processing lineage

Context: processing_steps table and _upsert_processing_step existed from a prior agent
(task 7ba524d5). The input_hash, output_hash, started_at, and completed_at columns were in
the DB schema but were not being populated; the step-recording logic existed only as the
private _upsert_processing_step, with no public record_processing_step() function; the
provenance graph did not join with processing_steps; and neither the /processing-history
nor the /reproducibility-check endpoint existed.
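
One way the hash-based reproducibility check could work is sketched below. The function names `compute_step_input_hash` and `check_reproducibility_sketch` are hypothetical stand-ins, not the actual _compute_step_input_hash or check_reproducibility implementations. The choice to hash the source content hash together with the method and canonicalised JSON parameters is an assumption, as is the shape of the returned dict.

```python
import hashlib
import json


def compute_step_input_hash(source_content_hash: str, method: str, parameters: dict) -> str:
    """Hash the source content hash together with the method and parameters.

    Assumption: parameters are serialised as canonical JSON (sorted keys) so
    that key order does not change the resulting hash.
    """
    canonical = json.dumps(
        {"source": source_content_hash, "method": method, "parameters": parameters},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def check_reproducibility_sketch(prior_runs: list, input_hash: str) -> dict:
    """Compare the output hashes of prior runs that share the same input hash."""
    outputs = [
        run["output_hash"]
        for run in prior_runs
        if run["input_hash"] == input_hash and run["output_hash"]
    ]
    distinct = sorted(set(outputs))
    return {
        "runs": len(outputs),
        "distinct_outputs": distinct,
        # None when fewer than two runs exist: there is nothing to compare yet.
        "reproducible": (len(distinct) == 1) if len(outputs) >= 2 else None,
    }
```

Under these assumptions, two runs with identical inputs that yield different output hashes are flagged as not reproducible, while a single run is reported as undetermined rather than as a pass.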
Changes made:
- scidex/atlas/artifact_registry.py:
  - ensure_processing_steps_schema: extended SQLite and PG branches to add input_hash, output_hash, started_at, completed_at columns (CREATE TABLE ...)
  - _fetch_artifact_content_hash() and _compute_step_input_hash() helpers
  - _upsert_processing_step: now auto-derives input_hash and output_hash from artifact content_hash values; populates started_at/completed_at
  - record_processing_step(): the named function the spec requires
  - check_reproducibility(): compares output_hashes across prior runs
  - get_provenance_graph: LEFT JOINs processing_steps to enrich each edge with step_type, agent_id, method, parameters, input_hash, output_hash, ...
- api.py:
  - GET /api/artifact/{artifact_id}/processing-history: full transform chain
  - GET /api/artifact/{artifact_id}/reproducibility-check?method=...&parameters=...
- tests/test_processing_lineage_api_contracts.py: added 9 new AST-based tests (record_processing_step, check_reproducibility, ...)

All 13 contract tests + 4 functional tests pass.
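
The LEFT JOIN enrichment described for get_provenance_graph can be illustrated with a small in-memory example. This is a sketch only: the artifact_links column names (`source_artifact_id`, `target_artifact_id`, `link_type`), the join keys, and the helper names `build_demo_db` and `provenance_edges` are assumptions, and the real query may differ.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Build a tiny in-memory database with two links and one processing step."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE artifact_links (
            source_artifact_id TEXT, target_artifact_id TEXT, link_type TEXT);
        CREATE TABLE processing_steps (
            source_artifact_id TEXT, target_artifact_id TEXT,
            step_type TEXT, agent_id TEXT, method TEXT);
        INSERT INTO artifact_links VALUES ('paper-12345', 'experiment-X', 'derives_from');
        INSERT INTO artifact_links VALUES ('paper-67890', 'experiment-X', 'cites');
        INSERT INTO processing_steps VALUES (
            'paper-12345', 'experiment-X', 'extraction',
            'extraction-agent', 'llm_structured_extraction');
        """
    )
    return conn


def provenance_edges(conn: sqlite3.Connection, target_id: str) -> list:
    """Return incoming edges for target_id, with step metadata where a step exists."""
    cur = conn.execute(
        """
        SELECT l.source_artifact_id, l.target_artifact_id, l.link_type,
               s.step_type, s.agent_id, s.method
        FROM artifact_links AS l
        LEFT JOIN processing_steps AS s
          ON s.source_artifact_id = l.source_artifact_id
         AND s.target_artifact_id = l.target_artifact_id
        WHERE l.target_artifact_id = ?
        ORDER BY l.source_artifact_id
        """,
        (target_id,),
    )
    columns = [col[0] for col in cur.description]
    return [dict(zip(columns, row)) for row in cur.fetchall()]
```

Because the join is a LEFT JOIN, links with no recorded processing step (the `cites` edge here) still appear in the result, with the step columns set to None.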