[Forge] Expose verify_reproducibility via API + pinned_artifacts for analyses


Goal

Enable the end-to-end provenance demo (d16-24-PROV0001) by adding two missing infrastructure pieces:
  • pinned_artifacts column in the analyses table (via migration) to store exact artifact versions used
  • verify_reproducibility() function exposed at GET /api/analysis/{id}/provenance endpoint

The existing render_dashboard() in artifact_registry.py:811 already provides the {{data:KEY}} substitution infrastructure, so the provenance demo needs only the two additional pieces listed above.

    Background

    verify_reproducibility(db, analysis_id) already exists at scripts/test_reproducibility.py:29 but is not exposed via the API. It checks that every pinned artifact exists and that its content_hash matches the pinned value.
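
For orientation, the kind of check described above can be sketched as follows. This is not the code in scripts/test_reproducibility.py. The function name check_pins, the artifacts table layout (id, content_hash), the pin keys (artifact_id, content_hash), and the returned dict shape are all assumptions made for illustration, and SQLite is assumed as the database.

```python
import sqlite3


def check_pins(db: sqlite3.Connection, pinned: list) -> dict:
    """Hypothetical helper: confirm each pinned artifact exists and its hash matches."""
    checks = []
    for pin in pinned:
        # Assumed schema: artifacts(id TEXT PRIMARY KEY, content_hash TEXT)
        row = db.execute(
            "SELECT content_hash FROM artifacts WHERE id = ?",
            (pin["artifact_id"],),
        ).fetchone()
        if row is None:
            status = "missing"
        elif row[0] != pin["content_hash"]:
            status = "hash_mismatch"
        else:
            status = "ok"
        checks.append({"artifact_id": pin["artifact_id"], "status": status})
    # With no pins there is nothing to verify, so report not reproducible.
    reproducible = bool(checks) and all(c["status"] == "ok" for c in checks)
    return {"reproducible": reproducible, "checks": checks}
```

Treating an empty pin list as not reproducible is a design choice of this sketch; the existing function may behave differently.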

    The analyses table lacks a pinned_artifacts column. Provenance is currently computed live from artifact_links, which is insufficient for reproducibility verification because links can change over time.

    Approach

    Step 1: Add pinned_artifacts column (migration)

    Add pinned_artifacts TEXT DEFAULT NULL to the analyses table. The column stores a JSON array of pinned artifacts, each with its ID, content hash, and version.
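
A minimal sketch of such a migration, assuming the database is SQLite and is reached through Python's sqlite3 module (the spec does not state the engine). The function name add_pinned_artifacts_column and the keys in the example value (artifact_id, version, content_hash) are hypothetical; only the column definition comes from this spec.

```python
import json
import sqlite3


def add_pinned_artifacts_column(db: sqlite3.Connection) -> bool:
    """Add analyses.pinned_artifacts if it is missing; return True if it was added."""
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    columns = [row[1] for row in db.execute("PRAGMA table_info(analyses)")]
    if "pinned_artifacts" in columns:
        return False  # already migrated, nothing to do
    db.execute("ALTER TABLE analyses ADD COLUMN pinned_artifacts TEXT DEFAULT NULL")
    db.commit()
    return True


# Illustrative value for the new column (IDs and hashes are made up):
EXAMPLE_PINNED_ARTIFACTS = json.dumps(
    [{"artifact_id": "dataset-example", "version": 1, "content_hash": "example-hash"}]
)
```

Checking the column list first makes the migration safe to run more than once, since SQLite raises an error when ALTER TABLE adds a column that already exists.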

    Step 2: Add GET /api/analysis/{id}/provenance endpoint

    In api.py, add a new route that:
    • Fetches the analysis by ID
    • Reads pinned_artifacts from the analyses table
    • Reads artifact_links for the analysis (upstream + downstream)
    • Calls verify_reproducibility() if available (from scripts/test_reproducibility.py)
    • Returns a JSON object with: provenance chain, pinned artifacts, reproducibility status
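
The steps above can be sketched as a framework-independent handler that returns a status code and a JSON-serializable body; wiring it into the web framework used by api.py is left out because the spec does not name that framework. Everything here is an assumption for illustration: the function name get_analysis_provenance, the artifact_links columns (source_artifact_id, target_artifact_id, link_type), SQLite as the database, and the expectation that verify_reproducibility() returns a dict with reproducible and checks keys.

```python
import json
import sqlite3
from typing import Callable, Optional, Tuple


def get_analysis_provenance(
    db: sqlite3.Connection,
    analysis_id: str,
    verify: Optional[Callable] = None,  # e.g. verify_reproducibility, if importable
) -> Tuple[int, dict]:
    """Hypothetical handler body for GET /api/analysis/{id}/provenance."""
    row = db.execute(
        "SELECT pinned_artifacts FROM analyses WHERE id = ?", (analysis_id,)
    ).fetchone()
    if row is None:
        return 404, {"error": f"analysis {analysis_id} not found"}

    # The column is NULL until pins are written, so fall back to an empty list.
    pinned = json.loads(row[0]) if row[0] else []

    # Assumed schema: links that touch the analysis in either direction.
    links = db.execute(
        "SELECT source_artifact_id, target_artifact_id, link_type "
        "FROM artifact_links "
        "WHERE source_artifact_id = ? OR target_artifact_id = ?",
        (analysis_id, analysis_id),
    ).fetchall()
    chain = [{"source": s, "target": t, "link_type": k} for s, t, k in links]

    body = {
        "analysis_id": analysis_id,
        "provenance_chain": chain,
        "pinned_artifacts": pinned,
        "reproducible": None,  # stays None when no verifier is supplied
        "checks": [],
    }
    if verify is not None:
        result = verify(db, analysis_id)  # assumed to return a dict
        body["reproducible"] = result.get("reproducible")
        body["checks"] = result.get("checks", [])
    return 200, body
```

This sketch only collects links directly attached to the analysis. Returning the full upstream and downstream chain would require following links transitively.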

    Step 3: Wire up assemble_provenance_demo.py

    The existing scripts/archive/oneoff_scripts/assemble_provenance_demo.py already has the logic to:
    • Create analysis-SDA-PROV-DEMO-001 artifact
    • Store pinned_artifacts in its metadata
    • Create artifact_links for the full chain

    Make sure the script also writes the pinned_artifacts array to the analysis's row in the analyses table, not only to the artifact metadata.
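
One way to write the pins into the analyses row is sketched below, assuming SQLite, an analyses table keyed by id, and the pinned_artifacts column from Step 1. The helper name store_pinned_artifacts is hypothetical and is not part of assemble_provenance_demo.py.

```python
import json
import sqlite3


def store_pinned_artifacts(db: sqlite3.Connection, analysis_id: str, pins: list) -> bool:
    """Serialize pins to JSON and save them on the analysis row; True if a row was updated."""
    cur = db.execute(
        "UPDATE analyses SET pinned_artifacts = ? WHERE id = ?",
        (json.dumps(pins), analysis_id),
    )
    db.commit()
    # rowcount is 0 when no analysis with this id exists.
    return cur.rowcount == 1
```

Returning False for an unknown ID lets the calling script fail loudly instead of silently leaving the demo analysis without pins.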

    Step 4: Test the endpoint

    • GET /api/analysis/analysis-SDA-PROV-DEMO-001/provenance should return:
    - provenance_chain: list of linked artifacts (paper → kg_edges → hypothesis → dataset → tabular → analysis → model → figures)
    - pinned_artifacts: array with artifact IDs, versions, content hashes
    - reproducible: boolean from verify_reproducibility()
    - checks: per-artifact verification results
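
A small shape check for the response can make this step repeatable. The sketch below is an assumption-laden helper, not an existing test: the name shape_problems is hypothetical, and the expected types (lists for provenance_chain, pinned_artifacts, and checks; a boolean for reproducible) are inferred from the field descriptions above.

```python
# Expected top-level fields and their assumed Python types after JSON decoding.
EXPECTED_FIELDS = {
    "provenance_chain": list,
    "pinned_artifacts": list,
    "reproducible": bool,
    "checks": list,
}


def shape_problems(payload: dict) -> list:
    """Return human-readable problems with a provenance response; empty list if none."""
    problems = []
    for name, expected_type in EXPECTED_FIELDS.items():
        if name not in payload:
            problems.append(f"missing field: {name}")
        elif not isinstance(payload[name], expected_type):
            actual = type(payload[name]).__name__
            problems.append(f"{name} should be {expected_type.__name__}, got {actual}")
    return problems
```

In a test, the decoded body of GET /api/analysis/analysis-SDA-PROV-DEMO-001/provenance would be passed to shape_problems() and the result asserted to be empty.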

    Acceptance Criteria

    ☐ Migration adds pinned_artifacts TEXT DEFAULT NULL to analyses table
    ☐ GET /api/analysis/{id}/provenance returns JSON with provenance_chain, pinned_artifacts, reproducible status
    ☐ analysis-SDA-PROV-DEMO-001 (or similar) has pinned_artifacts populated in the analyses table
    ☐ verify_reproducibility() is callable and returns correct status
    ☐ API returns 404 for non-existent analysis IDs
    ☐ API returns correct provenance chain for analysis with full artifact links
    ☐ Work log updated

    Dependencies

    • scripts/test_reproducibility.py (already exists, verify_reproducibility(db, analysis_id))
    • scripts/archive/oneoff_scripts/assemble_provenance_demo.py (already exists, builds the chain)
    • artifact_registry.py:register_artifact() (already supports artifact_type='analysis')

    Dependents

    • d16-24-PROV0001 (end-to-end provenance demo — needs this infrastructure)
    • d16-86-DDSH0001 (Knowledge Growth Dashboard — uses render_dashboard but not pinned_artifacts)

    Work Log

    _Not started yet_

    File: e90797ac_p88_provenance_infra_spec.md
    Modified: 2026-05-01 20:13
    Size: 3.8 KB