[Atlas] Provenance integrity scanner — verify claimed inputs produced output


Goal

The artifact-provenance graph (artifact_provenance, artifact_links) tells us "artifact Y was produced from inputs X1,
X2, X3 by skill S at time T", but nothing actually verifies that the
claim holds. An attacker (or a buggy skill) can write a provenance
row pointing at unrelated inputs. This task ships a scanner that, for
artifacts whose skill is deterministic / re-runnable, recomputes the
output from the recorded inputs and compares hashes. When the
recomputed hash diverges from the stored hash, the artifact is
flagged as lifecycle='provenance_break' and a Senate review task is
spawned. The scanner runs in two modes: strict for deterministic
skills (recomputes), soft for non-deterministic skills (LLM
generations — checks that the recorded inputs include all PMIDs
referenced in the output, and that no input was modified after the
output's created_at).
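
The strict-mode comparison described above can be sketched roughly as follows. This is a minimal illustration, not the project's implementation: `run_skill` is a hypothetical stand-in for the sandboxed re-invocation, and SHA-256 over the raw output bytes is an assumed hashing scheme, since the spec does not name one.

```python
import hashlib
from typing import Callable, Optional, Sequence, Tuple


def content_hash(data: bytes) -> str:
    """Hex SHA-256 of the output bytes (assumed hashing scheme)."""
    return hashlib.sha256(data).hexdigest()


def strict_verdict(
    run_skill: Callable[[Sequence[str]], bytes],  # hypothetical re-invocation hook
    input_ids: Sequence[str],
    stored_hash: str,
) -> Tuple[str, Optional[str]]:
    """Recompute the output from the recorded inputs and compare hashes."""
    try:
        recomputed = content_hash(run_skill(input_ids))
    except Exception:
        # The skill could not be re-run; no recomputed hash is available.
        return "reinvoke_failed", None
    verdict = "verified" if recomputed == stored_hash else "divergent"
    return verdict, recomputed
```

The returned pair maps onto the `verdict` and `recomputed_hash` columns of the scan table defined in the acceptance criteria.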

Effort: thorough

Acceptance Criteria

☐ New module scidex/atlas/provenance_integrity.py:
- classify_skill(skill_id) -> Mode returns
'strict' | 'soft' | 'skip' from a registry table:

    CREATE TABLE skill_provenance_mode (
      skill_id  TEXT PRIMARY KEY,
      mode      TEXT NOT NULL CHECK (mode IN ('strict','soft','skip')),
      rationale TEXT
    );

Seed strict-mode for: pubmed-search, openalex-works,
compute_content_hash-emitters; soft-mode for: theorist,
skeptic, hypothesis-generation; skip for personas whose
outputs are uniquely contextual.
- scan_artifact(artifact_id) -> ScanResult, per mode:
  - strict: re-invoke the skill with stored inputs and compare
    the output hash. (Re-invocation goes through
    scidex/forge/executor.py:isolated_run.)
  - soft: parse all PMID/DOI/UUID references from the output;
    assert each appears in the provenance inputs; assert no
    input row's last_updated_at > output.created_at.
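
The two soft-mode assertions can be sketched as below. The regular expressions are illustrative only, and the `pmid:` / `doi:` / `uuid:` key normalization is a hypothetical convention. The spec does not say which verdict a timestamp violation maps to, so reporting it under `missing_inputs` here is an assumption.

```python
import re
from datetime import datetime
from typing import Dict, Set, Tuple

# Illustrative patterns; real extraction may need stricter rules.
PMID_RE = re.compile(r"\bPMID:?\s*(\d{1,8})\b")
DOI_RE = re.compile(r'\b(10\.\d{4,9}/[^\s"<>]+)')
UUID_RE = re.compile(
    r"\b[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\b", re.I
)


def extract_refs(text: str) -> Set[str]:
    """Collect normalized reference keys (hypothetical key format)."""
    refs = {f"pmid:{m}" for m in PMID_RE.findall(text)}
    refs |= {f"doi:{m.rstrip('.,;)').lower()}" for m in DOI_RE.findall(text)}
    refs |= {f"uuid:{m.lower()}" for m in UUID_RE.findall(text)}
    return refs


def soft_verdict(
    output_text: str,
    output_created_at: datetime,
    inputs: Dict[str, datetime],  # reference key -> last_updated_at
) -> Tuple[str, dict]:
    """Check that cited references are recorded inputs and none changed later."""
    missing = sorted(extract_refs(output_text) - set(inputs))
    late = sorted(k for k, ts in inputs.items() if ts > output_created_at)
    if missing or late:
        # Assumption: both failure kinds share one verdict; details distinguish them.
        return "missing_inputs", {"missing": missing, "modified_after_output": late}
    return "soft_verified", {}
```

The second element of the result is shaped so that it could be stored in the scan row's `details` column.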

☐ Migration migrations/20260428_provenance_integrity.sql:

    CREATE TABLE provenance_integrity_scan (
      id              BIGSERIAL PRIMARY KEY,
      artifact_id     TEXT NOT NULL,
      scan_mode       TEXT NOT NULL,
      scanned_at      TIMESTAMPTZ NOT NULL DEFAULT NOW(),
      verdict         TEXT NOT NULL CHECK (verdict IN
                      ('verified','soft_verified','divergent',
                       'missing_inputs','reinvoke_failed','skipped')),
      recomputed_hash TEXT,
      stored_hash     TEXT,
      details         JSONB
    );
    CREATE INDEX idx_pis_verdict_bad ON provenance_integrity_scan
      (artifact_id) WHERE verdict IN ('divergent','missing_inputs');

(Plus the skill_provenance_mode table above.)
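
A registry lookup for classify_skill might look like the sketch below. It uses an in-memory SQLite database as a stand-in for the real database so the example runs on its own. The seed rationales are placeholder text, and falling back to 'skip' for unregistered skills is an assumption the spec does not state.

```python
import sqlite3

# SQLite stand-in for the registry table from the acceptance criteria.
DDL = """
CREATE TABLE skill_provenance_mode (
  skill_id TEXT PRIMARY KEY,
  mode TEXT NOT NULL CHECK (mode IN ('strict','soft','skip')),
  rationale TEXT
)
"""

# Two of the seeds named in the spec; rationale strings are illustrative.
SEED = [
    ("pubmed-search", "strict", "deterministic query (placeholder rationale)"),
    ("theorist", "soft", "LLM generation (placeholder rationale)"),
]


def classify_skill(conn: sqlite3.Connection, skill_id: str, default: str = "skip") -> str:
    """Return 'strict', 'soft', or 'skip' for a skill; unknown skills get `default`."""
    row = conn.execute(
        "SELECT mode FROM skill_provenance_mode WHERE skill_id = ?", (skill_id,)
    ).fetchone()
    return row[0] if row else default


conn = sqlite3.connect(":memory:")
conn.execute(DDL)
conn.executemany("INSERT INTO skill_provenance_mode VALUES (?, ?, ?)", SEED)
```

Because the CHECK constraint lives in the table, a seed row with an unknown mode is rejected at insert time rather than at scan time.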

☐ Recurring quest: nightly, sample 200 artifacts/night
weighted toward high-Elo / high-traffic ones (sampling bounds the
cost; a full fleet sweep is unaffordable for strict mode).
Skipped artifacts are still recorded as verdict='skipped'
so coverage is calculable.
☐ Divergence handler: verdict='divergent' artifacts
move to lifecycle='provenance_break', lose their Elo bonus
from q-trust-signed-artifact-attestations if any (set
attestation_status='quarantined'), and a Senate task is
spawned with the recomputed-vs-stored hash diff for review.
☐ Missing-input handler: verdict='missing_inputs'
artifacts are flagged but kept on the public surface; they
appear with a "Provenance incomplete" badge in the artifact UI.
☐ Audit-chain integration: every scan row appends to
audit_chain (event_kind='provenance_integrity_scan')
so verdict deletion is detectable.
☐ Senate dashboard tile "Provenance integrity (30d)": counts
by verdict plus a "Provenance break list" with deep-links to the
Senate review tasks. Coverage metric: scanned / total.
☐ API: GET /api/atlas/provenance/scan/{artifact_id} returns
the latest scan row plus a "rescan" trigger button protected by
auth (avoids drive-by re-runs that would burn LLM budget).
☐ Tests tests/test_provenance_integrity.py:
- strict happy path (mock skill returns identical bytes);
- strict divergence (mock skill returns different bytes;
verdict=divergent, lifecycle moves);
- soft happy path (PMIDs in output ⊆ inputs);
- soft missing input (PMID in output not in inputs);
- soft input-after-output time violation;
- skip mode short-circuits with verdict=skipped;
- audit-chain row written on every scan.
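
For the nightly sample in the recurring-quest criterion, one option is weighted sampling without replacement using Efraimidis-Spirakis keys. This technique is swapped in for illustration; the spec does not prescribe a sampling method. How Elo and traffic combine into a single weight is left to the caller, and `sample_weighted` is a hypothetical helper name.

```python
import heapq
import random
from typing import Dict, List, Optional


def sample_weighted(
    weights: Dict[str, float], k: int, seed: Optional[int] = None
) -> List[str]:
    """Pick up to k distinct artifact ids, favoring larger weights.

    Each id gets the key u ** (1 / w) with u uniform in [0, 1); the k
    largest keys form the sample. Ids with non-positive weight are dropped.
    """
    rng = random.Random(seed)
    keyed = (
        (rng.random() ** (1.0 / w), artifact_id)
        for artifact_id, w in weights.items()
        if w > 0
    )
    return [artifact_id for _, artifact_id in heapq.nlargest(k, keyed)]
```

Passing a fixed `seed` makes a night's sample reproducible, which can help when replaying a run in tests.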

Approach

  • Seed skill_provenance_mode with explicit rationales (commit
    the seed YAML alongside the module).
  • Migration; module; mode-specific scan logic; tests with mocked
    isolated_run.
  • Recurring registration; dry run on 5 artifacts; record verdicts.
  • Divergence handler: wire to the existing lifecycle-state-machine
    path (see q-gov-lifecycle-state-machine-enforcement).
  • Audit-chain integration; tile.
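
To illustrate why appending scan rows to a hash chain makes deletion detectable, here is a minimal sketch. An in-memory list stands in for the audit_chain table, and the entry fields (`prev_hash`, `entry_hash`) are hypothetical; the actual chain format belongs to q-trust-audit-log-hash-chain and is not shown in this spec.

```python
import hashlib
import json
from typing import Dict, List

GENESIS = "0" * 64  # assumed sentinel for the first entry's prev_hash


def _digest(event_kind: str, payload: Dict, prev_hash: str) -> str:
    """Hash an entry's content together with its predecessor's hash."""
    body = json.dumps(
        {"event_kind": event_kind, "payload": payload, "prev_hash": prev_hash},
        sort_keys=True,
    )
    return hashlib.sha256(body.encode()).hexdigest()


def append_event(chain: List[Dict], event_kind: str, payload: Dict) -> Dict:
    """Append an entry linked to the current tail of the chain."""
    prev_hash = chain[-1]["entry_hash"] if chain else GENESIS
    entry = {
        "event_kind": event_kind,
        "payload": payload,
        "prev_hash": prev_hash,
        "entry_hash": _digest(event_kind, payload, prev_hash),
    }
    chain.append(entry)
    return entry


def verify_chain(chain: List[Dict]) -> bool:
    """Return False if any entry was altered or removed before the tail."""
    prev_hash = GENESIS
    for entry in chain:
        expected = _digest(entry["event_kind"], entry["payload"], prev_hash)
        if entry["prev_hash"] != prev_hash or entry["entry_hash"] != expected:
            return False
        prev_hash = entry["entry_hash"]
    return True
```

Note that removing entries from the end of the chain is not caught by this check alone; detecting that would need the latest hash to be recorded somewhere outside the chain.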

Dependencies

    • q-trust-audit-log-hash-chain — chain-append target.
    • scidex/forge/executor.isolated_run — re-invocation.
    • q-gov-lifecycle-state-machine-enforcement — handles
    provenance_break lifecycle transition.

    Dependents

    • Future "trusted artifacts only" filter on dashboard listings.

    Work Log

    2026-04-27T14:55:00Z — Verification

    • Task is STALE: none of the acceptance criteria are implemented on main
    • Verified current state below; committing verification block as spec-only edit

    Verification — 2026-04-27T14:55:00Z

    Result: FAIL
    Verified by: MiniMax-M2 via task 2bcb16bc-b48f-425d-95f0-bff62cc9a9b9

    Tests run

    | Target | Command | Expected | Actual | Pass? |
    |---|---|---|---|---|
    | scidex/atlas/provenance_integrity.py | ls scidex/atlas/provenance*.py | module exists | No such file | no |
    | skill_provenance_mode table | SELECT ... FROM information_schema.tables | table exists | NOT FOUND | no |
    | provenance_integrity_scan table | SELECT ... FROM information_schema.tables | table exists | NOT FOUND | no |
    | lifecycle_state='provenance_break' | SELECT DISTINCT lifecycle_state FROM artifacts | provenance_break present | NOT PRESENT (values: active/deprecated/frozen/superseded/validated) | no |
    | artifact_provenance table | SELECT COUNT(*) FROM artifact_provenance | exists | EXISTS (25 rows) | yes |
    | audit_chain table | SELECT ... FROM information_schema.tables | exists | EXISTS | yes |
    | isolated_run function | from scidex.senate.cgroup_isolation import isolated_run | callable | TRUE | yes |
    | GET /api/atlas/provenance/scan/{id} | grep -r "provenance/scan" in api.py | route exists | NOT FOUND | no |
    | tests/test_provenance_integrity.py | ls tests/test_provenance_integrity.py | test file | NOT FOUND | no |

    Notes

    • The spec references scidex/forge/executor.py:isolated_run but the actual function lives in scidex/senate/cgroup_isolation.py:isolated_run() and is called via self._cgroup.isolated_run() from ForgeExecutor
    • artifact_provenance table exists with 25 rows (action_kind='spawn_link' entries linking hypotheses to analyses)
    • lifecycle_state column exists on artifacts table (NOT lifecycle as spec says) with values: active, deprecated, frozen, superseded, validated — provenance_break is NOT a valid state
    • All acceptance criteria remain unimplemented: no module, no tables, no API route, no tests, no handlers, no recurring quest
    • The q-gov-lifecycle-state-machine-enforcement dependency may or may not exist (lifecycle_state machine doesn't have provenance_break state)

    Tasks using this spec (1)
    [Atlas] Provenance integrity scanner - verify claimed inputs
    File: q-trust-provenance-integrity-scanner_spec.md
    Modified: 2026-05-01 20:13
    Size: 7.7 KB