[Atlas] Provenance integrity scanner — verify claimed inputs produced output

Goal

The artifact-provenance graph (artifact_provenance, artifact_links)
tells us "artifact Y was produced from inputs X1, X2, X3 by skill S
at time T", but nothing actually verifies that the claim holds. An
attacker (or a buggy skill) can write a provenance row pointing at
unrelated inputs. This task ships a scanner that, for artifacts whose
skill is deterministic and re-runnable, recomputes the output from
the recorded inputs and compares hashes. When the recomputed hash
diverges from the stored hash, the artifact is flagged as
lifecycle='provenance_break' and a Senate review task is spawned.
The scanner runs in two active modes: strict for deterministic skills
(recompute and compare hashes), and soft for non-deterministic skills
such as LLM generations (check that the recorded inputs include every
PMID referenced in the output, and that no input was modified after
the output's created_at). Skills whose outputs cannot be checked
either way are classified as skip.

Effort: thorough

Acceptance Criteria

☐ New module scidex/atlas/provenance_integrity.py:

- classify_skill(skill_id) -> Mode returns 'strict' | 'soft' | 'skip'
  from a registry table:

      CREATE TABLE skill_provenance_mode (
        skill_id  TEXT PRIMARY KEY,
        mode      TEXT NOT NULL CHECK (mode IN ('strict','soft','skip')),
        rationale TEXT
      );

  Seed strict-mode for: pubmed-search, openalex-works,
  compute_content_hash-emitters; soft-mode for: theorist, skeptic,
  hypothesis-generation; skip for personas whose outputs are uniquely
  contextual.
- scan_artifact(artifact_id) -> ScanResult, per mode:
  - strict: re-invoke the skill with stored inputs; compare output
    hash. (Re-invocation goes through
    scidex/forge/executor.py:isolated_run.)
  - soft: parse all PMID/DOI/UUID references from the output; assert
    each appears in the provenance inputs; assert no input row's
    last_updated_at > output.created_at.
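The soft-mode checks could look roughly like the Python sketch below. It assumes each provenance input can be keyed as a normalized string such as 'pmid:123456' or 'doi:10.1000/xyz', mapped to that input row's last_updated_at. The regexes, the key format, and the names extract_refs, soft_check and SoftResult are hypothetical, not the module's API. The spec does not say which verdict an input-modified-after-output violation gets, so the sketch assumes 'divergent'.

```python
import re
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, Set

# Illustrative patterns only; a production parser may need stricter rules.
PMID_RE = re.compile(r"\bPMID:?\s*(\d{1,9})\b", re.IGNORECASE)
DOI_RE = re.compile(r"\b(10\.\d{4,9}/[-._;()/:A-Za-z0-9]+)")
UUID_RE = re.compile(
    r"\b([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})\b",
    re.IGNORECASE,
)


def extract_refs(text: str) -> Set[str]:
    """Collect normalized 'pmid:...', 'doi:...' and 'uuid:...' keys from output text."""
    refs = {f"pmid:{m}" for m in PMID_RE.findall(text)}
    # DOIs are case-insensitive; drop sentence punctuation caught at the end.
    refs |= {f"doi:{m.rstrip('.,;)').lower()}" for m in DOI_RE.findall(text)}
    refs |= {f"uuid:{m.lower()}" for m in UUID_RE.findall(text)}
    return refs


@dataclass
class SoftResult:
    verdict: str
    details: dict = field(default_factory=dict)


def soft_check(output_text: str, output_created_at: datetime,
               inputs: Dict[str, datetime]) -> SoftResult:
    """inputs maps each recorded input key to that row's last_updated_at."""
    # Every reference in the output must appear among the provenance inputs.
    missing = sorted(extract_refs(output_text) - set(inputs))
    if missing:
        return SoftResult("missing_inputs", {"missing": missing})
    # No input may have changed after the output was created.
    late = sorted(k for k, ts in inputs.items() if ts > output_created_at)
    if late:
        # ASSUMPTION: the spec names no verdict for this case; report 'divergent'.
        return SoftResult("divergent", {"modified_after_output": late})
    return SoftResult("soft_verified")
```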

☐ Migration migrations/20260428_provenance_integrity.sql:

    CREATE TABLE provenance_integrity_scan (
      id              BIGSERIAL PRIMARY KEY,
      artifact_id     TEXT NOT NULL,
      scan_mode       TEXT NOT NULL,
      scanned_at      TIMESTAMPTZ NOT NULL DEFAULT NOW(),
      verdict         TEXT NOT NULL CHECK (verdict IN
                        ('verified','soft_verified','divergent',
                         'missing_inputs','reinvoke_failed','skipped')),
      recomputed_hash TEXT,
      stored_hash     TEXT,
      details         JSONB
    );
    CREATE INDEX idx_pis_verdict_bad ON provenance_integrity_scan
      (artifact_id) WHERE verdict IN ('divergent','missing_inputs');

(Plus the skill_provenance_mode table above.)
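A minimal sketch of classify_skill against that registry, using an in-memory SQLite database only so the example runs without Postgres. Two assumptions: the connection is passed explicitly (the spec's signature takes only skill_id), and a skill missing from the table falls back to 'skip'. The rationale strings are placeholders for the explicit rationales the Approach section asks for.

```python
import sqlite3
from typing import Literal

Mode = Literal["strict", "soft", "skip"]

DDL = """
CREATE TABLE skill_provenance_mode (
    skill_id  TEXT PRIMARY KEY,
    mode      TEXT NOT NULL CHECK (mode IN ('strict','soft','skip')),
    rationale TEXT
)
"""

# Placeholder rationales; the real seed lives in the committed YAML.
SEED = [
    ("pubmed-search", "strict", "re-runnable query"),
    ("openalex-works", "strict", "re-runnable query"),
    ("theorist", "soft", "LLM generation"),
    ("skeptic", "soft", "LLM generation"),
    ("hypothesis-generation", "soft", "LLM generation"),
]


def make_registry() -> sqlite3.Connection:
    """In-memory stand-in for the Postgres table, seeded with the spec's skills."""
    conn = sqlite3.connect(":memory:")
    conn.execute(DDL)
    conn.executemany("INSERT INTO skill_provenance_mode VALUES (?, ?, ?)", SEED)
    return conn


def classify_skill(conn: sqlite3.Connection, skill_id: str) -> Mode:
    row = conn.execute(
        "SELECT mode FROM skill_provenance_mode WHERE skill_id = ?", (skill_id,)
    ).fetchone()
    # ASSUMPTION: unregistered skills are neither re-run nor parsed: 'skip'.
    return row[0] if row else "skip"
```

The CHECK constraint does the validation, so a typo such as mode='maybe' in the seed fails at insert time rather than at scan time.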

☐ Recurring quest: nightly, sample 200 artifacts/night weighted
toward high-Elo / high-traffic ones (this bounds the cost; a full
fleet sweep is unaffordable for strict mode). Skipped artifacts are
still recorded as verdict='skipped' so coverage is calculable.
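One way to draw the nightly sample is weighted sampling without replacement (Efraimidis-Spirakis: key each item with u^(1/w) for uniform u and keep the k largest keys). The spec does not define the weight, so artifact_weight below is a hypothetical blend of Elo and traffic with a floor of 1.0, which keeps low-traffic artifacts eligible so coverage keeps growing over time.

```python
import heapq
import random
from typing import Dict, List, Optional


def weighted_sample(weights: Dict[str, float], k: int = 200,
                    rng: Optional[random.Random] = None) -> List[str]:
    """Sample up to k artifact ids without replacement, favoring larger weights."""
    rng = rng or random.Random()
    keyed = (
        (rng.random() ** (1.0 / w), artifact_id)
        for artifact_id, w in weights.items()
        if w > 0  # non-positive weights can never be drawn
    )
    return [artifact_id for _, artifact_id in heapq.nlargest(k, keyed)]


def artifact_weight(elo: float, daily_views: int) -> float:
    # HYPOTHETICAL weighting: 1.0 floor, plus Elo above an assumed 1000
    # baseline, plus square-root-damped traffic.
    return 1.0 + max(elo - 1000.0, 0.0) / 100.0 + max(daily_views, 0) ** 0.5
```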

☐ Divergence handler: verdict='divergent' artifacts move to
lifecycle='provenance_break', lose their Elo bonus from
q-trust-signed-artifact-attestations if any (set
attestation_status='quarantined'), and a Senate task is spawned with
the recomputed-vs-stored hash diff for review.

☐ Missing-input handler: verdict='missing_inputs' artifacts are
flagged but kept on the public surface; they appear with a
"Provenance incomplete" badge in the artifact UI.
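The two handlers can be expressed as a pure function that turns a verdict into a list of actions, which keeps the policy testable separately from the database writes. The action kinds and payload keys are hypothetical; per the Work Log, the real column is lifecycle_state and provenance_break is not yet a valid state there.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class Action:
    kind: str
    artifact_id: str
    payload: Dict[str, Any]


def plan_handler_actions(artifact_id: str, verdict: str, *,
                         stored_hash: Optional[str] = None,
                         recomputed_hash: Optional[str] = None,
                         has_attestation: bool = False) -> List[Action]:
    """Translate a scan verdict into the side effects the handlers require."""
    if verdict == "divergent":
        actions = [Action("set_lifecycle", artifact_id, {"state": "provenance_break"})]
        if has_attestation:
            # Quarantining the attestation removes its Elo bonus.
            actions.append(Action("set_attestation_status", artifact_id,
                                  {"status": "quarantined"}))
        actions.append(Action("spawn_senate_task", artifact_id,
                              {"stored_hash": stored_hash,
                               "recomputed_hash": recomputed_hash}))
        return actions
    if verdict == "missing_inputs":
        # The artifact stays public; the UI shows a badge instead.
        return [Action("add_badge", artifact_id, {"label": "Provenance incomplete"})]
    return []
```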

☐ Audit-chain integration: every scan row appends to audit_chain
(event_kind='provenance_integrity_scan') so verdict deletion is
detectable.
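A sketch of the hash-chain idea behind this requirement: each entry stores the previous entry's hash plus a SHA-256 over its own canonical JSON, so removing or editing a scan row breaks verification from that point on. The real audit_chain schema comes from q-trust-audit-log-hash-chain; the field names and the all-zero genesis hash here are assumptions.

```python
import hashlib
import json
from typing import Any, Dict, List

GENESIS = "0" * 64  # assumed anchor for the first entry


def entry_hash(prev_hash: str, event_kind: str, payload: Dict[str, Any]) -> str:
    # Canonical JSON so the same payload always hashes to the same digest.
    body = json.dumps({"event_kind": event_kind, "payload": payload},
                      sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_scan(chain: List[Dict[str, Any]], payload: Dict[str, Any]) -> Dict[str, Any]:
    prev = chain[-1]["hash"] if chain else GENESIS
    kind = "provenance_integrity_scan"
    entry = {"event_kind": kind, "payload": payload, "prev_hash": prev,
             "hash": entry_hash(prev, kind, payload)}
    chain.append(entry)
    return entry


def verify_chain(chain: List[Dict[str, Any]]) -> bool:
    prev = GENESIS
    for e in chain:
        if e["prev_hash"] != prev:
            return False  # a row was removed or reordered before this one
        if e["hash"] != entry_hash(prev, e["event_kind"], e["payload"]):
            return False  # this row's contents were edited
        prev = e["hash"]
    return True
```

Truncating the newest entries leaves a valid prefix, so detecting tail deletion also requires recording the current head hash somewhere outside the chain.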

☐ Senate dashboard tile "Provenance integrity (30d)": counts by
verdict + a "Provenance break list" with deep-links to the Senate
review tasks. Coverage metric: scanned / total.
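The tile's numbers could be computed as below. This sketch counts each artifact's most recent scan inside the 30-day window (an interpretation; counting every scan row is the other option) and includes 'skipped' rows in coverage, as the recurring-quest item requires. Rows are plain dicts using the provenance_integrity_scan column names; the deep-links would also need Senate task ids, which this sketch omits.

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import Any, Dict, Iterable


def tile_stats(scans: Iterable[Dict[str, Any]], total_artifacts: int,
               now: datetime, days: int = 30) -> Dict[str, Any]:
    cutoff = now - timedelta(days=days)
    # Keep only the newest in-window scan per artifact.
    latest: Dict[str, Dict[str, Any]] = {}
    for s in scans:
        if s["scanned_at"] < cutoff:
            continue
        prev = latest.get(s["artifact_id"])
        if prev is None or s["scanned_at"] > prev["scanned_at"]:
            latest[s["artifact_id"]] = s
    counts = Counter(s["verdict"] for s in latest.values())
    return {
        "counts": dict(counts),
        "provenance_breaks": sorted(a for a, s in latest.items()
                                    if s["verdict"] == "divergent"),
        # Coverage = distinct artifacts scanned (skipped included) / total.
        "coverage": len(latest) / total_artifacts if total_artifacts else 0.0,
    }
```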

☐ API: GET /api/atlas/provenance/scan/{artifact_id} returns the
latest scan row + a "rescan" trigger button protected by auth (avoids
drive-by re-runs that would burn LLM budget).
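A framework-agnostic sketch of the two endpoints' logic, assuming injected callables for the database read and the job queue (fetch_latest, enqueue) and a principal that is None for anonymous callers. The spec does not name the rescan route, and queueing rather than running the scan inline is a design assumption.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Optional


@dataclass
class Response:
    status: int
    body: Dict[str, Any] = field(default_factory=dict)


def get_scan(artifact_id: str,
             fetch_latest: Callable[[str], Optional[Dict[str, Any]]]) -> Response:
    """GET /api/atlas/provenance/scan/{artifact_id}: read-only, no auth needed."""
    row = fetch_latest(artifact_id)
    if row is None:
        return Response(404, {"artifact_id": artifact_id, "error": "no scan recorded"})
    return Response(200, {"artifact_id": artifact_id, "scan": row})


def request_rescan(artifact_id: str, principal: Optional[str],
                   enqueue: Callable[[str], None]) -> Response:
    """Rescan trigger: only authenticated callers may spend re-run budget."""
    if not principal:
        return Response(401, {"error": "authentication required"})
    enqueue(artifact_id)  # ASSUMPTION: queued, since strict re-runs can be slow
    return Response(202, {"artifact_id": artifact_id, "queued": True})
```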

☐ Tests tests/test_provenance_integrity.py:

- strict happy path (mock skill returns identical bytes);
- strict divergence (mock skill returns different bytes;
verdict=divergent, lifecycle moves);
- soft happy path (PMIDs in output ⊆ inputs);
- soft missing input (PMID in output not in inputs);
- soft input-after-output time violation;
- skip mode short-circuits with verdict=skipped;
- audit-chain row written on every scan.
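A sketch of the strict-mode tests with a mocked runner. It assumes a hypothetical scan_strict that takes the re-invocation function as a parameter, so tests can inject unittest.mock.Mock instead of patching isolated_run (whose real signature, in scidex/senate/cgroup_isolation.py per the Work Log, is not given in this spec). content_hash is a stand-in assumed to be SHA-256 hex. The tests are plain functions with bare asserts, so pytest collects them but they also run without it.

```python
import hashlib
from typing import Any, Callable, Dict
from unittest import mock


def content_hash(data: bytes) -> str:
    # Stand-in for compute_content_hash (ASSUMED to be SHA-256 hex here).
    return hashlib.sha256(data).hexdigest()


def scan_strict(stored_hash: str, inputs: Dict[str, Any],
                run: Callable[[Dict[str, Any]], bytes]) -> Dict[str, Any]:
    """`run` is injected so tests can pass a mock in place of isolated_run."""
    try:
        output = run(inputs)
    except Exception as exc:  # sandbox crash, timeout, upstream error, ...
        return {"verdict": "reinvoke_failed", "details": {"error": repr(exc)}}
    recomputed = content_hash(output)
    verdict = "verified" if recomputed == stored_hash else "divergent"
    return {"verdict": verdict, "stored_hash": stored_hash,
            "recomputed_hash": recomputed}


def test_strict_happy_path():
    run = mock.Mock(return_value=b"same bytes")
    result = scan_strict(content_hash(b"same bytes"), {"query": "APOE"}, run)
    assert result["verdict"] == "verified"
    run.assert_called_once_with({"query": "APOE"})


def test_strict_divergence():
    run = mock.Mock(return_value=b"different bytes")
    result = scan_strict(content_hash(b"same bytes"), {}, run)
    assert result["verdict"] == "divergent"
    assert result["recomputed_hash"] != result["stored_hash"]


def test_strict_reinvoke_failed():
    run = mock.Mock(side_effect=TimeoutError("sandbox timeout"))
    assert scan_strict("deadbeef", {}, run)["verdict"] == "reinvoke_failed"
```

The "lifecycle moves" half of the divergence test would assert on the divergence handler's effects rather than on scan_strict itself.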

Approach

Seed skill_provenance_mode with explicit rationales (commit the seed
YAML alongside the module).

Migration; module; mode-specific scan logic; tests with mocked
isolated_run.

Recurring registration; dry run on 5 artifacts; record verdicts.

Divergence handler: wire to the existing lifecycle-state-machine path
(see q-gov-lifecycle-state-machine-enforcement).

Audit-chain integration; tile.

Dependencies

q-trust-audit-log-hash-chain: chain-append target.
scidex/forge/executor.isolated_run: re-invocation.
q-gov-lifecycle-state-machine-enforcement: handles the provenance_break lifecycle transition.

Dependents

Future "trusted artifacts only" filter on dashboard listings.

Work Log

2026-04-27T14:55:00Z — Verification

Task is STALE: none of the acceptance criteria are implemented on main.
Verified the current state below; committing the verification block as a spec-only edit.

Verification — 2026-04-27T14:55:00Z

Result: FAIL
Verified by: MiniMax-M2 via task 2bcb16bc-b48f-425d-95f0-bff62cc9a9b9

Tests run

Target	Command	Expected	Actual	Pass?
`scidex/atlas/provenance_integrity.py`	`ls scidex/atlas/provenance*.py`	module exists	No such file	✗
`skill_provenance_mode` table	`SELECT ... FROM information_schema.tables`	table exists	NOT FOUND	✗
`provenance_integrity_scan` table	`SELECT ... FROM information_schema.tables`	table exists	NOT FOUND	✗
`lifecycle_state='provenance_break'`	`SELECT DISTINCT lifecycle_state FROM artifacts`	provenance_break present	NOT PRESENT (values: active/deprecated/frozen/superseded/validated)	✗
`artifact_provenance` table	`SELECT COUNT(*) FROM artifact_provenance`	exists	EXISTS (25 rows)	✓
`audit_chain` table	`SELECT ... FROM information_schema.tables`	exists	EXISTS	✓
`isolated_run` function	`from scidex.senate.cgroup_isolation import isolated_run`	callable	TRUE	✓
`GET /api/atlas/provenance/scan/{id}`	`grep -r "provenance/scan"` in api.py	route exists	NOT FOUND	✗
`tests/test_provenance_integrity.py`	`ls tests/test_provenance_integrity.py`	test file	NOT FOUND	✗

Notes

- The spec references scidex/forge/executor.py:isolated_run, but the actual function lives in scidex/senate/cgroup_isolation.py:isolated_run() and is called via self._cgroup.isolated_run() from ForgeExecutor.
- The artifact_provenance table exists with 25 rows (action_kind='spawn_link' entries linking hypotheses to analyses).
- The artifacts table has a lifecycle_state column (NOT lifecycle, as the spec says) with values active, deprecated, frozen, superseded, validated; provenance_break is NOT a valid state.
- All acceptance criteria remain unimplemented: no module, no tables, no API route, no tests, no handlers, no recurring quest.
- It is unclear whether the q-gov-lifecycle-state-machine-enforcement dependency exists; the lifecycle_state machine has no provenance_break state.

Tasks using this spec (1)

[Atlas] Provenance integrity scanner - verify claimed inputs

Artifact Governance & Lifecycle Management done P89

File: q-trust-provenance-integrity-scanner_spec.md

Modified: 2026-05-01 20:13

Size: 7.7 KB