> v1 freeze note (2026-05-13): SciDEX v1 is frozen for code changes
> (see AGENTS.md § "v1 FROZEN — No Code Changes"). This spec touches
> v1 PG data + would land new scripts in v1, so it cannot be implemented
> in v1 by default. Two viable paths: (a) redirect the work into
> SciDEX-Substrate (the v2 backend) if/when substrate has migrated
> the relevant data, or (b) request the narrow "data-corruption fix"
> carve-out from a human, with the new code framed as read-only repair
> against the v1 DB. Until one of those happens, this spec is captured
> for the record but not actionable.

Effort: standard

scidex.artifacts.content_hash is a dedup key computed as
sha256(title + canonicalized_metadata). It is NOT a hash of the artifact's
file. The two were conflated for months, including during the 2026-05-18
recovery session. The conceptual correction made it into memory
(project_scidex_artifact_content_hash_semantics.md), but the corresponding
backfill of an actual file hash was never run.

figure_generator already writes metadata.file_sha256 = sha256(file) on
new figures. Older figures, all notebooks pre-2026-04, analyses, datasets,
models, and most paper_figures have no recorded file hash at all. With
~7,700 artifact files now on disk across types, that's a non-trivial
gap for any future integrity verification.
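To make the distinction concrete, here is a minimal sketch contrasting the two hashes. The function names are hypothetical, and the canonicalization (sorted-key compact JSON) is an assumption for illustration only; the real SciDEX canonicalizer may differ.

```python
import hashlib
import json
from pathlib import Path


def dedup_content_hash(title: str, metadata: dict) -> str:
    """Dedup key in the style of artifacts.content_hash.

    ASSUMPTION: "canonicalized" means sorted-key compact JSON here;
    the actual SciDEX canonicalization is not shown in this spec.
    """
    canonical = json.dumps(metadata, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((title + canonical).encode("utf-8")).hexdigest()


def file_sha256(path: Path) -> str:
    """Hash of the file's bytes, i.e. what metadata.file_sha256 should hold."""
    return hashlib.sha256(path.read_bytes()).hexdigest()
```

Under these assumptions, rewriting the file on disk changes `file_sha256` but leaves `dedup_content_hash` untouched, which is why the dedup key cannot stand in for integrity verification.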

For every artifact whose file_path resolves to a real file on disk,
compute sha256 and write it to metadata.file_sha256. If the key already
holds the same value, leave it as is; if the stored value differs (drift),
overwrite it and emit a loud log line. Also record metadata.file_size_bytes and
metadata.file_sha256_computed_at.
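The per-artifact step described above could be sketched roughly as follows. This is an illustration, not the backfill script: `hash_file`, `merge_file_hash`, the logger name, and the ISO-8601 UTC timestamp format are all assumptions, and the function only returns the updated metadata dict rather than writing to the database.

```python
import hashlib
import logging
from datetime import datetime, timezone
from pathlib import Path

log = logging.getLogger("file_sha256_backfill")  # hypothetical logger name


def hash_file(path: Path, chunk_size: int = 1 << 20) -> tuple[str, int]:
    """Stream the file in chunks; return (hex sha256, size in bytes)."""
    digest = hashlib.sha256()
    size = 0
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
            size += len(chunk)
    return digest.hexdigest(), size


def merge_file_hash(metadata: dict, path: Path, artifact_id: str = "?") -> dict:
    """Return a copy of metadata with the three file-hash keys set.

    Logs loudly when a previously stored file_sha256 differs (drift).
    ASSUMPTION: the timestamp is stored as an ISO-8601 UTC string.
    """
    sha, size = hash_file(path)
    previous = metadata.get("file_sha256")
    if previous is not None and previous != sha:
        log.warning(
            "FILE HASH DRIFT artifact=%s stored=%s computed=%s",
            artifact_id, previous, sha,
        )
    updated = dict(metadata)  # leave the caller's dict untouched
    updated["file_sha256"] = sha
    updated["file_size_bytes"] = size
    updated["file_sha256_computed_at"] = datetime.now(timezone.utc).isoformat()
    return updated
```

Streaming in fixed-size chunks keeps memory flat for large datasets and models, and counting bytes in the same pass means the recorded size always corresponds to the bytes that were hashed.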

- artifacts.content_hash. It's the dedup key; do not touch.

- Every artifact whose file_path resolves to a real file has metadata.file_sha256 populated.
- metadata.file_size_bytes matches the on-disk size for every [...]
- [...] file_sha256_computed_at marker.

- Select artifacts where file_path IS NOT NULL. Order by artifact_type, id so resumes are deterministic.
- Resolve file_path via scidex.core.paths. If the file [...]
- [...] file_sha256, file_size_bytes, file_sha256_computed_at [...] file_sha256 already and it differs.)
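The selection and ordering steps above might look like the sketch below, written against plain dict rows so it stays independent of the database layer. `resolve_path` is a hypothetical stand-in for whatever scidex.core.paths actually exposes, and skipping rows that already carry a file_sha256_computed_at marker is an assumption about how resumes are meant to work, so it is left as a switch.

```python
from pathlib import Path
from typing import Iterable, Iterator


def resolve_path(file_path: str, root: Path) -> Path:
    """HYPOTHETICAL stand-in for the scidex.core.paths resolver."""
    p = Path(file_path)
    return p if p.is_absolute() else root / p


def pending_artifacts(
    rows: Iterable[dict], root: Path, skip_marked: bool = True
) -> Iterator[tuple[dict, Path]]:
    """Yield (row, resolved path) in (artifact_type, id) order.

    Rows with a NULL file_path or a path that is not a real file are skipped.
    ASSUMPTION: when skip_marked is True, rows whose metadata already has
    file_sha256_computed_at are treated as done and skipped.
    """
    candidates = [r for r in rows if r.get("file_path") is not None]
    for row in sorted(candidates, key=lambda r: (r["artifact_type"], r["id"])):
        metadata = row.get("metadata") or {}
        if skip_marked and "file_sha256_computed_at" in metadata:
            continue
        path = resolve_path(row["file_path"], root)
        if not path.is_file():
            continue
        yield row, path
```

Because the order is a pure function of (artifact_type, id), an interrupted run that restarts visits the remaining rows in the same sequence. Passing `skip_marked=False` re-hashes everything, which is what a drift check would need.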

dataset_model_file_recovery_spec so we don't [...] file_sha256_computed_at and the other will skip.

unassigned