> v1 freeze note (2026-05-13): SciDEX v1 is frozen for code changes
> (see AGENTS.md § "v1 FROZEN — No Code Changes"). This spec touches
> v1 PG data + would land new scripts in v1, so it cannot be implemented
> in v1 by default. Two viable paths: (a) redirect the work into
> SciDEX-Substrate (the v2 backend) if/when substrate has migrated
> the relevant data, or (b) request the narrow "data-corruption fix"
> carve-out from a human, with the new code framed as read-only repair
> against the v1 DB. Until one of those happens, this spec is captured
> for the record but not actionable.

Effort: standard

The 2026-05-18 SciDEX artifact-file recovery session left only 15 of 4,302
paper_figure artifacts with a local file on disk (4,287 missing, a 99.7%
gap). Each row carries an image_url in metadata (Europe PMC / PMC OA /
publisher CDN), but a sampling pass during the recovery session showed that
Europe PMC returns HTTP 404 for ~97% of those URLs. The remaining ~3% are
still fetchable; recovering them would shrink the gap to ~4,150 missing
rows. The rest need a durable "we tried and the upstream is gone" marker so
downstream code doesn't keep retrying.
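
The figures above can be checked with a few lines of arithmetic. This is only a back-of-the-envelope sketch: the 97% dead-link rate comes from a sampling pass, so the recoverable count is an estimate, and the variable names are illustrative.

```python
total_rows = 4302      # paper_figure artifacts, per the recovery session
with_file = 15         # rows that still have a local file on disk
missing = total_rows - with_file

dead_fraction = 0.97   # sampled HTTP 404 rate; an estimate, not a full count
recoverable = round(missing * (1 - dead_fraction))
still_missing = missing - recoverable

print(f"missing: {missing} ({missing / total_rows:.1%})")
print(f"recoverable: ~{recoverable}, still missing after pass: ~{still_missing}")
```

Under these assumptions roughly 129 files are recoverable, which leaves about 4,158 rows to be marked as unavailable, in line with the ~4,150 quoted above.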

This is distinct from quest_engine_paper_figure_extraction_backfill_spec.md,
which extracts NEW figures from papers that have no paper_figures row at
all. Here every row already exists; the file behind the URL is what is
missing.

Run a rate-limit-aware redownload pass over the 4,287 paper_figure rows
without a local file, persist successful fetches to the canonical figures
path, and stamp metadata.image_unavailable=true plus a metadata.image_404_at
timestamp on rows whose URL is confirmed dead, so that we stop re-attempting
them.
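
The three possible outcomes of a fetch attempt could be modeled roughly as follows. This is a minimal sketch, not the project's implementation: `Outcome`, `classify`, and `stamp_unavailable` are hypothetical names, and treating only HTTP 404 as "confirmed dead" (with every other failure left for a later retry) is an assumption drawn from the wording above.

```python
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class Outcome(Enum):
    SAVED = "saved"              # 200 with a body: persist the file
    UNAVAILABLE = "unavailable"  # confirmed 404: stamp the row, stop retrying
    RETRY_LATER = "retry_later"  # anything else: leave the row untouched


def classify(status_code: int, body_len: int) -> Outcome:
    """Map an HTTP response to an outcome (assumed policy, see lead-in)."""
    if status_code == 200 and body_len > 0:
        return Outcome.SAVED
    if status_code == 404:
        return Outcome.UNAVAILABLE
    return Outcome.RETRY_LATER


def stamp_unavailable(metadata: dict, now: Optional[datetime] = None) -> dict:
    """Return a copy of the metadata with the 'upstream is gone' marker set."""
    now = now or datetime.now(timezone.utc)
    updated = dict(metadata)
    updated["image_unavailable"] = True
    updated["image_404_at"] = now.isoformat()
    return updated
```

Keeping rate-limit and server errors in `RETRY_LATER` means a transient outage never causes a row to be permanently marked as dead.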
- [...] image_url for rows whose URL was never recorded (separate [...]
- [...] paper_figure rows without a file have been attempted at [...] image_url.
- [...] scidex.core.paths resolver, NOT a hardcoded site/figures/papers/) [...] metadata.file_sha256 and metadata.file_size_bytes.
- [...] metadata.image_unavailable=true and metadata.image_404_at=<iso>.
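
For the file_sha256 and file_size_bytes fields, one plausible way to compute both in a single streaming pass is sketched below. The helper name `file_metadata` is hypothetical, and the path is taken as a plain argument because the scidex.core.paths resolver API is not shown here.

```python
import hashlib
from pathlib import Path


def file_metadata(path: Path, chunk_size: int = 1 << 20) -> dict:
    """Hash a saved figure in chunks and return the two metadata fields."""
    digest = hashlib.sha256()
    size = 0
    with path.open("rb") as fh:
        # Read fixed-size chunks so large images are never fully in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
            size += len(chunk)
    return {"file_sha256": digest.hexdigest(), "file_size_bytes": size}
```

Computing the digest from the file as written to disk, rather than from the response body, also verifies that the write itself succeeded.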
- [...] paper_figure artifact rows with no file_path (or file_path [...]
- [...] image_unavailable; else leave [...]
- [...] SciDEX-Artifacts submodule via scidex.atlas.artifact_commit.commit_artifact, never raw git add [...]
- [...] Retry-After [...]
- [...] SciDEX-Artifacts. Batch commits (e.g. 100 files per [...]

unassigned
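
Two of the mechanics mentioned above, honoring Retry-After and batching commits, can be sketched with the standard library alone. Both helpers are hypothetical: the 5-second default delay is an arbitrary assumption, the batch size of 100 follows the example figure in the text, and the sketch stops short of calling commit_artifact because its signature is not given here.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from itertools import islice
from typing import Iterable, Iterator, Optional


def retry_after_seconds(header: Optional[str], default: float = 5.0,
                        now: Optional[datetime] = None) -> float:
    """Turn a Retry-After value (delta-seconds or HTTP-date) into a delay."""
    if not header:
        return default
    value = header.strip()
    if value.isdigit():
        return float(value)
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return default  # unparseable header: fall back to the default delay
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())


def batched(items: Iterable, size: int) -> Iterator[list]:
    """Yield successive lists of at most `size` items."""
    it = iter(items)
    while batch := list(islice(it, size)):
        yield batch
```

A driver loop would sleep for `retry_after_seconds(...)` after a rate-limited response, and hand each list from `batched(saved_paths, 100)` to the commit step.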