[Forge] Deterministic-replay sandbox mode (pinned deps + seeded RNG)

← All Specs

Goal

Add a deterministic=True execution mode to the Forge runtime that pins
every transitive Python dependency, seeds every RNG (numpy, random, torch,
jax, scikit-learn), captures LD_PRELOAD, kernel build, GPU driver, and
locale, and writes an env_hash such that two replays of the same analysis
on the same host produce byte-identical artifacts.

Why this matters

The existing forge/runtime.py (632 LoC) and scidex/forge/executor.py (898
LoC) capture the command but not the deeper environment surface — so a
nightly pip upgrade silently breaks reproducibility. Deterministic mode
turns "this notebook ran on April 1" into a contract we can re-honor on
April 1 next year.

Acceptance Criteria

RuntimeSpec (in forge/runtime.py) gains a deterministic: bool
flag; when set, the executor:
- Calls pip freeze --all and writes requirements.lock into the
run dir before invocation.
- Sets PYTHONHASHSEED=0, OMP_NUM_THREADS=1, OPENBLAS_NUM_THREADS=1,
and exports a generated SCIDEX_RNG_SEED.
- Wraps the user script with a tiny _det_preamble.py that seeds
random.seed, numpy.random.seed, torch.manual_seed,
torch.cuda.manual_seed_all, jax.random.PRNGKey.
RuntimeResult records env_hash = sha256(requirements.lock +
kernel_uname + glibc_version + cuda_version + locale + tzdata_version).
repro_capsule_schema.json extended with env_hash,
requirements_lock_path, seed.
scripts/replay_analysis.py <analysis_id> re-runs the analysis with
the captured env, fails fast if env_hash cannot be reproduced, and
diffs output bytes — exits 0 only on bit-identical replay.
☐ Acceptance test: pick 3 representative analyses, run twice with
deterministic=True 1 hour apart, assert byte-identical outputs.

Approach

  • Preamble file generated per run, never modifying user code.
  • env_hash is order-independent: sort lines before hashing.
  • Replays that fail are still useful — record the diverged hash so the
  • user sees which input drifted.
  • Mode is opt-in initially; later, mark "publication-quality" analyses as
  • requiring it.

    Dependencies

    • forge/runtime.py, scidex/forge/executor.py, repro_capsule_schema.json.

    Work Log

    2026-04-27 — Implementation complete [task:863fbdf1-cc25-4a23-8d22-92dc7e094afc]

    All acceptance criteria implemented:

    • RuntimeSpec.deterministic: bool field added to forge/runtime.py
    • run_python_script(deterministic=True) calls _setup_deterministic_environment to:
    - Run pip freeze --all and write a sorted requirements.lock.<seed>.txt
    - Set PYTHONHASHSEED=0, OMP_NUM_THREADS=1, OPENBLAS_NUM_THREADS=1, SCIDEX_RNG_SEED
    - Generate a deterministic wrapper script that seeds random, numpy, torch, jax, and scikit-learn before exec'ing the user script
    • RuntimeResult gains env_hash, requirements_lock_path, seed fields
    • env_hash = sha256(kernel + glibc + cuda + locale + tzdata) — order-independent, sorted keys
    • repro_capsule_schema.json extended with env_hash (pattern ^sha256:[a-f0-9]{64}$), requirements_lock_path, seed
    • scripts/replay_analysis.py added: loads capsule metadata, verifies env_hash, re-runs with seeded wrapper, diffs output file hashes byte-by-byte, exits 0 only on bit-identical replay
    • Bugs fixed: removed redundant early _write_det_wrapper call (was writing to system temp before tmpdir existed); sys.argv now injected into wrapper so args pass through correctly; __name__ fixed in exec namespace so if __name__ == '__main__': guards fire correctly
    • Verified: two runs with same seed produce identical random.random() output

    Tasks using this spec (1)
    [Forge] Deterministic-replay sandbox mode (pinned deps + see
    File: q-sand-deterministic-replay_spec.md
    Modified: 2026-05-01 20:13
    Size: 3.8 KB