[Forge] Resource-prediction model — predict CPU/RAM/time pre-run

Goal

Train a lightweight regression model that, given a candidate analysis script
and its runtime spec, predicts CPU-seconds, peak RAM, and wall-clock duration
before the script runs. Wire the predictions into the executor so the
supervisor can allocate the right slot, set a conservative cgroup memory cap,
and reject manifestly impossible jobs early.

Why this matters

The cgroup isolator at scidex/senate/cgroup_isolation.py already enforces
limits, but the limits are static defaults from RESOURCE_LIMITS. Half the
analyses are wasting headroom (8 GB cap on a 200 MB job) and the other half
are getting OOM-killed mid-debate. A predictor closes that loop, lifting
throughput without raising the host's resident-set ceiling.

Acceptance Criteria

☐ New module scidex/forge/resource_predictor.py exposes
predict(script_text, runtime_name, input_sizes_mb) returning
{cpu_seconds, peak_rss_mb, wall_seconds, confidence}.

☐ Model trained from RuntimeResult history rows (joined to existing
forge_runtime log lines); features include script-token counts of
heavy libs (scanpy, torch, pyscenic, pymc), runtime name,
input artifact sizes, and the analysis's prior runs (if any).

☐ Migration resource_prediction(analysis_id, predicted_cpu_s,
predicted_rss_mb, predicted_wall_s, actual_cpu_s, actual_rss_mb,
actual_wall_s, model_version, predicted_at, settled_at) for
ground-truth tracking.

☐ Executor: when deterministic mode is off, set
memory_limit_mb = 1.5 * predicted_rss_mb, capped to host budget;
timeout_seconds = max(60, 1.3 * predicted_wall_s).

☐ /forge/resource-predictor page renders prediction-vs-actual scatter
and current model accuracy (R² across the last 200 runs).

☐ Retrain job runs weekly via scidex-predictor-retrain.timer once
≥100 settled rows accumulate.
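
As an illustration of the executor criterion above, here is a minimal sketch of the cap arithmetic. The function name derive_limits and the host_budget_mb parameter are hypothetical; the actual executor code may structure this differently. Only the multipliers (1.5 and 1.3), the host-budget cap, and the 60-second floor come from the spec.

```python
import math


def derive_limits(predicted_rss_mb: float,
                  predicted_wall_s: float,
                  host_budget_mb: int) -> dict:
    """Turn predictions into executor caps (hypothetical helper, not the real API)."""
    # 1.5x headroom on predicted peak RSS, never above the host budget.
    memory_limit_mb = min(math.ceil(1.5 * predicted_rss_mb), host_budget_mb)
    # 1.3x headroom on predicted wall time, with a 60-second floor.
    timeout_seconds = max(60, math.ceil(1.3 * predicted_wall_s))
    return {"memory_limit_mb": memory_limit_mb,
            "timeout_seconds": timeout_seconds}


if __name__ == "__main__":
    # A 200 MB job predicted to take 10 s on an 8 GB host:
    # the memory cap becomes 300 MB and the timeout stays at the 60 s floor.
    print(derive_limits(200, 10, 8192))
```

Rounding up with math.ceil is an assumption made here so that the cap never lands below the stated multiple; the spec does not say how fractional values are handled.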

Approach

Start with one sklearn.ensemble.GradientBoostingRegressor per target
(small, fast, no GPU dependency).

Feature extraction reuses tool_chains.py import-detection scaffolding.

Confidence = predicted ± 1.96 * residual_std_for_this_runtime, i.e. an
approximate 95% interval if residuals are roughly normal.

Roll-out: dry-run for 3 days (predict but don't enforce), then enforce.
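
The confidence rule in the approach above can be sketched with the standard library alone. This is a rough illustration, not the module's code: prediction_interval and its arguments are hypothetical names, and clamping the lower bound at zero is an added assumption (resource usage cannot be negative).

```python
import statistics


def prediction_interval(predicted: float, residuals: list, z: float = 1.96):
    """Return (low, high) = predicted +/- z * std of past residuals.

    `residuals` is assumed to hold (actual - predicted) values from
    earlier settled runs of the same runtime.
    """
    if len(residuals) < 2:
        # Not enough history to estimate a spread.
        return None
    residual_std = statistics.stdev(residuals)
    half_width = z * residual_std
    # Assumption: clamp at zero because CPU, RAM, and time are non-negative.
    return (max(0.0, predicted - half_width), predicted + half_width)


if __name__ == "__main__":
    # Three earlier runs missed by -10, 0, and +10 MB.
    print(prediction_interval(100.0, [-10.0, 0.0, 10.0]))
```

How this interval is reduced to the single confidence value returned by predict() is not stated in the spec, so that step is left out here.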

Dependencies

scidex/senate/cgroup_isolation.py, forge/runtime.py,

scidex/forge/executor.py.

Work Log

2026-04-27 — Implementation complete [task:b3a31ccd-59a1-4216-a98c-bd3c72370f30]

Files created/modified:

scidex/forge/resource_predictor.py: GBR model per target; predict(), train(),
record_prediction(), settle_actuals(), get_accuracy_stats(), retrain_if_ready().
Feature vector: 30 dims (heavy-lib import flags, token counts, runtime encoding,
input sizes, prior-run actuals). Falls back to synthetic training from
analyses.resource_cost until ≥10 real settled rows accumulate.

migrations/add_resource_prediction_table.py: Creates resource_prediction table with
predicted/actual columns + partial index on settled_at.

scidex/forge/executor.py: LocalExecutor.run_analysis now calls predict() pre-run,
writes the prediction via record_prediction(), overrides memory_limit_mb with
1.5×predicted_rss and timeout_seconds with max(60, 1.3×predicted_wall) when
confidence > 5%, and calls settle_actuals() post-run.

api.py: Added GET /forge/resource-predictor (HTML dashboard with Chart.js scatter),
GET /api/forge/resource-predictor/stats, POST /api/forge/resource-predictor/predict.

deploy/bootstrap/systemd/scidex-predictor-retrain.service + .timer: weekly retrain
at Sun 03:00 UTC via retrain_if_ready() (no-op until ≥100 settled rows).

Dry-run mode: Set RESOURCE_PREDICTOR_DRY_RUN=1 to predict without enforcing.
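
For orientation, the heavy-lib import flags and token counts listed in the feature vector above might be extracted roughly as follows. This is a sketch under assumptions: it does not reproduce the tool_chains.py scaffolding, the function name heavy_lib_features and the feature key names are hypothetical, and it covers only 8 of the 30 dimensions.

```python
import ast
import re

# Library names taken from the acceptance criteria.
HEAVY_LIBS = ("scanpy", "torch", "pyscenic", "pymc")


def heavy_lib_features(script_text: str) -> dict:
    """Return import flags and raw token counts for each heavy library."""
    imported = set()
    try:
        tree = ast.parse(script_text)
    except SyntaxError:
        tree = None  # Unparseable script: fall back to token counts only.
    if tree is not None:
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                imported.update(alias.name.split(".")[0] for alias in node.names)
            elif (isinstance(node, ast.ImportFrom)
                  and node.module and node.level == 0):
                # Absolute "from x.y import z" counts as importing x.
                imported.add(node.module.split(".")[0])
    # Identifier-like tokens anywhere in the text (including comments/strings).
    tokens = re.findall(r"[A-Za-z_][A-Za-z0-9_]*", script_text)
    features = {}
    for lib in HEAVY_LIBS:
        features[f"imports_{lib}"] = int(lib in imported)
        features[f"tokens_{lib}"] = tokens.count(lib)
    return features


if __name__ == "__main__":
    script = (
        "import scanpy as sc\n"
        "from torch.nn import Linear\n"
        "adata = sc.read_h5ad('x.h5ad')\n"
        "model = torch.nn.Linear(2, 2)\n"
    )
    print(heavy_lib_features(script))
```

Parsing with ast keeps the import flags robust to aliases such as "import scanpy as sc", while the regex token count is a cheap proxy for how heavily a library is used.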

Acceptance criteria status:

✅ predict() function with correct signature and return keys
✅ GBR trained from DB history; features include heavy-lib imports + runtime + sizes
✅ resource_prediction migration applied
✅ Executor dynamic caps: 1.5×rss + 1.3×wall with host budget ceiling
✅ /forge/resource-predictor scatter + R² dashboard
✅ scidex-predictor-retrain.timer weekly, ≥100-row guard
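
The R² figure on the dashboard is the usual coefficient of determination restricted to the most recent runs. A minimal sketch, assuming the settled rows arrive oldest-first as two parallel sequences; r_squared is a hypothetical name and need not match get_accuracy_stats():

```python
def r_squared(actual, predicted, window=200):
    """Coefficient of determination over the last `window` settled runs."""
    pairs = list(zip(actual, predicted))[-window:]
    if len(pairs) < 2:
        return None  # Too few runs to score.
    mean_actual = sum(a for a, _ in pairs) / len(pairs)
    ss_res = sum((a - p) ** 2 for a, p in pairs)       # residual sum of squares
    ss_tot = sum((a - mean_actual) ** 2 for a, _ in pairs)  # total sum of squares
    if ss_tot == 0:
        return None  # All actuals identical: R² is undefined.
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Always predicting the mean of the actuals scores exactly 0.
    print(r_squared([1, 2, 3], [2, 2, 2]))
```

A value near 1 means predictions track actuals closely; a value at or below 0 means the model does no better than predicting the mean.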


File: q-sand-resource-prediction_spec.md

Modified: 2026-05-01 20:13

Size: 4.3 KB