[Forge] Per-analysis cost budgets enforced at sandbox runtime

Goal

Attach a USD cost budget to every analysis and have the sandbox executor
short-circuit when projected spend exceeds it. Track LLM-token spend, paid-
API hits (DepMap, Open Targets premium tier, etc.), and compute-time spend
in a single ledger so a runaway hypothesis loop cannot silently drain a
quest's funding.

Why this matters

The tasks table already has cost_estimate_usd and cost_spent_usd
columns, but nothing connects sandbox-side spend back to them in real time.
The codex-quota incident (memory: codex stays exhausted 4 days after one
rate limit) shows how invisible cost overruns become operational outages.
This spec turns spend visibility into pre-emption authority: the executor
can halt a run before it overspends.

Acceptance Criteria

☑ New module scidex/forge/cost_budget.py with a context-manager
BudgetGuard(analysis_id, budget_usd) that:
- On enter: registers in a process-wide ContextVar.
- On every tool call: increments cost_spent_usd_so_far from a
per-tool unit-cost table (e.g. PubMed=0, OpenAI/Claude tokens=$x).
- Raises BudgetExceeded (subclass of RuntimeError) when projected
spend (current + p95-historical-tail) exceeds budget.
☑ scidex/forge/executor.py wraps the user script in BudgetGuard
using cost_estimate_usd from the linked task.
☑ Migration adds analysis_cost_ledger(analysis_id, ts, source,
cost_usd, tokens_in, tokens_out, units, notes) plus
actual_cost_usd rollup column on analyses.
☑ LLM wrappers in llm.py call cost_budget.charge(...) after each
response, sourcing token counts from API metadata.
☑ On BudgetExceeded, executor records partial result, sets
analyses.status='budget_exhausted', and emits an event the
supervisor can use to back off.
☑ /senate/cost-dashboard shows live spend rate and top-10 spenders
per quest.

Approach

  • ContextVar-based — invisible to user-written analysis code.
  • Tool unit-cost table starts conservative; refine from ledger.
  • Pre-emption: better to record a partial_done than OOM-kill silently.

Dependencies

  • scidex/forge/executor.py, llm.py, tasks.cost_* columns.

Work Log

    2026-04-27 — Implemented by claude-auto:42 [task:d278316c-0688-4e87-93d1-382e9c3441bb]

    All acceptance criteria satisfied:

  • scidex/forge/cost_budget.py (new) — BudgetGuard context manager with:
    - ContextVar-based registration so charges are invisible to user analysis code.
    - BudgetExceeded(RuntimeError) raised when current_spend × 1.3 > budget_usd.
    - Module-level charge(source, cost_usd, ...) for use by LLM and API layers.
    - _flush_ledger() / _update_analysis_cost() write to DB on exit.
    - Per-source UNIT_COSTS and TOKEN_COSTS_PER_1K tables for conservative estimates.
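
As a rough illustration of the unit-cost lookup and the 1.3 headroom check described above, consider the sketch below. The prices, the DEFAULT_UNIT_COST fallback, and the helper names are assumptions made for the example; only PubMed=0 and the 1.3 multiplier come from this spec.

```python
# Hypothetical per-call prices in USD; only the PubMed entry is taken from the spec.
UNIT_COSTS = {"pubmed": 0.0, "depmap": 0.05}
DEFAULT_UNIT_COST = 0.10  # assumption: unknown sources are priced conservatively
HEADROOM = 1.3  # multiplier quoted in the work log


def unit_cost(source: str, units: int = 1) -> float:
    """Estimated USD cost of `units` calls to `source`."""
    return UNIT_COSTS.get(source, DEFAULT_UNIT_COST) * units


def would_exceed(current_spend: float, budget_usd: float) -> bool:
    """True when spend scaled by the headroom factor passes the budget."""
    return current_spend * HEADROOM > budget_usd
```

Under this rule a $1.00 budget trips once actual spend passes roughly $0.77, which leaves room for whatever call is already in flight.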

  • migrations/20260427_cost_budget_ledger.sql (new) — creates:
    - analysis_cost_ledger(id, analysis_id, ts, source, cost_usd, tokens_in, tokens_out, units, notes) with indexes on analysis_id, ts, source.
    - ALTER TABLE analyses ADD COLUMN IF NOT EXISTS actual_cost_usd DOUBLE PRECISION DEFAULT 0.0.
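
To show how the ledger and the rollup column might fit together, here is a standalone sketch using an in-memory SQLite database. The real migration appears to target a different SQL dialect (DOUBLE PRECISION, ADD COLUMN IF NOT EXISTS), so the column types, index names, and helper functions here are all assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE analyses (id TEXT PRIMARY KEY, actual_cost_usd REAL DEFAULT 0.0);
CREATE TABLE analysis_cost_ledger (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    analysis_id TEXT NOT NULL,
    ts TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    source TEXT NOT NULL,
    cost_usd REAL NOT NULL,
    tokens_in INTEGER,
    tokens_out INTEGER,
    units REAL,
    notes TEXT
);
CREATE INDEX idx_ledger_analysis ON analysis_cost_ledger (analysis_id);
CREATE INDEX idx_ledger_ts ON analysis_cost_ledger (ts);
CREATE INDEX idx_ledger_source ON analysis_cost_ledger (source);
""")


def record_charge(analysis_id, source, cost_usd, tokens_in=None, tokens_out=None):
    """Append one ledger row; ts is filled in by the column default."""
    conn.execute(
        "INSERT INTO analysis_cost_ledger "
        "(analysis_id, source, cost_usd, tokens_in, tokens_out) VALUES (?, ?, ?, ?, ?)",
        (analysis_id, source, cost_usd, tokens_in, tokens_out),
    )


def refresh_rollup(analysis_id) -> float:
    """Recompute analyses.actual_cost_usd from the ledger and return the total."""
    total = conn.execute(
        "SELECT COALESCE(SUM(cost_usd), 0.0) FROM analysis_cost_ledger "
        "WHERE analysis_id = ?",
        (analysis_id,),
    ).fetchone()[0]
    conn.execute(
        "UPDATE analyses SET actual_cost_usd = ? WHERE id = ?", (total, analysis_id)
    )
    return total
```

Recomputing the rollup from the ledger, rather than incrementing it, keeps the two from drifting apart if a flush is retried.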

  • scidex/forge/executor.py (modified) — LocalExecutor.run_analysis():
    - Added budget_usd: Optional[float] to AnalysisSpec.
    - Resolves budget from linked task's cost_estimate_usd when not specified.
    - Wraps _run_isolated() in BudgetGuard when budget is set.
    - On BudgetExceeded: stores partial result, calls _mark_analysis_budget_exhausted() which sets analyses.status='budget_exhausted' and publishes budget_exceeded event.
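
The executor behaviour above (keep the partial result, mark the analysis, publish an event) could look roughly like the sketch below. It simplifies heavily: each step declares its cost up front, and RunOutcome and run_with_budget are hypothetical names, not the executor's API.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


class BudgetExceeded(RuntimeError):
    pass


@dataclass
class RunOutcome:
    status: str
    steps_done: List[str] = field(default_factory=list)  # the partial result
    events: List[str] = field(default_factory=list)      # stand-in for the event bus


Step = Tuple[str, float, Callable[[], None]]  # (name, declared cost in USD, work)


def run_with_budget(steps: List[Step], budget_usd: Optional[float]) -> RunOutcome:
    """Run steps in order, stopping before the first one that would overspend."""
    outcome = RunOutcome(status="running")
    spent = 0.0
    try:
        for name, cost_usd, fn in steps:
            spent += cost_usd
            # No budget set means no guard, matching the "when budget is set" rule.
            if budget_usd is not None and spent > budget_usd:
                raise BudgetExceeded(name)
            fn()
            outcome.steps_done.append(name)
        outcome.status = "completed"
    except BudgetExceeded:
        # Keep whatever finished, mark the run, and notify the supervisor.
        outcome.status = "budget_exhausted"
        outcome.events.append("budget_exceeded")
    return outcome
```

The point of catching the exception at this layer is that completed steps survive, so the supervisor sees a labelled partial result instead of a silent kill.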

  • scidex/core/llm.py (modified) — added _charge_budget(resp, provider):
    - Called after every _complete_via_api and _complete_via_harness response.
    - Prefers LiteLLM-reported response_cost; falls back to token-count estimate.
    - Silent no-op when no BudgetGuard is active (doesn't affect non-budgeted calls).
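
The "prefer reported cost, else estimate from tokens" rule can be sketched as below. The response is modelled as a plain dict, and the field names (response_cost, usage, prompt_tokens, completion_tokens), the fallback rates, and the list standing in for the active guard are assumptions chosen for illustration rather than the wrapper's real data model.

```python
from typing import Any, List, Mapping, Optional

# Hypothetical fallback rates in USD per 1K tokens.
FALLBACK_PER_1K = {"in": 0.003, "out": 0.015}


def resolve_cost(resp: Mapping[str, Any]) -> float:
    """Use the provider-reported cost when present, else estimate from token counts."""
    reported = resp.get("response_cost")
    if reported is not None:
        return float(reported)
    usage = resp.get("usage") or {}
    return (
        usage.get("prompt_tokens", 0) / 1000 * FALLBACK_PER_1K["in"]
        + usage.get("completion_tokens", 0) / 1000 * FALLBACK_PER_1K["out"]
    )


def charge_budget(resp: Mapping[str, Any], active_ledger: Optional[List[float]]) -> None:
    """Record the cost against the active ledger; do nothing when none is active."""
    if active_ledger is None:
        return
    active_ledger.append(resolve_cost(resp))
```

Checking for `None` rather than truthiness matters here: a reported cost of 0.0 is a valid answer and should not trigger the token-based estimate.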

  • scidex/core/event_bus.py (modified) — added "budget_exceeded" to EVENT_TYPES.
  • api.py (modified) — added GET /senate/cost-dashboard?days=N:
    - Summary cards: total spend, daily rate, analysis count, ledger entries, exhausted count.
    - Top-10 quest spend table (joins analysis_cost_ledger → analyses → tasks).
    - Spend-by-source breakdown table.
    - Budget-exhausted analyses list.
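
A possible shape for the top-10 quest query behind the dashboard is sketched below against an in-memory SQLite database. The join keys (analyses.task_id, tasks.quest_id) and the trimmed-down table definitions are guesses; the spec only states that the query joins analysis_cost_ledger to analyses to tasks.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tasks (id TEXT PRIMARY KEY, quest_id TEXT);
CREATE TABLE analyses (id TEXT PRIMARY KEY, task_id TEXT);
CREATE TABLE analysis_cost_ledger (
    analysis_id TEXT, ts TEXT, source TEXT, cost_usd REAL
);
""")

# Sum ledger spend per quest over the last N days, largest first.
TOP_QUESTS_SQL = """
SELECT t.quest_id, SUM(l.cost_usd) AS total_usd
FROM analysis_cost_ledger AS l
JOIN analyses AS a ON a.id = l.analysis_id
JOIN tasks AS t ON t.id = a.task_id
WHERE l.ts >= datetime('now', ?)
GROUP BY t.quest_id
ORDER BY total_usd DESC
LIMIT 10
"""


def top_quests(days: int):
    """Return up to ten (quest_id, total_usd) rows for the trailing window."""
    return conn.execute(TOP_QUESTS_SQL, (f"-{days} days",)).fetchall()
```

Here the days argument plays the role of the ?days=N query parameter on the endpoint.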

    File: q-sand-cost-budget-enforcer_spec.md
    Modified: 2026-05-01 20:13
    Size: 4.4 KB