SciDEX — Task: [Forge] Skill cost-rationality router

Replace static SKILL_GROUPS dispatch with router using live cost + composite quality + drift filter; shadow-mode rollout.

Completion Notes

Auto-release: work already on origin/main

Git Commits (2)

[Verify] Skill cost-rationality router — already resolved [task:7a3e5888-b8fe-4dd1-82fe-dfdd1ab80c3d] (#720) 2026-04-27

[Forge] Skill cost-rationality router — pick cheaper skill when quality matches [task:7a3e5888-b8fe-4dd1-82fe-dfdd1ab80c3d] (#719) 2026-04-27

Spec File

Effort: thorough

Goal

scidex/agora/skill_evidence.py:SKILL_GROUPS dispatches by topic in a
hardcoded order, so the agent will call uniprot_protein_info even when
alphafold_structure would have been cheaper and equally informative. With
agent_skill_invocations now logging latency_ms, and
q-skills-quality-leaderboard providing a per-skill composite quality score,
we can build a cost-rationality router that picks the cheapest skill in each
group whose composite is within ε of the group max (ε = 0.05 in the routing
rule below), falling back to the quality leader only when no cheaper skill
clears that bar. This converts skill selection from a static table into a
measured decision.

Acceptance Criteria

☐ Module scidex/forge/skill_router.py exposing
pick(group: str, available: list[str], constraint: dict) -> str.
constraint accepts latency_budget_ms, min_composite, and
forbid_drifted (default True).

☐ Cost model uses live data — no hardcoded prices:

cost(skill) = p50_latency_ms / 1000 plus
external_call_cost_usd if the skill row carries an
external_call_cost_usd annotation in its SKILL.md frontmatter
(new optional metadata key — schema-extension only, default 0).

☐ Routing rule (documented inline):

1. Drop skills with composite < min_composite (default 0.4).
2. Drop skills with is_drifted events at severity=high in the
past 24 h when forbid_drifted=True.
3. Among survivors, find max composite q*.
4. Keep skills with composite ≥ q* - 0.05.
5. Of those, return the lowest-cost one. Ties broken by lowest
p95 latency, then alphabetical.

☐ scidex/agora/skill_evidence.py calls skill_router.pick instead
of iterating SKILL_GROUPS[group] directly. The legacy ordered
list becomes the fallback used only when telemetry is empty.

☐ Decision log: every router invocation writes one row to
skill_router_decisions(id, group, candidates_json, picked,
reason, picked_composite, picked_cost, decided_at)
so we can audit "why did the router pick X over Y".

☐ Migration migrations/20260428_skill_router_decisions.sql.

☐ HTML page /forge/skills/router shows a stacked bar chart of
decisions per skill over the last 24 h, a link to the most recent
decisions, and a "shadow-mode" toggle (with env var
SCIDEX_SKILL_ROUTER_SHADOW=1 the router returns the legacy pick but
logs its counterfactual choice; used during rollout).

☐ Tests tests/test_skill_router.py:

(a) all skills have composite=0 → falls back to legacy first item.
(b) one skill clearly dominates (composite=0.9 vs 0.5) → picked.
(c) two skills near-tie composite, costs 100 ms vs 1000 ms →
picks the 100 ms one.
(d) latency_budget_ms=50 + winner is 100 ms → returns the cheaper
alternative even if composite is lower.
(e) all candidates drifted → returns the least drifted with a
warning logged.

☐ Smoke: run pick("gene", available=[...], constraint={}) against
live PG; verify a decision row was written and the picked skill
matches the documented rule when checked manually.
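
The cost model and the five-step routing rule in the criteria above can be sketched as a pure function over in-memory stats. This is a minimal illustration, not the shipped scidex/forge/skill_router.py: SkillStats, route, and the reason strings are hypothetical names. The latency budget is assumed to filter on p50 latency before q* is computed (the spec does not say where it applies), and the all-drifted case from test (e) is not modeled; an empty survivor set simply falls back to the head of the legacy list.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

EPSILON = 0.05  # step 4: keep skills with composite >= q* - 0.05


@dataclass(frozen=True)
class SkillStats:
    """Hypothetical per-skill snapshot; field names are illustrative."""
    name: str
    composite: float
    p50_latency_ms: float
    p95_latency_ms: float
    external_call_cost_usd: float = 0.0  # optional frontmatter key, default 0
    drifted_high_24h: bool = False       # severity=high drift event in past 24 h


def cost(s: SkillStats) -> float:
    # cost(skill) = p50_latency_ms / 1000 plus external_call_cost_usd
    return s.p50_latency_ms / 1000 + s.external_call_cost_usd


def route(
    stats: Sequence[SkillStats],
    legacy_order: Sequence[str],
    min_composite: float = 0.4,
    forbid_drifted: bool = True,
    latency_budget_ms: Optional[float] = None,
) -> Tuple[str, str]:
    """Return (picked skill, reason). legacy_order is assumed non-empty."""
    # Step 1: drop skills below the quality floor.
    pool: List[SkillStats] = [s for s in stats if s.composite >= min_composite]
    # Step 2: drop skills with recent high-severity drift.
    if forbid_drifted:
        pool = [s for s in pool if not s.drifted_high_24h]
    # Assumption: the budget narrows the pool only if something fits within it.
    if latency_budget_ms is not None:
        within = [s for s in pool if s.p50_latency_ms <= latency_budget_ms]
        pool = within or pool
    if not pool:
        return legacy_order[0], "legacy_fallback"
    # Steps 3-4: find q* and keep everything within EPSILON of it.
    q_star = max(s.composite for s in pool)
    near_best = [s for s in pool if s.composite >= q_star - EPSILON]
    # Step 5: lowest cost, then lowest p95 latency, then alphabetical.
    winner = min(near_best, key=lambda s: (cost(s), s.p95_latency_ms, s.name))
    return winner.name, "cheapest_within_epsilon"


stats = [
    SkillStats("uniprot_protein_info", 0.90, 1000, 2000),
    SkillStats("alphafold_structure", 0.88, 100, 300),
]
print(route(stats, legacy_order=["uniprot_protein_info", "alphafold_structure"]))
# -> ('alphafold_structure', 'cheapest_within_epsilon')
```

The composite and latency numbers in the usage example are made up for illustration; they mirror the shape of test (c), where a near-tie on quality is resolved by cost.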

Approach

Read scidex/agora/skill_evidence.py:SKILL_GROUPS to understand
today's static dispatch.

Pull live composite + latency from mv_skill_usage_daily +
skill_quality_scores (created by previous skills tasks); cache for
60 s in-process.
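
One way to get the 60 s in-process cache is a small time-to-live wrapper around whatever function loads the telemetry. The sketch below assumes a hypothetical TTLCache class and a zero-argument loader; the real query against mv_skill_usage_daily and skill_quality_scores is not shown, and the class is not thread-safe.

```python
import time
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class TTLCache(Generic[T]):
    """Hypothetical in-process cache: reload via `loader` once the TTL expires."""

    def __init__(
        self,
        loader: Callable[[], T],
        ttl_s: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._loader = loader
        self._ttl_s = ttl_s
        self._clock = clock  # injectable so tests can control time
        self._value: Optional[T] = None
        self._expires_at: Optional[float] = None  # None means "never loaded"

    def get(self) -> T:
        now = self._clock()
        if self._expires_at is None or now >= self._expires_at:
            # Stale or never loaded: refresh and restart the TTL window.
            self._value = self._loader()
            self._expires_at = now + self._ttl_s
        return self._value  # type: ignore[return-value]
```

Using a monotonic clock keeps the expiry immune to wall-clock adjustments, and injecting the clock makes the expiry testable without sleeping.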

Wrap pick() so it never raises: on any error, fall back to the
legacy list and log the failure into skill_router_decisions with
reason='router_error_fallback'.
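
The never-raise contract can be sketched as a wrapper that catches everything, picks from the legacy list, and tries to record the failure. The names safe_pick, route_fn, and log_decision are hypothetical, the decision row is reduced to three keys for brevity, and legacy_order is assumed to be non-empty.

```python
import logging
from typing import Callable, Dict, Sequence

logger = logging.getLogger("skill_router_sketch")


def safe_pick(
    route_fn: Callable[[str, Sequence[str]], str],
    log_decision: Callable[[Dict[str, str]], None],
    group: str,
    available: Sequence[str],
    legacy_order: Sequence[str],
) -> str:
    """Call route_fn; on any error return a legacy pick instead of raising."""
    try:
        return route_fn(group, available)
    except Exception as exc:  # deliberate catch-all: pick() must never raise
        # First legacy skill that is actually available, else the legacy head.
        fallback = next(
            (s for s in legacy_order if s in available), legacy_order[0]
        )
        try:
            log_decision(
                {"group": group, "picked": fallback,
                 "reason": "router_error_fallback"}
            )
        except Exception:
            # Logging the decision must not break the fallback either.
            logger.exception("could not record router_error_fallback")
        logger.warning(
            "router failed for group %r (%s); using legacy %r",
            group, exc, fallback,
        )
        return fallback
```

Guarding the decision-log write separately matters: if the database is the reason the router failed, the audit write is likely to fail too, and the caller should still get a skill back.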

Shadow-mode rollout: ship with SCIDEX_SKILL_ROUTER_SHADOW=1
default-on, observe one week of counterfactual logs, then flip in a
follow-up.
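
Shadow mode amounts to computing the router's choice but returning the legacy one. A minimal sketch follows, assuming that an unset variable means shadow mode is on (matching "default-on") and that any value other than "1" turns it off; the helper names are hypothetical, and how the counterfactual maps onto the skill_router_decisions columns is not specified here.

```python
import os
from typing import Mapping, Optional, Tuple

SHADOW_ENV = "SCIDEX_SKILL_ROUTER_SHADOW"


def shadow_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    env = os.environ if env is None else env
    # Assumption: unset defaults to "1", so shadow mode is on unless overridden.
    return env.get(SHADOW_ENV, "1") == "1"


def resolve(
    router_pick: str,
    legacy_pick: str,
    env: Optional[Mapping[str, str]] = None,
) -> Tuple[str, str]:
    """Return (skill to actually call, router's choice to log)."""
    returned = legacy_pick if shadow_enabled(env) else router_pick
    # The router's choice is always reported so it can be logged as the
    # counterfactual while shadow mode is on.
    return returned, router_pick
```

During the observation week, comparing the two elements of the returned tuple across logged decisions shows how often the router would have diverged from the legacy order.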

Dependencies

q-skills-usage-telemetry: mv_skill_usage_daily.
q-skills-quality-leaderboard: composite score.
q-skills-versioning-drift: skill_drift_events for the
forbid_drifted filter.

Dependents

q-skills-bundle-auto-discovery: bundles annotated with
external_call_cost_usd plug straight into the router.

Work Log

2026-04-27 06:05 PT — Slot 0 (Staleness Review)

Verified task already resolved on main at commit 696ce0f89
Confirmed: skill_router.py, migration, test file, HTML page, skill_evidence wiring all present
Ran smoke test: pick('gene', [...], {}) returned uniprot_protein_info, wrote decision row to DB
Ran test suite: 10/10 passed
Decision log table skill_router_decisions verified live on PG

Result: Already resolved — this task was completed and merged to main before this worktree was created. No additional work needed.

Payload JSON

{
  "completion_shas": [
    "1c68e3319",
    "696ce0f89"
  ],
  "completion_shas_checked_at": "2026-04-27T13:12:09.139794+00:00"
}
