[Forge] Skill cost-rationality router - pick cheaper skill when quality matches
Status: done

Replace static SKILL_GROUPS dispatch with router using live cost + composite quality + drift filter; shadow-mode rollout.

Completion Notes

Auto-release: work already on origin/main

Git Commits (2)

[Verify] Skill cost-rationality router — already resolved [task:7a3e5888-b8fe-4dd1-82fe-dfdd1ab80c3d] (#720) 2026-04-27
[Forge] Skill cost-rationality router — pick cheaper skill when quality matches [task:7a3e5888-b8fe-4dd1-82fe-dfdd1ab80c3d] (#719) 2026-04-27

Spec File

Effort: thorough

Goal

scidex/agora/skill_evidence.py:SKILL_GROUPS dispatches by topic with a
hardcoded order, so the agent will call uniprot_protein_info even when
alphafold_structure would have been cheaper and equally informative. With
agent_skill_invocations now logging latency_ms, and
q-skills-quality-leaderboard providing per-skill composite quality, we
can build a cost-rationality router that picks the cheapest skill in each
group whose composite is within ε of the group max (ε = 0.05 in the
routing rule), falling back to the quality leader only when nothing else
clears the bar. This converts skill selection from a static table into a
measured decision.

Acceptance Criteria

☐ Module scidex/forge/skill_router.py exposing
pick(group: str, available: list[str], constraint: dict) -> str.
constraint accepts latency_budget_ms, min_composite,
forbid_drifted (default True).
☐ Cost model uses live data — no hardcoded prices:
cost(skill) = p50_latency_ms / 1000 plus
external_call_cost_usd if the skill row carries an
external_call_cost_usd annotation in its SKILL.md frontmatter
(new optional metadata key — schema-extension only, default 0).
☐ Routing rule (documented inline):
1. Drop skills with composite < min_composite (default 0.4).
2. Drop skills with is_drifted events at severity=high in the
past 24 h when forbid_drifted=True.
3. Among survivors, find max composite q*.
4. Keep skills with composite ≥ q* - 0.05.
5. Of those, return the lowest-cost one. Ties broken by lowest
p95 latency, then alphabetical.
☐ scidex/agora/skill_evidence.py calls skill_router.pick instead
of iterating SKILL_GROUPS[group] directly. The legacy ordered
list becomes the fallback used only when telemetry is empty.
☐ Decision log: every router invocation writes one row to
skill_router_decisions(id, group, candidates_json, picked,
reason, picked_composite, picked_cost, decided_at)
so we can audit "why did the router pick X over Y".
☐ Migration migrations/20260428_skill_router_decisions.sql.
☐ HTML page /forge/skills/router shows a stacked bar of
decisions-per-skill in the last 24 h + a link to the most recent
decisions and a "shadow-mode" toggle (env var
SCIDEX_SKILL_ROUTER_SHADOW=1 returns legacy pick but logs the
counterfactual chosen — used during rollout).
☐ Tests tests/test_skill_router.py:
(a) all skills have composite=0 → falls back to legacy first item.
(b) one skill clearly dominates (composite=0.9 vs 0.5) → picked.
(c) two skills near-tie composite, costs 100 ms vs 1000 ms →
picks the 100 ms one.
(d) latency_budget_ms=50 + winner is 100 ms → returns the cheaper
alternative even if composite is lower.
(e) all candidates drifted → returns the least drifted with a
warning logged.
☐ Smoke: run pick("gene", available=[...], constraint={}) against
live PG; verify a decision row was written and the picked skill
matches the documented rule when checked manually.
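
The five-step routing rule and the cost model above can be sketched as follows. This is a minimal illustration, not the code in scidex/forge/skill_router.py: SkillStats, its field names, and pick_sketch are hypothetical, the stats are passed in directly rather than read from PG, and the latency_budget_ms constraint and the "all candidates drifted" case from tests (d) and (e) are left out. When no skill survives the filters, the sketch simply returns the first legacy item.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SkillStats:
    """Hypothetical telemetry row for one skill (field names are assumptions)."""
    name: str
    composite: float
    p50_latency_ms: float
    p95_latency_ms: float
    external_call_cost_usd: float = 0.0   # optional SKILL.md annotation, default 0
    drifted_high_24h: bool = False         # any severity=high drift event in the past 24 h


def cost(s: SkillStats) -> float:
    # cost(skill) = p50_latency_ms / 1000 plus external_call_cost_usd
    return s.p50_latency_ms / 1000 + s.external_call_cost_usd


def pick_sketch(stats: list[SkillStats], legacy_order: list[str],
                min_composite: float = 0.4, forbid_drifted: bool = True,
                epsilon: float = 0.05) -> str:
    # Steps 1-2: drop low-composite skills and, if requested, drifted ones.
    survivors = [
        s for s in stats
        if s.composite >= min_composite
        and not (forbid_drifted and s.drifted_high_24h)
    ]
    if not survivors:
        # Simplification: fall back to the legacy ordered list.
        return legacy_order[0]
    # Step 3: best composite among survivors (q*).
    q_star = max(s.composite for s in survivors)
    # Step 4: keep skills within epsilon of q*.
    near_best = [s for s in survivors if s.composite >= q_star - epsilon]
    # Step 5: lowest cost, ties broken by lowest p95 latency, then alphabetical.
    return min(near_best, key=lambda s: (cost(s), s.p95_latency_ms, s.name)).name


if __name__ == "__main__":
    group = [
        SkillStats("uniprot_protein_info", 0.82, 1000, 1500),
        SkillStats("alphafold_structure", 0.80, 100, 300),
    ]
    print(pick_sketch(group, legacy_order=["uniprot_protein_info"]))
```

With the two illustrative rows above, both skills clear the 0.4 floor and sit within 0.05 of each other, so the cheaper one (alphafold_structure, cost 0.1 versus 1.0) is selected.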

Approach

  • Read scidex/agora/skill_evidence.py:SKILL_GROUPS to understand
    today's static dispatch.
  • Pull live composite + latency from mv_skill_usage_daily +
    skill_quality_scores (created by previous skills tasks); cache for
    60 s in-process.
  • Wrap pick() so it never raises: on any error, fall back to the
    legacy list and log the failure into skill_router_decisions with
    reason='router_error_fallback'.
  • Shadow-mode rollout: ship with SCIDEX_SKILL_ROUTER_SHADOW=1
    default-on, observe one week of counterfactual logs, then flip in a
    follow-up.
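
The never-raise wrapper and the shadow-mode switch described above might look roughly like this. It is a sketch under stated assumptions: safe_pick, the router and log_decision callables, and the reason strings 'shadow_mode' and 'routed' are hypothetical; only SCIDEX_SKILL_ROUTER_SHADOW and reason='router_error_fallback' come from the spec. The decision log is modelled as a plain callable instead of a write to skill_router_decisions.

```python
import os
from typing import Callable


def _try_log(log_decision: Callable[[dict], None], row: dict) -> None:
    # Logging must not break dispatch, so swallow logging failures too.
    try:
        log_decision(row)
    except Exception:
        pass


def safe_pick(router: Callable[[], str], legacy_order: list[str],
              log_decision: Callable[[dict], None]) -> str:
    """Return a skill name without ever raising (hypothetical wrapper)."""
    legacy = legacy_order[0]
    # Shadow mode is default-on during rollout, per the approach notes.
    shadow = os.environ.get("SCIDEX_SKILL_ROUTER_SHADOW", "1") == "1"
    try:
        routed = router()
    except Exception as exc:
        # Any router error falls back to the legacy list and is logged.
        _try_log(log_decision, {"picked": legacy,
                                "reason": "router_error_fallback",
                                "error": repr(exc)})
        return legacy
    if shadow:
        # Return the legacy pick, but record what the router would have chosen.
        _try_log(log_decision, {"picked": legacy,
                                "counterfactual": routed,
                                "reason": "shadow_mode"})   # assumed reason value
        return legacy
    _try_log(log_decision, {"picked": routed, "reason": "routed"})  # assumed reason value
    return routed
```

Keeping the environment check inside the call, rather than at import time, means the follow-up flip out of shadow mode would not need a code change in this sketch, only a different value for the variable.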

    Dependencies

    • q-skills-usage-telemetry: mv_skill_usage_daily.
    • q-skills-quality-leaderboard: composite score.
    • q-skills-versioning-drift: skill_drift_events for the
      forbid_drifted filter.

    Dependents

    • q-skills-bundle-auto-discovery: bundles annotated with
      external_call_cost_usd plug straight into the router.

    Work Log

    2026-04-27 06:05 PT — Slot 0 (Staleness Review)

    • Verified task already resolved on main at commit 696ce0f89
    • Confirmed: skill_router.py, migration, test file, HTML page, skill_evidence wiring all present
    • Ran smoke test: pick('gene', [...], {}) returned uniprot_protein_info, wrote decision row to DB
    • Ran test suite: 10/10 passed
    • Decision log table skill_router_decisions verified live on PG

    Result: Already resolved. This task was completed and merged to main before this worktree was created. No additional work needed.

    Payload JSON
    {
      "completion_shas": [
        "1c68e3319",
        "696ce0f89"
      ],
      "completion_shas_checked_at": "2026-04-27T13:12:09.139794+00:00"
    }
