[Forge] Skill usage telemetry — per-persona, per-artifact rollups

← All Specs

Effort: deep

Goal

scidex/agora/skill_evidence.py:_log_invocation already writes one row per
K-Dense skill call into agent_skill_invocations (migration 20260427_agent_skill_invocations.sql — 7 columns: skill_name, artifact_class, artifact_id, persona, success, latency_ms, cited_in_artifact). Today nothing reads those rows: there is no rollup, no
endpoint, no view. Build the first analytics layer so we can answer "which
skills did the Theorist actually use on hypothesis H, and how many of those
calls became evidence in the final artifact?". Without this loop the table is
write-only and the 23 tool skills + 100+ Anthropic skills are an opaque cost
center.

Acceptance Criteria

☑ New module scidex/forge/skill_telemetry.py exposing four queries
(each pure SQL, no LLM):
- usage_by_skill(window_days: int) -> list[dict] — per skill_name,
total_calls, success_rate, p50_latency_ms, p95_latency_ms,
cited_rate (= cited / success).
- usage_by_persona(window_days: int) -> list[dict] — per persona,
distinct_skills, total_calls, top_skill, top_skill_share.
- usage_by_artifact_class(window_days: int) -> list[dict] — per
artifact_class (the 10 enum values from the migration's CHECK
constraint), counts + skills-per-artifact distribution.
- coverage_for_artifact(artifact_class, artifact_id) -> dict
every skill called for one artifact, with citation status, ordered
by created_at.
☑ Materialized view mv_skill_usage_daily(skill_name, day, calls,
success, errors, p50_ms, p95_ms, cited) refreshed via a new
scidex-skill-telemetry-rollup systemd timer (template the unit file
under deploy/bootstrap/systemd/); refresh interval 15 min. Manual
refresh via python -m scidex.forge.skill_telemetry refresh.
☑ API routes added to api_routes/forge.py:
GET /api/forge/skills/telemetry?window=7d (returns all four
rollups bundled) and
GET /api/forge/skills/{skill_name}/calls?limit=50 (recent rows).
☑ HTML page at /forge/skills/telemetry renders the four tables with
sortable headers (vanilla JS, no framework). Each skill_name links
to /forge/skills/{skill_name} (already exists for canonical
skills); each persona links to /persona/{slug}.
☑ Tests tests/test_skill_telemetry.py: synthetic 200-row fixture with
3 skills × 4 personas × 5 artifact_classes; assert each rollup row
count, that p50 + p95 are bracketed by min/max, that cited_rate
stays in [0, 1], and that the artifact-coverage query returns rows
in chronological order.
☑ CLI smoke: run all four rollups against live PG; verify each
returns at least one row (or [] if the table is genuinely empty)
and write the first daily snapshot into mv_skill_usage_daily.

Approach

  • Read scidex/senate/quality_dashboard.py for the existing rollup-API
  • pattern (sortable HTML + JSON endpoint + Jinja template).
  • Author the four queries against agent_skill_invocations only —
  • nothing else needs to change. Use PG's percentile_cont(0.5) and
    percentile_cont(0.95) for latency percentiles.
  • Build the materialized view in a new migration
  • migrations/20260428_mv_skill_usage_daily.sql. Schedule
    REFRESH MATERIALIZED VIEW CONCURRENTLY from the new systemd timer.
  • Wire the API + HTML in lockstep; reuse templates/forge/base.html so
  • the page slots into the Forge nav.
  • Tests against an in-memory pg fixture (mirror existing
  • tests/test_skill_evidence.py setup).

    Dependencies

    • agent_skill_invocations table (migration 20260427_agent_skill_invocations.sql).
    • scidex/agora/skill_evidence.py — current writer; this task only reads.

    Dependents

    • q-skills-quality-leaderboard — consumes cited_rate + success_rate.
    • q-skills-cost-rationality — consumes p50/p95 latency for cost models.

    Work Log

    2026-04-27 — Implementation complete [task:64c8d088-0ea4-4acc-a26f-d27d1be4fc76]

    Files created:

    • scidex/forge/skill_telemetry.py — four pure-SQL rollup functions
    (usage_by_skill, usage_by_persona, usage_by_artifact_class,
    coverage_for_artifact), recent_calls_for_skill, build_telemetry_bundle,
    refresh_materialized_view, and __main__ CLI for refresh/smoke.
    • migrations/20260428_mv_skill_usage_daily.sql — materialized view with
    UNIQUE index on (skill_name, day) required for CONCURRENT refresh.
    • deploy/bootstrap/systemd/scidex-skill-telemetry-rollup.service + .timer
    — 15-minute REFRESH MATERIALIZED VIEW CONCURRENTLY cycle.
    • api_routes/forge.py — new router with
    GET /api/forge/skills/telemetry?window=7d,
    GET /api/forge/skills/{skill_name}/calls?limit=50,
    GET /forge/skills/telemetry (HTML dashboard, 4 sortable tables, vanilla JS).
    • tests/test_skill_telemetry.py — 11 tests against 200-row synthetic fixture
    (3 skills × 4 personas × 5 artifact_classes); all pass.

    api.py change: added from api_routes.forge import router as _forge_router
    and app.include_router(_forge_router) at line ~1244.

    Note on templates/forge/base.html: the spec referenced this template but
    it does not exist in the codebase. The HTML page uses api_shared.nav.page_template
    instead, consistent with all other route files (e.g. api_routes/senate.py).

    Tasks using this spec (1)
    [Forge] Skill usage telemetry - per-persona, per-artifact ro
    Forge done P90
    File: q-skills-usage-telemetry_spec.md
    Modified: 2026-05-01 20:13
    Size: 5.5 KB