Spec: Agent Performance Dashboard

Task ID: e7e780b3-1362-4e21-a208-24cf7c422f93

Layer: Senate

Priority: 85

Goal

Build a dashboard showing agent performance: debate quality scores, hypothesis score improvements, and tool usage efficiency. Track which agents produce the best hypotheses.

Acceptance Criteria

☑ Dashboard at /senate/performance shows agent quality scores, hypothesis output, and token efficiency

☑ KPI cards with total debates, hypotheses, avg scores, best agent

☑ Agent scorecards with composite ranking (quality/efficiency/contribution/precision)

☑ Agent synergy matrix showing collaboration quality

☑ Performance trajectory (older vs recent quality)

☑ Tool call efficiency table with success rates and latency

☑ Debate quality trends with Chart.js visualization

☑ Radar chart comparing agents across dimensions

☑ ROI per token analysis

☑ Hypothesis score movers and improvement attribution

☑ Debate depth vs hypothesis quality analysis

☑ Activity timeline with token allocation

☑ Per-analysis performance breakdown

☑ Persona believability scores

☑ Agent detail pages at /senate/agent/{id}

☑ /api/agent-scorecards JSON endpoint

☑ /api/agent-rankings comprehensive JSON endpoint

☑ Page caching for performance
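
The composite ranking named in the scorecard criterion could be computed along these lines. This is a minimal sketch, not the dashboard's actual formula: the spec lists the four dimensions (quality, efficiency, contribution, precision) but not how they are combined, so the equal weights, the `AgentMetrics` type, and the assumption that every dimension is already normalized to the range 0 to 1 are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical weights: the spec names the four dimensions but does not
# say how they are weighted, so equal weighting is assumed here.
WEIGHTS = {
    "quality": 0.25,
    "efficiency": 0.25,
    "contribution": 0.25,
    "precision": 0.25,
}


@dataclass
class AgentMetrics:
    """Per-agent dimension scores, each assumed to be normalized to [0, 1]."""
    agent_id: str
    quality: float
    efficiency: float
    contribution: float
    precision: float


def composite_score(m: AgentMetrics) -> float:
    """Weighted sum of the four scorecard dimensions."""
    return sum(weight * getattr(m, name) for name, weight in WEIGHTS.items())


def rank_agents(metrics: list[AgentMetrics]) -> list[AgentMetrics]:
    """Return agents ordered from highest to lowest composite score."""
    return sorted(metrics, key=composite_score, reverse=True)
```

With equal weights, an agent that is strong on a single dimension cannot outrank one that is moderately strong on all four. If the real scorecards favor one dimension, only the `WEIGHTS` table would need to change.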

Work Log

2026-04-02 — Slot 8

Explored existing codebase: found /senate/performance already has 15+ comprehensive sections
Dashboard includes: KPI cards, scorecards, synergy matrix, trajectory, tool efficiency, radar chart, ROI, movers, contribution impact, believability, timeline, depth analysis
Added page caching via _get_cached_page/_set_cached_page to reduce load time for the 145 KB page
Added /api/agent-rankings comprehensive JSON endpoint with debate quality, hypothesis output, tool efficiency, cost estimates
Added link to Rankings API from performance page
Verified syntax: py_compile passes
Result: Done — dashboard was already built; added caching + new API endpoint

File: e7e780b3_spec.md

Modified: 2026-05-01 20:13

Size: 2.0 KB