[Senate] Hypothesis cohort tracker — survival analysis by birth-week

Effort: thorough

Goal

Hypotheses are minted continuously by Theorist agents, but the
platform never asks "of the hypotheses minted in week N, how many
survived to month 6?" — i.e. retained composite_score ≥ threshold,
weren't superseded, weren't quietly archived. This is the
fundamental epistemic-quality question: are we generating durable
ideas or burning compute on noise?

Build a Hypothesis Cohort Tracker: group hypotheses by
creation week (the cohort), compute survival/verification/Elo-
retention curves over time, surface which cohorts produced the
most durable ideas, and feed the curves into the q-epistemic-rigor quest as a quality KPI.

Acceptance Criteria

☐ New module scidex/senate/hypothesis_cohorts.py:
- compute_cohort(creation_week: date) -> dict returns
{cohort_size, survival_at: {30d: n, 90d: n, 180d: n,
365d: n}, verification_at: {...}, elo_p50_at: {...},
promoted_to_canonical: int, superseded: int,
archived: int}
where survival means
composite_score ≥ 0.6 AND not superseded.
- recompute_all_cohorts() -> int walks every week from
the earliest hypothesis creation_at to the current week
and rebuilds the hypothesis_cohort_metrics table.
☐ New table hypothesis_cohort_metrics with
(cohort_week, cohort_size, snapshot_at, age_days,
survivors, verified, mean_elo, median_elo, n_superseded,
n_archived, n_promoted, diagnosis_md),
one row per (cohort, snapshot) pair. Snapshots are taken weekly.
☐ Systemd timer scidex-hypothesis-cohorts-weekly.timer
runs Sunday 23:00 UTC, recomputes the latest snapshot for
every cohort.
☐ GET /cohorts/hypotheses dashboard:
- Header: total cohorts tracked, average 90-day survival
rate, trend arrow vs. trailing-quarter average.
- Heatmap: rows = cohort weeks, columns = age buckets
(30d / 90d / 180d / 365d), cell color = survival rate.
- "Best cohorts" sidebar: top-5 cohorts by 180-day
verification rate with link to each cohort's
hypothesis list.
- "Worst cohorts" sidebar: bottom-5 with prompt-evolution
diagnosis (which agent persona was generating that
week, were the source gaps stale, etc.) — sourced from
prompt_evolution.py history.
☐ GET /cohort/{week} per-cohort detail page:
- Survival curve (Kaplan-Meier-style step function).
- Full list of cohort hypotheses with current
composite_score and supersede status.
- Provenance breakdown: which agents/personas authored
them.
☐ Quality KPI integration: cohort survival rates are
written into senate_metrics(metric='hypothesis_cohort_survival_180d', value, week)
so the q-epistemic-rigor quest can alert on regressions.
☐ Pytest: seed 4 cohorts of varied sizes and survival
profiles; recompute → assert metric rows match expected;
KM curve render-test asserts monotone non-increasing
survival values.
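
For the compute_cohort and recompute_all_cohorts criteria, here is a minimal sketch of the week bucketing and the survival_at counts. It assumes cohorts are keyed by the Monday of their week, that ages are measured from the cohort's week start, and that each hypothesis exposes a time-ordered composite_score history plus a supersede timestamp (naive UTC). The Hypothesis dataclass, its field names, and every helper name are hypothetical, not the real scidex schema.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import date, datetime, timedelta
from typing import Iterator, Optional

SURVIVAL_THRESHOLD = 0.6            # from the spec: composite_score >= 0.6
AGE_BUCKETS_DAYS = (30, 90, 180, 365)


@dataclass
class Hypothesis:
    """Hypothetical view of one hypothesis; field names are assumptions."""
    created_at: datetime
    # (observed_at, composite_score) pairs, sorted oldest first, naive UTC
    score_history: list[tuple[datetime, float]] = field(default_factory=list)
    superseded_at: Optional[datetime] = None


def week_start(d: date) -> date:
    """Monday of the week containing d (assumed cohort key)."""
    return d - timedelta(days=d.weekday())


def iter_cohort_weeks(earliest: date, today: date) -> Iterator[date]:
    """Every cohort week from the earliest creation week through the current week."""
    week, last = week_start(earliest), week_start(today)
    while week <= last:
        yield week
        week += timedelta(weeks=1)


def score_as_of(h: Hypothesis, when: datetime) -> Optional[float]:
    """Most recent composite_score observed at or before `when`."""
    latest = None
    for observed_at, score in h.score_history:
        if observed_at > when:
            break  # history is sorted, so all later entries are after `when` too
        latest = score
    return latest


def survives(h: Hypothesis, when: datetime) -> bool:
    """Spec definition: score >= 0.6 and not superseded as of `when`."""
    score = score_as_of(h, when)
    superseded = h.superseded_at is not None and h.superseded_at <= when
    return score is not None and score >= SURVIVAL_THRESHOLD and not superseded


def survival_at(cohort: list[Hypothesis], cohort_week: date,
                now: datetime) -> dict[str, Optional[int]]:
    """Survivor count per age bucket; None where the cohort is not yet that old."""
    start = datetime.combine(cohort_week, datetime.min.time())
    counts: dict[str, Optional[int]] = {}
    for days in AGE_BUCKETS_DAYS:
        checkpoint = start + timedelta(days=days)
        counts[f"{days}d"] = (None if checkpoint > now
                              else sum(survives(h, checkpoint) for h in cohort))
    return counts
```

A recompute_all_cohorts loop could then group hypotheses by week_start(h.created_at.date()) and call survival_at for each week yielded by iter_cohort_weeks. Returning None for buckets a young cohort has not reached keeps "not yet observable" distinct from "zero survivors".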
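
The hypothesis_cohort_metrics table could be declared roughly as below. This sketch uses SQLite through Python's sqlite3 module purely for illustration; the column types, the (cohort_week, snapshot_at) primary key, the nullable diagnosis_md column (named in the Approach notes), and the record_snapshot helper are assumptions. The ON CONFLICT ... DO NOTHING clause keeps snapshots additive: a second write for the same (cohort, snapshot) pair is ignored instead of overwriting the first.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS hypothesis_cohort_metrics (
    cohort_week   TEXT    NOT NULL,  -- ISO date of the cohort's week start
    cohort_size   INTEGER NOT NULL,
    snapshot_at   TEXT    NOT NULL,  -- ISO timestamp of the weekly snapshot
    age_days      INTEGER NOT NULL,
    survivors     INTEGER NOT NULL,
    verified      INTEGER NOT NULL,
    mean_elo      REAL,
    median_elo    REAL,
    n_superseded  INTEGER NOT NULL,
    n_archived    INTEGER NOT NULL,
    n_promoted    INTEGER NOT NULL,
    diagnosis_md  TEXT,              -- only filled for diagnosed cohorts
    PRIMARY KEY (cohort_week, snapshot_at)
)
"""

# Needs SQLite 3.24+ for ON CONFLICT; PostgreSQL accepts the same clause.
INSERT = """
INSERT INTO hypothesis_cohort_metrics VALUES (
    :cohort_week, :cohort_size, :snapshot_at, :age_days, :survivors, :verified,
    :mean_elo, :median_elo, :n_superseded, :n_archived, :n_promoted, :diagnosis_md
)
ON CONFLICT (cohort_week, snapshot_at) DO NOTHING
"""


def record_snapshot(conn: sqlite3.Connection, row: dict) -> bool:
    """Append one (cohort, snapshot) row; False if that pair was already recorded."""
    before = conn.total_changes
    conn.execute(INSERT, row)
    conn.commit()
    return conn.total_changes > before
```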
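
The Sunday 23:00 UTC timer implies every run should land on a stable snapshot key. One hypothetical way to get that is to normalize snapshot_at to the most recent Sunday 23:00 UTC at or before the run time, so a delayed or repeated run in the same week maps to the same key. The helper name is an assumption, and the systemd unit and timer files themselves are not shown.

```python
from datetime import datetime, timedelta, timezone


def latest_snapshot_at(now: datetime) -> datetime:
    """Most recent Sunday 23:00 UTC at or before `now` (hypothetical snapshot key).

    `now` must be timezone-aware; a naive datetime would be read as local time.
    """
    now = now.astimezone(timezone.utc)
    # datetime.weekday(): Monday == 0 ... Sunday == 6
    days_since_sunday = (now.weekday() - 6) % 7
    candidate = (now - timedelta(days=days_since_sunday)).replace(
        hour=23, minute=0, second=0, microsecond=0)
    if candidate > now:  # it is Sunday but before 23:00: use last week's slot
        candidate -= timedelta(weeks=1)
    return candidate
```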
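
For the /cohorts/hypotheses header and sidebars, a sketch under a few stated assumptions: "trailing quarter" is read as the 13 weekly cohorts immediately before the newest cohort that already has a 90-day value, both sidebars rank by 180-day verification rate, and moves within an arbitrary 0.02 band render as a flat arrow. CohortRates, TRAILING_WEEKS, FLAT_BAND, and both function names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from statistics import mean
from typing import Optional

TRAILING_WEEKS = 13   # assumed: a quarter is about 13 weekly cohorts
FLAT_BAND = 0.02      # assumed: smaller moves render as a flat arrow


@dataclass
class CohortRates:
    """Hypothetical per-cohort summary read from hypothesis_cohort_metrics."""
    cohort_week: str                     # ISO date string, so it sorts by time
    survival_90d: Optional[float]        # None until the cohort is 90 days old
    verification_180d: Optional[float]   # None until the cohort is 180 days old


def header_stats(cohorts: list[CohortRates]) -> dict:
    """Total cohorts, mean 90-day survival, and a trend arrow for the header."""
    mature = sorted((c for c in cohorts if c.survival_90d is not None),
                    key=lambda c: c.cohort_week)
    rates = [c.survival_90d for c in mature]
    arrow = "→"
    if len(rates) > TRAILING_WEEKS:
        # newest mature cohort vs. the mean of the 13 mature cohorts before it
        delta = rates[-1] - mean(rates[-1 - TRAILING_WEEKS:-1])
        if delta > FLAT_BAND:
            arrow = "↑"
        elif delta < -FLAT_BAND:
            arrow = "↓"
    return {
        "cohorts_tracked": len(cohorts),
        "avg_survival_90d": mean(rates) if rates else None,
        "trend": arrow,
    }


def best_and_worst(cohorts: list[CohortRates], k: int = 5):
    """Top-k and bottom-k cohorts by 180-day verification rate."""
    ranked = sorted((c for c in cohorts if c.verification_180d is not None),
                    key=lambda c: c.verification_180d, reverse=True)
    return ranked[:k], ranked[::-1][:k]
```

With fewer than ten cohorts old enough for a 180-day rate, the two sidebars overlap; the dashboard may want to hide the worst list until enough cohorts have matured.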
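
The per-cohort survival curve can use the standard Kaplan-Meier product-limit estimator, S(t) = ∏ over failure times t_i ≤ t of (1 − d_i / n_i), where d_i is the number of failures at t_i and n_i the number still at risk just before it. In this sketch a hypothesis "fails" the first time it stops meeting the survival definition (composite_score < 0.6 or superseded), and one still surviving at its last observation is censored. Treating the first failure as final is an assumption, since a composite_score can later recover. The function name is hypothetical.

```python
from __future__ import annotations


def kaplan_meier(durations: list[float], events: list[bool]) -> list[tuple[float, float]]:
    """Kaplan-Meier survival curve as step points [(0, 1.0), (t1, S1), ...].

    durations[i]: days from cohort start until hypothesis i first stopped
                  surviving (event) or was last seen still surviving (censored).
    events[i]:    True if the failure was observed, False if censored.
    """
    if len(durations) != len(events):
        raise ValueError("durations and events must have the same length")
    pairs = sorted(zip(durations, events))
    at_risk = len(pairs)
    survival = 1.0
    curve = [(0.0, 1.0)]
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        failures = censored = 0
        # group all hypotheses that share this time point
        while i < len(pairs) and pairs[i][0] == t:
            if pairs[i][1]:
                failures += 1
            else:
                censored += 1
            i += 1
        if failures:
            # S(t) = S(t-) * (1 - d_t / n_t); hypotheses censored at t still count as at risk
            survival *= 1 - failures / at_risk
            curve.append((t, survival))
        at_risk -= failures + censored
    return curve
```

To render the step function, draw a horizontal segment between consecutive points and drop vertically at each failure time. The curve is monotone non-increasing by construction, because every factor 1 − d/n lies in [0, 1].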
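
For the quality KPI integration, a sketch that writes one hypothesis_cohort_survival_180d row per cohort week and reads the series back. It assumes senate_metrics has (metric, value, week) columns as the criterion's signature suggests, that value is the 180-day survivor count divided by cohort_size, and that week is the cohort week. is_regression shows one possible alert rule (latest value more than k sample standard deviations below the trailing mean); the real rule belongs to the q-epistemic-rigor quest and is not specified here.

```python
from __future__ import annotations

import sqlite3
from statistics import mean, stdev

METRIC = "hypothesis_cohort_survival_180d"


def write_survival_kpi(conn: sqlite3.Connection, cohort_week: str,
                       survivors_180d: int, cohort_size: int) -> float | None:
    """Insert one KPI row; returns the rate, or None for an empty cohort."""
    if cohort_size == 0:
        return None
    value = survivors_180d / cohort_size
    conn.execute(
        "INSERT INTO senate_metrics (metric, value, week) VALUES (?, ?, ?)",
        (METRIC, value, cohort_week),
    )
    conn.commit()
    return value


def read_series(conn: sqlite3.Connection) -> list[float]:
    """All recorded 180-day survival rates, oldest cohort week first."""
    rows = conn.execute(
        "SELECT value FROM senate_metrics WHERE metric = ? ORDER BY week", (METRIC,)
    ).fetchall()
    return [v for (v,) in rows]


def is_regression(series: list[float], window: int = 13, k: float = 2.0) -> bool:
    """Hypothetical alert rule: latest value more than k sigma below the trailing mean."""
    trailing = series[-1 - window:-1]
    if len(trailing) < 3:
        return False  # too little history to judge
    return series[-1] < mean(trailing) - k * stdev(trailing)
```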
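
The pytest criterion could be seeded along these lines. The four cohorts vary in size (5, 2, 4 and 3) and in profile: all survive, none survive, one excluded because it is superseded, and a threshold-boundary case where a score of exactly 0.6 counts as surviving. count_survivors is a hypothetical stand-in for the real compute_cohort call, and the (score, superseded) tuples are an assumed seed format; a real test would seed the database and assert against hypothesis_cohort_metrics rows.

```python
# Hypothetical seed format: (composite_score, superseded) per hypothesis.
SEEDED_COHORTS = {
    "2026-01-05": [(0.9, False)] * 5,                                        # all survive
    "2026-01-12": [(0.3, False)] * 2,                                        # none survive
    "2026-01-19": [(0.8, False), (0.8, True), (0.4, False), (0.7, False)],  # superseded excluded
    "2026-01-26": [(0.6, False), (0.59, False), (0.61, True)],              # threshold boundary
}
EXPECTED_SURVIVORS = {"2026-01-05": 5, "2026-01-12": 0, "2026-01-19": 2, "2026-01-26": 1}


def count_survivors(rows, threshold=0.6):
    """Stand-in for compute_cohort(); swap in the real module call."""
    return sum(1 for score, superseded in rows if score >= threshold and not superseded)


def assert_monotone_non_increasing(values):
    """Fails if any survival value rises relative to the previous point."""
    for prev, cur in zip(values, values[1:]):
        assert cur <= prev, f"survival rose from {prev} to {cur}"


def test_seeded_cohort_survivors():
    for week, rows in SEEDED_COHORTS.items():
        assert count_survivors(rows) == EXPECTED_SURVIVORS[week], week


def test_km_curve_is_monotone():
    assert_monotone_non_increasing([1.0, 0.8, 0.8, 0.5, 0.0])
```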

Approach

  • "Superseded" detection uses scidex.atlas.supersede_resolver,
    already in the codebase (search supersede_resolver.py for the
    canonical helper).
  • Verification = composite_score ≥ 0.7 AND at least one
    evidence_assessment debate with verdict supports.
  • Snapshot is additive: never overwrite, so we keep a true
    longitudinal record.
  • Heatmap rendering reuses the q-live-market-liquidity-heatmap
    color-scale logic.
  • Worst-cohort diagnosis prompt: feed the prompt_evolution history
    for that week plus the cohort's hypothesis statements to an LLM
    and ask "what went wrong"; store the short summary in
    hypothesis_cohort_metrics.diagnosis_md.
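
The survival and verification definitions can be applied per hypothesis roughly as follows. is_superseded is a placeholder callable standing in for scidex.atlas.supersede_resolver, whose actual API is not shown in this spec; the Debate and HypothesisState dataclasses, their field names, and classify are likewise assumptions. As written in the spec, verification does not itself exclude superseded hypotheses, while any verified hypothesis that is not superseded also counts as a survivor (0.7 ≥ 0.6).

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Iterable

SURVIVAL_THRESHOLD = 0.6
VERIFICATION_THRESHOLD = 0.7


@dataclass
class Debate:
    kind: str      # e.g. "evidence_assessment" (hypothetical field)
    verdict: str   # e.g. "supports" (hypothetical field)


@dataclass
class HypothesisState:
    """Hypothetical current state of one hypothesis."""
    hypothesis_id: str
    composite_score: float
    debates: list[Debate] = field(default_factory=list)


def is_verified(h: HypothesisState) -> bool:
    """Spec: composite_score >= 0.7 and an evidence_assessment debate that supports it."""
    return h.composite_score >= VERIFICATION_THRESHOLD and any(
        d.kind == "evidence_assessment" and d.verdict == "supports" for d in h.debates)


def classify(cohort: Iterable[HypothesisState],
             is_superseded: Callable[[str], bool]) -> dict[str, int]:
    """Count survivors, verified and superseded hypotheses in one cohort."""
    counts = {"survivors": 0, "verified": 0, "n_superseded": 0}
    for h in cohort:
        superseded = is_superseded(h.hypothesis_id)  # placeholder for supersede_resolver
        if superseded:
            counts["n_superseded"] += 1
        if h.composite_score >= SURVIVAL_THRESHOLD and not superseded:
            counts["survivors"] += 1
        if is_verified(h):
            counts["verified"] += 1
    return counts
```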

Dependencies

  • scidex.atlas.supersede_resolver: supersede detection.
  • scidex.senate.prompt_evolution: diagnosis source.
  • q-time-hypothesis-history-viewer: rich per-hypothesis
    history; cohort drill-down links here.

Work Log

Tasks using this spec (1)
[Senate] Hypothesis cohort tracker - survival analysis by bi
File: q-time-hypothesis-cohort-tracker_spec.md
Modified: 2026-05-01 20:13
Size: 4.0 KB