Effort: thorough
Hypotheses are minted continuously by Theorist agents, but the
platform never asks "of the hypotheses minted in week N, how many
survived to month 6?" — i.e. retained composite_score ≥ threshold,
weren't superseded, weren't quietly archived. This is the
fundamental epistemic-quality question: are we generating durable
ideas or burning compute on noise?
Build a Hypothesis Cohort Tracker: group hypotheses by
creation week (the cohort), compute survival/verification/Elo-
retention curves over time, surface which cohorts produced the
most durable ideas, and feed the curves into the
q-epistemic-rigor quest as a quality KPI.
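Grouping by creation week can be sketched as below. This is a minimal illustration, assuming cohort weeks start on Monday (the text does not say which weekday opens a cohort) and that each hypothesis has a creation date. The names week_start and group_by_cohort are hypothetical, as are the ids and dates in the example.

```python
from collections import defaultdict
from datetime import date, timedelta


def week_start(d: date) -> date:
    """Return the Monday of the week containing d (assumed cohort key)."""
    return d - timedelta(days=d.weekday())


def group_by_cohort(created: dict[str, date]) -> dict[date, list[str]]:
    """Map each cohort week to the hypothesis ids created in that week."""
    cohorts: dict[date, list[str]] = defaultdict(list)
    for hyp_id, created_on in created.items():
        cohorts[week_start(created_on)].append(hyp_id)
    return dict(cohorts)


# Hypothetical ids and dates, for illustration only.
example = {
    "h1": date(2024, 1, 1),  # a Monday
    "h2": date(2024, 1, 7),  # Sunday of the same week
    "h3": date(2024, 1, 8),  # Monday of the following week
}
cohorts = group_by_cohort(example)
```

Keying cohorts by a single canonical date keeps the cohort identifier stable regardless of which day inside the week a hypothesis was minted.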
scidex/senate/hypothesis_cohorts.py:compute_cohort(creation_week: date) -> dict returns
{cohort_size, survival_at: {30d: n, 90d: n, 180d: n, 365d: n},
verification_at: {...}, elo_p50_at: {...},
promoted_to_canonical: int, superseded: int, archived: int}
where survival means composite_score ≥ 0.6 AND not superseded.
recompute_all_cohorts() -> int walks every week from hypothesis_cohort_metrics table.
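The survival part of compute_cohort might look like the sketch below. It is not the platform's implementation: the Hypothesis record shape, the extra hyps argument, and the rule that the score at a given age is the latest one observed on or before that date are all assumptions made for illustration. Only the 0.6 threshold, the "not superseded" condition, and the 30/90/180/365-day ages come from the text. Verification and Elo fields are omitted, and ages that have not yet elapsed are not handled.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

SURVIVAL_THRESHOLD = 0.6        # from the text: composite_score >= 0.6
AGES_DAYS = (30, 90, 180, 365)  # snapshot ages named in the text


@dataclass
class Hypothesis:
    # Hypothetical record shape; the real hypothesis schema is not shown here.
    created_on: date
    scores: list[tuple[date, float]]  # (observed_on, composite_score), sorted by date
    superseded_on: Optional[date] = None
    archived_on: Optional[date] = None


def score_as_of(h: Hypothesis, when: date) -> Optional[float]:
    """Latest composite_score observed on or before `when` (assumed rule)."""
    seen = [score for observed_on, score in h.scores if observed_on <= when]
    return seen[-1] if seen else None


def survived(h: Hypothesis, when: date) -> bool:
    """composite_score >= 0.6 AND not superseded, evaluated at `when`."""
    score = score_as_of(h, when)
    not_superseded = h.superseded_on is None or h.superseded_on > when
    return score is not None and score >= SURVIVAL_THRESHOLD and not_superseded


def compute_cohort(creation_week: date, hyps: list[Hypothesis]) -> dict:
    """Summarise hypotheses created in the 7 days starting at creation_week."""
    end = creation_week + timedelta(days=7)
    cohort = [h for h in hyps if creation_week <= h.created_on < end]
    return {
        "cohort_size": len(cohort),
        "survival_at": {
            f"{age}d": sum(survived(h, h.created_on + timedelta(days=age)) for h in cohort)
            for age in AGES_DAYS
        },
        "superseded": sum(h.superseded_on is not None for h in cohort),
        "archived": sum(h.archived_on is not None for h in cohort),
    }


# Made-up data: one durable idea, one that decays, one superseded, one in the next cohort.
hyps = [
    Hypothesis(date(2024, 1, 2), [(date(2024, 1, 2), 0.8)]),
    Hypothesis(date(2024, 1, 3), [(date(2024, 1, 3), 0.7), (date(2024, 3, 1), 0.4)]),
    Hypothesis(date(2024, 1, 4), [(date(2024, 1, 4), 0.9)], superseded_on=date(2024, 1, 20)),
    Hypothesis(date(2024, 1, 9), [(date(2024, 1, 9), 0.9)]),
]
result = compute_cohort(date(2024, 1, 1), hyps)
```

Evaluating survival at each hypothesis's own creation date plus the age, rather than at one fixed calendar date, keeps hypotheses minted on different days of the same week comparable.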
hypothesis_cohort_metrics with (cohort_week, cohort_size, snapshot_at, age_days,
survivors, verified, mean_elo, median_elo, n_superseded,
n_archived, n_promoted), one row per (cohort, snapshot)
scidex-hypothesis-cohorts-weekly.timer
GET /cohorts/hypotheses dashboard:
prompt_evolution.py history.
GET /cohort/{week} per-cohort detail page:
senate_metrics(metric='hypothesis_cohort_survival_180d', value, week) so the
q-epistemic-rigor quest can alert on regressions.
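The metrics table and the KPI row can be sketched with an in-memory SQLite database. The column names and the one-row-per-(cohort, snapshot) rule come from the text. The column types, the use of SQLite, the three-column shape of senate_metrics, the helper name emit_survival_kpi, and the choice to report the KPI as survivors divided by cohort size are assumptions, and the inserted numbers are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hypothesis_cohort_metrics (
    cohort_week  TEXT NOT NULL,
    cohort_size  INTEGER NOT NULL,
    snapshot_at  TEXT NOT NULL,
    age_days     INTEGER NOT NULL,
    survivors    INTEGER NOT NULL,
    verified     INTEGER NOT NULL,
    mean_elo     REAL,
    median_elo   REAL,
    n_superseded INTEGER NOT NULL,
    n_archived   INTEGER NOT NULL,
    n_promoted   INTEGER NOT NULL,
    PRIMARY KEY (cohort_week, snapshot_at)  -- one row per (cohort, snapshot)
);
-- Assumed minimal shape; the real senate_metrics table may differ.
CREATE TABLE senate_metrics (metric TEXT, value REAL, week TEXT);
""")

# Made-up snapshot of the 2024-01-01 cohort at age 180 days.
conn.execute(
    "INSERT INTO hypothesis_cohort_metrics VALUES (?,?,?,?,?,?,?,?,?,?,?)",
    ("2024-01-01", 40, "2024-06-29", 180, 10, 6, 1510.0, 1498.0, 12, 5, 3),
)


def emit_survival_kpi(conn: sqlite3.Connection, cohort_week: str, age_days: int = 180) -> float:
    """Write the survival fraction at `age_days` into senate_metrics and return it."""
    row = conn.execute(
        "SELECT survivors, cohort_size FROM hypothesis_cohort_metrics "
        "WHERE cohort_week = ? AND age_days = ? "
        "ORDER BY snapshot_at DESC LIMIT 1",
        (cohort_week, age_days),
    ).fetchone()
    if row is None:
        raise LookupError(f"no {age_days}-day snapshot for cohort {cohort_week}")
    survivors, size = row
    value = survivors / size if size else 0.0  # assumed KPI: fraction surviving
    conn.execute(
        "INSERT INTO senate_metrics (metric, value, week) VALUES (?, ?, ?)",
        (f"hypothesis_cohort_survival_{age_days}d", value, cohort_week),
    )
    return value


kpi = emit_survival_kpi(conn, "2024-01-01")
```

Reporting a fraction rather than a raw count lets cohorts of different sizes be compared week over week, which is what a regression alert needs.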
scidex.atlas.supersede_resolver already in the codebase
supersede_resolver.py for the canonical helper).
composite_score ≥ 0.7 AND has at least evidence_assessment debate with verdict supports.
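The criterion above can be written as a small predicate. In this sketch the Debate record shape and the name meets_evidence_bar are hypothetical. The required number of supporting debates is missing from the text, so it is left as an explicit parameter rather than guessed; the value 1 in the example is illustrative only.

```python
from dataclasses import dataclass

SCORE_BAR = 0.7  # from the text: composite_score >= 0.7


@dataclass
class Debate:
    # Hypothetical record shape, for illustration only.
    kind: str     # e.g. "evidence_assessment"
    verdict: str  # e.g. "supports"


def meets_evidence_bar(composite_score: float, debates: list[Debate],
                       min_supporting: int) -> bool:
    """True if the score clears 0.7 and enough evidence_assessment debates support it.

    min_supporting is a caller-supplied assumption; the text does not state it.
    """
    supporting = sum(
        d.kind == "evidence_assessment" and d.verdict == "supports" for d in debates
    )
    return composite_score >= SCORE_BAR and supporting >= min_supporting


# Illustrative calls with made-up debates and an assumed minimum of 1.
debates = [Debate("evidence_assessment", "supports"), Debate("evidence_assessment", "refutes")]
strong = meets_evidence_bar(0.75, debates, min_supporting=1)
weak_score = meets_evidence_bar(0.65, debates, min_supporting=1)
no_support = meets_evidence_bar(0.9, [Debate("evidence_assessment", "refutes")], min_supporting=1)
```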
q-live-market-liquidity-heatmap
prompt_evolution
hypothesis_cohort_metrics.diagnosis_md.
scidex.atlas.supersede_resolver: supersede detection.
scidex.senate.prompt_evolution: diagnosis source.
q-time-hypothesis-history-viewer: rich per-hypothesis