[Exchange] Backfill hypothesis stats

Task ID: f41d7c86-6422-424f-95b9-552df99f188b
Layer: Exchange (Quest 2)
Priority: P82

Goal

Backfill tokens_used, kg_edges_generated, and citations_count for all hypotheses with missing data.

Implementation

Created scripts/backfill_hypothesis_stats.py which computes:

  • tokens_used: From debate_rounds token tracking, or estimated from transcript length / sibling count
  • kg_edges_generated: From knowledge_edges matched by hypothesis ID, analysis_id, or target gene name
  • citations_count: From PMID/DOI references in description, evidence_for, and evidence_against fields (handles both text and JSON formats)
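
The citations_count step can be sketched as follows. This is a minimal illustration, not the contents of scripts/backfill_hypothesis_stats.py: the regular expressions, the helper names (`field_text`, `count_citations`), and the choice to count each distinct PMID or DOI once are all assumptions. It shows one way to treat a field that may hold either plain text or a JSON document.

```python
import json
import re

# Assumed patterns; the real script's regexes are not shown in this spec.
PMID_RE = re.compile(r"PMID:?\s*(\d{1,9})", re.IGNORECASE)
DOI_RE = re.compile(r"\b10\.\d{4,9}/[^\s\"'<>]+")


def _flatten(value):
    """Yield every key and scalar in a decoded JSON value as a string."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, dict):
        for key, item in value.items():
            # Keep the key so that {"pmid": 123} reads as "pmid: 123".
            yield f"{key}:"
            yield from _flatten(item)
    elif isinstance(value, list):
        for item in value:
            yield from _flatten(item)
    elif value is not None:
        yield str(value)


def field_text(raw):
    """Return searchable text for a field holding plain text or JSON."""
    if not raw:
        return ""
    try:
        decoded = json.loads(raw)
    except (TypeError, ValueError):
        return raw  # Not JSON: treat the field as plain text.
    return " ".join(_flatten(decoded))


def count_citations(description, evidence_for, evidence_against):
    """Count distinct PMID/DOI references across the three fields."""
    ids = set()
    for raw in (description, evidence_for, evidence_against):
        text = field_text(raw)
        ids.update("pmid:" + m for m in PMID_RE.findall(text))
        # Strip trailing punctuation that often follows a DOI in prose.
        ids.update("doi:" + m.rstrip(".,;)]").lower() for m in DOI_RE.findall(text))
    return len(ids)
```

Counting distinct identifiers keeps a reference that appears in both description and evidence_for from being counted twice. If the original script counts every occurrence instead, the set would be replaced with a running total.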

Results

Before:

  • tokens_used: 118/191 (62%)
  • kg_edges_generated: 7/191 (4%)
  • citations_count: 131/191 (69%)

After:

  • tokens_used: 184/191 (96%)
  • kg_edges_generated: 181/191 (95%)
  • citations_count: 191/191 (100%)
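
Coverage figures like those above could be reproduced with a query along these lines. This is a sketch under stated assumptions: the database is taken to be SQLite, the table is taken to be named `hypotheses`, and a value counts as populated when it is non-NULL and greater than zero. None of these details are confirmed by this spec; only the three column names come from it.

```python
import sqlite3

# Column names come from the spec; everything else here is assumed.
METRICS = ("tokens_used", "kg_edges_generated", "citations_count")


def coverage(conn, table="hypotheses"):
    """Return {metric: (populated, total, percent)} for each stat column."""
    total = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    report = {}
    for col in METRICS:
        # Assumption: NULL and 0 both mean "missing".
        populated = conn.execute(
            f"SELECT COUNT(*) FROM {table} WHERE COALESCE({col}, 0) > 0"
        ).fetchone()[0]
        pct = round(100 * populated / total) if total else 0
        report[col] = (populated, total, pct)
    return report


def format_report(report):
    """Render the report in the same 'n/total (pct%)' style as this spec."""
    return [f"{col}: {n}/{total} ({pct}%)" for col, (n, total, pct) in report.items()]
```

Calling `coverage(conn)` before and after a backfill run gives two reports that can be compared directly. The table name is interpolated into the SQL, so it should only ever come from trusted code, never from user input.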

Work Log

  • 2026-04-02 14:38 UTC: Started. Analyzed DB schema.
  • 2026-04-02 14:50 UTC: Wrote and ran backfill script. Coverage rose to 96% for tokens_used, 95% for kg_edges_generated, and 100% for citations_count.
  • 2026-04-16 15:57 UTC: Re-verified task. DB now has 631 hypotheses. Ran updated backfill script.
- Before: tokens=190, kg_edges=260, citations=478
- After: tokens=525, kg_edges=523, citations=540
- New commit: 9622f95d0 [Exchange] Add hypothesis stats backfill script — tokens, edges, citations [task:f41d7c86-6422-424f-95b9-552df99f188b]
- Pushed to gh remote.
  • 2026-04-18 12:30 UTC: Reopened task after an audit found the work never landed on main. Verified that the script was missing from main even though commit 9622f95d0 existed on an orphan branch. Rebuilt the script on current main (ba324f9e0). DB has 697 hypotheses.
- Before: tokens=678, kg_edges=530, citations=593
- After: tokens=678, kg_edges=530, citations=593 (already populated from prior runs)
- New commit: 497fd264d [Exchange] Add hypothesis stats backfill script — tokens, edges, citations [task:f41d7c86-6422-424f-95b9-552df99f188b]
- Pushed via force-with-lease to update orphan branch.

File: f41d7c86_642_spec.md
Modified: 2026-05-01 20:13
Size: 2.0 KB