[Agora] Figure-evidence debates: does this figure support the claim that cites it? (done)

New figure_evidence_audit debate kind; weights the artifact_links between a figure and its citing claims by verdict and updates the figure's quality_score.

Completion Notes

Auto-completed by supervisor after successful deploy to main

Git Commits (2)

Squash merge: orchestra/task/a4c450f7-biomni-analysis-parity-port-15-use-cases (87 commits) (#717), 2026-04-27
[Agora] Figure-evidence debates: new figure_evidence_audit debate kind (api.py, agent.py) [task:fc5d0624-c3e5-438f-a34f-0d5b7ae64f5a] (#675), 2026-04-27
Spec File

Goal

Figures (artifact_type='figure', registered in scidex/atlas/artifact_registry.py)
are routinely cited as evidence in wiki pages and hypotheses without any check
that the figure actually demonstrates the cited claim. Add a focused debate
kind, figure_evidence_audit, that pits an evidence_auditor persona against an
assumption_challenger persona, with the figure as target_artifact and the
citing claim as the question. The output is an artifact_links row from the
figure to the claim, with strength weighted by the debate verdict.
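A minimal sketch of the session request described above, assuming a plain dataclass stands in for whatever schema scidex.agora actually uses. DebateSessionRequest and build_figure_audit_request are hypothetical names. The question template and num_rounds=3 come from the Acceptance Criteria, and the per-round persona order (auditor, challenger, auditor synthesis) follows the Work Log.

```python
from dataclasses import dataclass, field


# Hypothetical request shape; the real scidex.agora schema may differ.
# Only the field values below are taken from the spec.
@dataclass
class DebateSessionRequest:
    debate_type: str
    target_artifact: str  # the figure's artifact id
    question: str         # built from the top citing claim
    personas: list[str] = field(default_factory=list)
    num_rounds: int = 3


def build_figure_audit_request(figure_id: str, claim: str) -> DebateSessionRequest:
    """Build a figure_evidence_audit request for one figure/claim pair."""
    question = (
        "Does this figure provide adequate evidence for the "
        f"claim '{claim}'?"
    )
    return DebateSessionRequest(
        debate_type="figure_evidence_audit",
        target_artifact=figure_id,
        question=question,
        # One persona per round: auditor opens, challenger rebuts,
        # auditor writes the synthesis round that carries the verdict.
        personas=["evidence_auditor", "assumption_challenger", "evidence_auditor"],
        num_rounds=3,
    )
```

Keeping the question template in a single function means every path that starts an audit produces identical wording, which keeps past sessions comparable.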

Acceptance Criteria

☐ Add new debate_type='figure_evidence_audit' to the existing enum and
to the valid-types list at api.py:15526.
☐ CLI: python3 -m scidex.agora.figure_evidence_debate --top-n 10
--dry-run selects figures cited by ≥3 hypotheses or wiki pages with
no recent audit.
☐ For each selected figure, generate a debate session with personas
[evidence_auditor, assumption_challenger], num_rounds=3, question
auto-built as: "Does this figure provide adequate evidence for the
claim '<top citing claim>'?".
☐ Debate verdict (parsed from synthesis round) sets
artifact_links.strength between figure and claim:
supports → 0.8, partial → 0.5, does_not_support → 0.1.
☐ The verdict also adjusts the figure's quality_score by ±0.05, clamped to its bounds.
☐ Smoke: 3 figures audited; 3 sessions in debate_sessions; 3+
artifact_links rows updated.
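The verdict handling in the criteria above can be sketched as two lookup tables and a clamp. The strengths come from the spec; the per-verdict quality delta (including 0.0 for partial) and the [0, 1] bounds are assumptions, since the spec only says "±0.05 within bounds". apply_verdict is a hypothetical helper, not the module's actual function.

```python
# artifact_links.strength per verdict (values from the spec).
VERDICT_STRENGTH = {
    "supports": 0.8,
    "partial": 0.5,
    "does_not_support": 0.1,
}

# Assumed quality_score nudge per verdict; the spec only says "±0.05",
# so leaving "partial" unchanged is a guess.
VERDICT_QUALITY_DELTA = {
    "supports": 0.05,
    "partial": 0.0,
    "does_not_support": -0.05,
}


def apply_verdict(verdict: str, quality_score: float,
                  lo: float = 0.0, hi: float = 1.0) -> tuple[float, float]:
    """Return (link_strength, new_quality_score) for a parsed verdict.

    Raises KeyError for an unrecognised verdict, so malformed synthesis
    output is not silently written to the database.
    """
    strength = VERDICT_STRENGTH[verdict]
    new_score = quality_score + VERDICT_QUALITY_DELTA[verdict]
    # Clamp to the assumed [lo, hi] range ("within bounds" in the spec).
    new_score = min(hi, max(lo, new_score))
    return strength, new_score
```

For example, apply_verdict("does_not_support", 0.02) returns (0.1, 0.0): the score would fall to -0.03 but is clamped at the lower bound.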

Approach

  • New module scidex/agora/figure_evidence_debate.py mirroring the notebook scheduler.
  • Extend the synthesis-round parser (existing in scidex.agora.synthesis_engine) to extract a verdict field.
  • Persist verdict, update links and quality.
  • Surface the audit history on /figure/{id} viewer (small badge: "Last audit: supports / partial / does_not_support").
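One way the synthesis-round parser could extract the verdict, assuming the synthesis text states its conclusion on a line such as "Verdict: partial". That line format, the regex, and extract_verdict are all assumptions; the actual scidex.agora.synthesis_engine output may be structured differently.

```python
import re
from typing import Optional

# Assumed format: a line starting with "Verdict:" followed by one of the
# three verdicts; "does not support" / "does-not-support" are also accepted.
_VERDICT_RE = re.compile(
    r"^\s*verdict\s*:\s*(does[ _-]not[ _-]support|supports|partial)\b",
    re.IGNORECASE | re.MULTILINE,
)


def extract_verdict(synthesis_text: str) -> Optional[str]:
    """Return the last verdict stated in the synthesis text, or None."""
    matches = _VERDICT_RE.findall(synthesis_text)
    if not matches:
        return None
    # Normalise spaces and hyphens to the enum spelling, e.g. does_not_support.
    return re.sub(r"[ -]", "_", matches[-1].lower())
```

Returning None instead of guessing lets the caller skip the link and quality updates when the synthesis never commits to a verdict.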

Dependencies

  • agr-ad-01-TARG: debate targeting.
  • scidex.agora.synthesis_engine: verdict extraction.

Work Log

2026-04-27: Implementation complete [task:fc5d0624-c3e5-438f-a34f-0d5b7ae64f5a]

  • Created scidex/agora/figure_evidence_debate.py: CLI module with --top-n/--dry-run, three-round audit debate (evidence_auditor → assumption_challenger → evidence_auditor synthesis), verdict→strength mapping, quality_score ±0.05 update
  • Added figure_evidence_audit case to agent.py:_get_artifact_personas (personas: evidence_auditor, assumption_challenger, evidence_auditor)
  • Added audit history badge to /artifact/{id} detail page in api.py (figure artifacts only): queries most recent figure_evidence_audit session, extracts verdict, renders colored badge
  • Smoke test: 3/3 figures audited; 3 debate_sessions rows (fea-6a076c941f7f, fea-27014558ee0c, fea-679d8d152a8c); 3 artifact_links rows (strength 0.5/0.5/0.1); quality_score decreased by 0.05 for does_not_support verdict
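The selection step behind --top-n/--dry-run could look like the following sketch, assuming citation counts and last-audit timestamps have already been loaded into memory. FigureCandidate, select_figures, parse_args, and the 30-day reading of "no recent audit" are hypothetical; only the ≥3-citation threshold and the two flags come from this spec.

```python
import argparse
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

MIN_CITATIONS = 3                          # "cited by >=3 hypotheses or wiki pages"
RECENT_AUDIT_WINDOW = timedelta(days=30)   # assumed meaning of "recent"


@dataclass
class FigureCandidate:
    figure_id: str
    citing_count: int                # hypotheses + wiki pages citing the figure
    last_audit_at: Optional[datetime]  # timezone-aware, None if never audited


def select_figures(candidates: list[FigureCandidate], top_n: int,
                   now: Optional[datetime] = None) -> list[FigureCandidate]:
    """Pick up to top_n well-cited figures with no audit inside the window."""
    now = now or datetime.now(timezone.utc)
    eligible = [
        c for c in candidates
        if c.citing_count >= MIN_CITATIONS
        and (c.last_audit_at is None
             or now - c.last_audit_at > RECENT_AUDIT_WINDOW)
    ]
    # Most-cited first: these figures carry the most downstream weight.
    eligible.sort(key=lambda c: c.citing_count, reverse=True)
    return eligible[:top_n]


def parse_args(argv: Optional[list[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(
        description="Select figures for figure_evidence_audit debates.")
    parser.add_argument("--top-n", type=int, default=10)
    parser.add_argument("--dry-run", action="store_true",
                        help="list selected figures without starting debates")
    return parser.parse_args(argv)
```

In dry-run mode the CLI would print the selected figure ids and stop before creating any debate_sessions rows.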

Sibling Tasks in Quest (Artifact Debates)