SciDEX — Task: [Agora] Figure-evidence debates

New figure_evidence_audit debate kind; weights citing-claim artifact_links by verdict and updates figure quality_score.

Completion Notes

Auto-completed by supervisor after successful deploy to main

Git Commits (2)

Squash merge: orchestra/task/a4c450f7-biomni-analysis-parity-port-15-use-cases (87 commits) (#717) 2026-04-27

[Agora] Figure-evidence debates: new figure_evidence_audit debate kind (api.py, agent.py) [task:fc5d0624-c3e5-438f-a34f-0d5b7ae64f5a] (#675) 2026-04-27

Spec File

Goal

Figures (artifact_type='figure', registered in scidex/atlas/artifact_registry.py)
are routinely cited as evidence in wiki pages and hypotheses without any check
that the figure actually demonstrates the cited claim. Add a focused
debate kind, figure_evidence_audit, that pits an evidence_auditor persona
against an assumption_challenger persona, with the figure as target_artifact
and the citing claim as the question. The output is an artifact_links row from
the figure to the claim, with strength weighted by the debate verdict.

Acceptance Criteria

☐ Add the new debate_type='figure_evidence_audit' to the existing enum and
to the list of valid types at api.py:15526.

☐ CLI: python3 -m scidex.agora.figure_evidence_debate --top-n 10 --dry-run
selects figures cited by ≥3 hypotheses or wiki pages with
no recent audit.
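Below is a minimal sketch of the CLI flags and the candidate query. It assumes a SQLite-style store with hypothetical tables: artifacts(id, artifact_type), artifact_links(source_id, source_type, target_id) for citations, and debate_sessions(debate_type, target_artifact, created_at). It also assumes "recent" means the last 30 days. The real SciDEX schema and recency window may differ.

```python
import argparse
import sqlite3


def parse_args(argv=None):
    """Parse the two flags named in the acceptance criteria."""
    parser = argparse.ArgumentParser(prog="figure_evidence_debate")
    parser.add_argument("--top-n", type=int, default=10,
                        help="maximum number of figures to audit")
    parser.add_argument("--dry-run", action="store_true",
                        help="list candidate figures without starting debates")
    return parser.parse_args(argv)


# Hypothetical schema; counts distinct citing hypotheses/wiki pages per figure
# and skips figures with a figure_evidence_audit session inside the window.
CANDIDATE_SQL = """
    SELECT l.target_id AS figure_id,
           COUNT(DISTINCT l.source_id) AS n_citers
    FROM artifact_links AS l
    JOIN artifacts AS a ON a.id = l.target_id
    WHERE a.artifact_type = 'figure'
      AND l.source_type IN ('hypothesis', 'wiki_page')
      AND NOT EXISTS (
          SELECT 1 FROM debate_sessions AS d
          WHERE d.target_artifact = l.target_id
            AND d.debate_type = 'figure_evidence_audit'
            AND d.created_at >= datetime('now', ?)
      )
    GROUP BY l.target_id
    HAVING COUNT(DISTINCT l.source_id) >= 3
    ORDER BY n_citers DESC, figure_id
    LIMIT ?
"""


def select_figures(conn, top_n, recent_days=30):
    """Return (figure_id, n_citers) rows for figures cited by >= 3 distinct
    hypotheses or wiki pages with no audit in the last `recent_days` days."""
    window = f"-{recent_days} days"  # SQLite datetime() modifier
    return conn.execute(CANDIDATE_SQL, (window, top_n)).fetchall()
```

Under --dry-run, a caller would print these rows and stop. Otherwise each figure_id would proceed to a debate session.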

☐ For each selected figure, generate a debate session with personas
[evidence_auditor, assumption_challenger], num_rounds=3, and a question
auto-built as: "Does this figure provide adequate evidence for the
claim '<top citing claim>'?".
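A sketch of how one session's parameters could be assembled. The class and function names are hypothetical. The persona rotation reproduces the round order recorded in the work log (evidence_auditor → assumption_challenger → evidence_auditor), so the auditor also delivers the synthesis round.

```python
from dataclasses import dataclass, field
from typing import List

AUDIT_PERSONAS = ["evidence_auditor", "assumption_challenger"]


@dataclass
class AuditDebateSpec:
    """Hypothetical container for the parameters of one audit session."""
    target_artifact: str
    question: str
    personas: List[str] = field(default_factory=lambda: list(AUDIT_PERSONAS))
    num_rounds: int = 3
    debate_type: str = "figure_evidence_audit"


def build_question(top_claim: str) -> str:
    """Fill the question template given in the acceptance criteria."""
    return ("Does this figure provide adequate evidence for the "
            f"claim '{top_claim}'?")


def round_speakers(personas: List[str], num_rounds: int) -> List[str]:
    """Alternate personas round by round; with 3 rounds the first persona
    opens and also speaks in the final (synthesis) round."""
    return [personas[i % len(personas)] for i in range(num_rounds)]


def build_audit_spec(figure_id: str, top_claim: str) -> AuditDebateSpec:
    return AuditDebateSpec(target_artifact=figure_id,
                           question=build_question(top_claim))
```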

☐ The debate verdict (parsed from the synthesis round) sets
artifact_links.strength between figure and claim:
supports → 0.8, partial → 0.5, does_not_support → 0.1.
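The mapping can be held as a closed enum so that an unexpected verdict fails loudly instead of silently writing a default strength. This is a sketch: the `Verdict` enum and `link_strength` helper are hypothetical names, and the numbers are the ones listed above.

```python
from enum import Enum


class Verdict(str, Enum):
    SUPPORTS = "supports"
    PARTIAL = "partial"
    DOES_NOT_SUPPORT = "does_not_support"


# Link strengths copied from the acceptance criteria.
STRENGTH_BY_VERDICT = {
    Verdict.SUPPORTS: 0.8,
    Verdict.PARTIAL: 0.5,
    Verdict.DOES_NOT_SUPPORT: 0.1,
}


def link_strength(verdict: str) -> float:
    """Map a parsed verdict string to an artifact_links.strength value.
    Raises ValueError for anything outside the three allowed verdicts."""
    return STRENGTH_BY_VERDICT[Verdict(verdict)]
```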

☐ The verdict also adjusts the figure's quality_score by ±0.05, clamped within bounds.
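The spec does not give the direction per verdict or the exact bounds. This sketch therefore assumes that supports adds 0.05, does_not_support subtracts 0.05, partial leaves the score unchanged, and quality_score lives in [0, 1]. Only the decrease on does_not_support is confirmed by the work log. `adjust_quality` is a hypothetical helper.

```python
# Assumed deltas: the spec only says "±0.05"; partial = 0.0 is a guess.
QUALITY_DELTA = {"supports": 0.05, "partial": 0.0, "does_not_support": -0.05}


def adjust_quality(score: float, verdict: str,
                   lo: float = 0.0, hi: float = 1.0) -> float:
    """Shift quality_score by the verdict's delta and clamp to [lo, hi].
    The [0, 1] default bounds are an assumption."""
    new_score = score + QUALITY_DELTA[verdict]
    # round() keeps repeated ±0.05 steps from accumulating float noise
    return round(min(hi, max(lo, new_score)), 4)
```

Clamping before storing means a figure that is audited repeatedly can never leave the valid range.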

☐ Smoke: 3 figures audited; 3 sessions in debate_sessions; 3+
artifact_links rows updated.

Approach

New module scidex/agora/figure_evidence_debate.py, mirroring the
notebook scheduler.

Extend the synthesis-round parser (existing in
scidex.agora.synthesis_engine) to extract a verdict field.
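The output format of the synthesis round is not described here. This sketch assumes the verdict appears either as a "Verdict: <value>" line or as a JSON "verdict" key, and it takes the last match in case the verdict scale is quoted earlier in the text. `parse_verdict` is a hypothetical stand-in for whatever scidex.agora.synthesis_engine actually exposes.

```python
import re
from typing import Optional

# Matches "Verdict: partial", "VERDICT = does not support",
# or '"verdict": "does_not_support"' (case-insensitive).
VERDICT_RE = re.compile(
    r"""verdict["']?\s*[:=]\s*["']?(supports|partial|does[ _-]not[ _-]support)\b""",
    re.IGNORECASE,
)


def parse_verdict(synthesis_text: str) -> Optional[str]:
    """Return the last verdict stated in the synthesis text, normalised to
    supports / partial / does_not_support, or None if none is found."""
    matches = VERDICT_RE.findall(synthesis_text)
    if not matches:
        return None
    # Normalise "Does Not Support" / "does-not-support" to does_not_support.
    return re.sub(r"[ -]", "_", matches[-1].lower())
```

Returning None rather than a default lets the caller skip the link and quality updates when the synthesis round reached no verdict.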

Persist the verdict, then update the artifact_links strength and the figure's quality_score.
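A sketch of persisting one audit outcome in a single transaction, so the verdict, link strength and quality score never disagree. It assumes hypothetical SQLite tables debate_sessions(id, verdict), artifact_links(source_id, target_id, strength) and artifacts(id, quality_score). The caller supplies the already-mapped strength and quality delta. The link direction follows the Goal: figure as source, claim as target.

```python
import sqlite3


def persist_audit(conn: sqlite3.Connection, session_id: str, figure_id: str,
                  claim_id: str, verdict: str, strength: float,
                  quality_delta: float) -> None:
    """Record one audit outcome atomically (hypothetical schema)."""
    with conn:  # commits on success, rolls back if any statement fails
        conn.execute("UPDATE debate_sessions SET verdict = ? WHERE id = ?",
                     (verdict, session_id))
        # Update the figure -> claim link if it exists, otherwise insert it.
        cur = conn.execute(
            "UPDATE artifact_links SET strength = ? "
            "WHERE source_id = ? AND target_id = ?",
            (strength, figure_id, claim_id))
        if cur.rowcount == 0:
            conn.execute(
                "INSERT INTO artifact_links (source_id, target_id, strength) "
                "VALUES (?, ?, ?)",
                (figure_id, claim_id, strength))
        # Multi-argument MIN/MAX are SQLite scalar functions; clamp to [0, 1].
        conn.execute(
            "UPDATE artifacts SET quality_score = "
            "MIN(1.0, MAX(0.0, quality_score + ?)) WHERE id = ?",
            (quality_delta, figure_id))
```

Using update-then-insert rather than an upsert avoids depending on a unique index over (source_id, target_id), which this sketch does not assume exists.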

Surface the audit history on the /figure/{id} viewer (small badge: "Last
audit: supports / partial / does_not_support").
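A minimal sketch of rendering the badge from the most recent verdict. The CSS class, inline styles and colours are hypothetical; the work log records that the shipped badge was added to the /artifact/{id} detail page in api.py. The verdict text is HTML-escaped before interpolation.

```python
from html import escape
from typing import Optional

# Hypothetical colour scheme for the three verdicts.
BADGE_COLORS = {
    "supports": "#2e7d32",
    "partial": "#f9a825",
    "does_not_support": "#c62828",
}


def audit_badge_html(verdict: Optional[str]) -> str:
    """Render the "Last audit: ..." badge, or nothing if never audited."""
    if verdict is None:
        return ""
    color = BADGE_COLORS.get(verdict, "#757575")  # grey for unknown values
    return (
        f'<span class="audit-badge" style="background:{color};color:#fff;'
        f'padding:2px 6px;border-radius:4px">'
        f"Last audit: {escape(verdict)}</span>"
    )
```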

Dependencies

agr-ad-01-TARG — debate targeting.
scidex.agora.synthesis_engine — verdict extraction.

Work Log

2026-04-27 — Implementation complete [task:fc5d0624-c3e5-438f-a34f-0d5b7ae64f5a]

Created scidex/agora/figure_evidence_debate.py: CLI module with --top-n/--dry-run, three-round audit debate (evidence_auditor → assumption_challenger → evidence_auditor synthesis), verdict→strength mapping, quality_score ±0.05 update
Added figure_evidence_audit case to agent.py:_get_artifact_personas (personas: evidence_auditor, assumption_challenger, evidence_auditor)
Added audit history badge to /artifact/{id} detail page in api.py (figure artifacts only): queries most recent figure_evidence_audit session, extracts verdict, renders colored badge
Smoke test: 3/3 figures audited; 3 debate_sessions rows (fea-6a076c941f7f, fea-27014558ee0c, fea-679d8d152a8c); 3 artifact_links rows (strength 0.5/0.5/0.1); quality_score decreased by 0.05 for does_not_support verdict