[Agora] Dataset provenance/bias debates with three-persona panel (done)

Quest: Artifact Debates
Schedule provenance_auditor + bias_detector + schema_validator panel debates on top-usage datasets, persist quality profile.

Completion Notes

Auto-completed by supervisor after successful deploy to main

Git Commits (2)

Squash merge: orchestra/task/a4c450f7-biomni-analysis-parity-port-15-use-cases (87 commits) (#717), 2026-04-27
Squash merge: orchestra/task/191d0752-dataset-provenance-bias-debates-with-thr (2 commits) (#671), 2026-04-27

Spec File

Goal

agr-ad-01-TARG registered three dataset-specific personas
(provenance_auditor, bias_detector, schema_validator) but no driver
schedules debates against datasets. Datasets are uniquely high-leverage
artifacts — every downstream notebook inherits their flaws. Build the
scheduler that selects datasets by usage (count of notebooks/hypotheses
that cite them) and queues methodology_challenge debates with the
three-persona panel.

Acceptance Criteria

☐ CLI: python3 -m scidex.agora.dataset_audit_scheduler --top-by usage
--top-n 8 --dry-run.
☐ Selection joins artifacts WHERE artifact_type IN
('dataset','tabular_dataset') with artifact_links WHERE link_type IN
('derives_from','cites') to count usage.
☐ Excludes datasets audited in the last 90 days (LEFT JOIN on
debate_sessions WHERE target_artifact_type IN
('dataset','tabular_dataset')).
☐ Each scheduled debate uses the three persona slugs above, num_rounds=4,
debate_type='methodology_challenge'.
☐ Synthesis round populates a structured assessment dict:
{provenance_grade, bias_findings[], schema_issues[]}, persisted to
the dataset's artifact_quality_profile (table from agr-ad-05-PROF).
☐ Cron: weekly Wednesdays 06:00 UTC, capped at 8 datasets per run.
☐ Smoke: 3 datasets audited end-to-end; 3 quality profile rows updated.
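
The selection and cooldown criteria above can be sketched as a single query. This is a minimal illustration against an in-memory SQLite database, not the module's actual SQL: the column names (id, source_id, target_id, target_artifact_id, created_at) and the link direction (citing artifact as source, dataset as target) are assumptions; only the table names, artifact_type, link_type, and target_artifact_type come from the spec.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

# Hypothetical minimal schema; column names are assumptions for illustration.
SCHEMA = """
CREATE TABLE artifacts (id TEXT PRIMARY KEY, artifact_type TEXT);
CREATE TABLE artifact_links (source_id TEXT, target_id TEXT, link_type TEXT);
CREATE TABLE debate_sessions (
    target_artifact_id TEXT, target_artifact_type TEXT, created_at TEXT
);
"""

# Count distinct citing artifacts per dataset; the LEFT JOIN only matches
# audits newer than the cutoff, so "d IS NULL" means "not audited recently".
SELECT_TOP = """
SELECT a.id, COUNT(DISTINCT l.source_id) AS usage_count
FROM artifacts a
JOIN artifact_links l
  ON l.target_id = a.id AND l.link_type IN ('derives_from', 'cites')
LEFT JOIN debate_sessions d
  ON d.target_artifact_id = a.id
 AND d.target_artifact_type IN ('dataset', 'tabular_dataset')
 AND d.created_at >= :cutoff
WHERE a.artifact_type IN ('dataset', 'tabular_dataset')
  AND d.target_artifact_id IS NULL
GROUP BY a.id
ORDER BY usage_count DESC, a.id
LIMIT :top_n
"""


def select_top_datasets(conn, top_n=8, cooldown_days=90, now=None):
    """Return (dataset_id, usage_count) pairs outside the audit cooldown."""
    now = now or datetime.now(timezone.utc)
    cutoff = (now - timedelta(days=cooldown_days)).isoformat()
    rows = conn.execute(SELECT_TOP, {"cutoff": cutoff, "top_n": top_n})
    return [(row[0], row[1]) for row in rows]


def demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    now = datetime(2026, 4, 27, tzinfo=timezone.utc)
    conn.executemany("INSERT INTO artifacts VALUES (?, ?)", [
        ("ds-a", "dataset"), ("ds-b", "tabular_dataset"),
        ("ds-c", "dataset"), ("nb-1", "notebook"),
    ])
    conn.executemany("INSERT INTO artifact_links VALUES (?, ?, ?)", [
        ("nb-1", "ds-a", "cites"), ("nb-2", "ds-a", "derives_from"),
        ("nb-1", "ds-b", "cites"),
        ("nb-3", "ds-b", "supports"),  # link type not counted as usage
        ("nb-1", "ds-c", "cites"), ("nb-2", "ds-c", "cites"),
        ("nb-3", "ds-c", "cites"),
    ])
    conn.executemany("INSERT INTO debate_sessions VALUES (?, ?, ?)", [
        # ds-c was audited 10 days ago, so it is inside the cooldown.
        ("ds-c", "dataset", (now - timedelta(days=10)).isoformat()),
        # ds-b was audited 200 days ago, so it is eligible again.
        ("ds-b", "tabular_dataset", (now - timedelta(days=200)).isoformat()),
    ])
    return select_top_datasets(conn, top_n=8, now=now)


if __name__ == "__main__":
    print(demo())
```

Putting the cutoff inside the LEFT JOIN condition, rather than in the WHERE clause, is what lets a dataset with only old audits pass through. The same exclusion can be written as a NOT EXISTS subquery, which is the form the work log reports using.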

Approach

  • New module scidex/agora/dataset_audit_scheduler.py.
  • Reuse queue_debate for triggering.
  • Extend synthesis_engine parser if needed to capture the structured
    assessment.
  • Wire cron + Senate dashboard counter ("datasets audited this month").

Dependencies

  • agr-ad-01-TARG and agr-ad-05-PROF (artifact_quality_profile_dashboard).

    Work Log

    2026-04-27 04:14 PT — Slot 78

    Started: Read AGENTS.md, spec, and related code (notebook_debate_scheduler.py, queue_debate, synthesis_engine)

    Investigation findings:

    • universal_artifacts has no persona rows (0 found), so personas are referenced by slug name in debate_sessions.personas_used JSONB
    • artifact_quality_profile table does not exist; created dataset_audit_quality_profiles table inline in the module
    • artifact_links.link_type includes derives_from, cites, supports, produces for usage counting
    • queue_debate() from scidex_orchestrator.py handles session pre-recording with status='scheduled'
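
Creating the profile table inline and writing to it idempotently might look roughly like the following. This is a sketch only: it uses SQLite (3.24 or later, for ON CONFLICT ... DO UPDATE) so that it is self-contained, the column names dataset_id and updated_at are assumptions, and storing the two lists as JSON text is an illustrative choice rather than the module's confirmed layout.

```python
import json
import sqlite3


def ensure_quality_profile_table(conn):
    """Create the profile table if it does not exist yet (hypothetical columns)."""
    conn.execute("""
        CREATE TABLE IF NOT EXISTS dataset_audit_quality_profiles (
            dataset_id       TEXT PRIMARY KEY,
            provenance_grade TEXT,
            bias_findings    TEXT NOT NULL DEFAULT '[]',
            schema_issues    TEXT NOT NULL DEFAULT '[]',
            updated_at       TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
        )
    """)


def upsert_quality_profile(conn, dataset_id, assessment):
    """Insert or overwrite one dataset's profile from a structured assessment."""
    ensure_quality_profile_table(conn)
    conn.execute(
        """
        INSERT INTO dataset_audit_quality_profiles
            (dataset_id, provenance_grade, bias_findings, schema_issues)
        VALUES (?, ?, ?, ?)
        ON CONFLICT(dataset_id) DO UPDATE SET
            provenance_grade = excluded.provenance_grade,
            bias_findings    = excluded.bias_findings,
            schema_issues    = excluded.schema_issues,
            updated_at       = CURRENT_TIMESTAMP
        """,
        (
            dataset_id,
            assessment.get("provenance_grade"),
            json.dumps(assessment.get("bias_findings", [])),
            json.dumps(assessment.get("schema_issues", [])),
        ),
    )
    conn.commit()
```

Keying the table on the dataset identifier means a re-audit replaces the previous profile instead of accumulating rows; keeping a history would need a different key.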

    Implementation:

    • Built scidex/agora/dataset_audit_scheduler.py (560 lines):
    - DatasetCandidate dataclass
    - select_top_datasets(): joins artifacts with artifact_links, excludes 90-day cooldown via subquery
    - queue_dataset_audits(): calls queue_debate() with [provenance_auditor, bias_detector, schema_validator], num_rounds=4, debate_type=methodology_challenge
    - ensure_quality_profile_table(): creates dataset_audit_quality_profiles if absent
    - upsert_quality_profile(): writes structured assessment {provenance_grade, bias_findings[], schema_issues[]}
    - process_completed_audits(): scans completed sessions, parses synthesis round, upserts profiles
    - _parse_synthesis_assessment(): extracts assessment from transcript JSON or rounds
    - CLI with --top-by usage, --top-n, --dry-run, --process-only
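
One way the synthesis-round parsing step could work is to scan the round's text for the first JSON object that carries the expected keys. The sketch below assumes the synthesizer emits the assessment as a JSON object somewhere in its output; the function name parse_synthesis_assessment and the normalisation rules are illustrative and are not taken from the module.

```python
import json
import re


def _as_list(value):
    """Normalise a missing or scalar field to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def parse_synthesis_assessment(text):
    """Return {provenance_grade, bias_findings, schema_issues} or None.

    Tries every '{' in the text as the start of a JSON object and keeps
    the first object that contains a provenance_grade key.
    """
    decoder = json.JSONDecoder()
    for match in re.finditer(r"\{", text):
        try:
            obj, _end = decoder.raw_decode(text, match.start())
        except json.JSONDecodeError:
            continue  # not valid JSON from this brace; try the next one
        if isinstance(obj, dict) and "provenance_grade" in obj:
            return {
                "provenance_grade": str(obj["provenance_grade"]),
                "bias_findings": _as_list(obj.get("bias_findings")),
                "schema_issues": _as_list(obj.get("schema_issues")),
            }
    return None
```

Returning None when nothing parses lets the caller skip the upsert for that session, so an unparseable synthesis never writes an empty profile.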

    Testing:

    • --dry-run --top-n 8: 8 candidates shown, sorted by usage, then quality_score
    • --top-n 3: 3 debates queued, 3 debate_sessions rows created (status=scheduled), 3 knowledge_gap rows created
    • --process-only: 0 profiles written (debates not yet complete)

    Files: scidex/agora/dataset_audit_scheduler.py (new, 560 lines)
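
The three invocations exercised under Testing imply a command-line surface along these lines. This is a sketch built with argparse: the flag names come from the work log, while the defaults, the help text, and the plan() helper that maps flags to a mode are assumptions made for illustration.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(
        prog="scidex.agora.dataset_audit_scheduler",
        description="Queue three-persona audit debates for top-usage datasets.",
    )
    parser.add_argument("--top-by", choices=["usage"], default="usage",
                        help="ranking criterion (only 'usage' is described)")
    parser.add_argument("--top-n", type=int, default=8,
                        help="number of datasets to select")
    parser.add_argument("--dry-run", action="store_true",
                        help="list candidates without queuing debates")
    parser.add_argument("--process-only", action="store_true",
                        help="only persist profiles for completed audits")
    return parser


def plan(argv):
    """Map parsed flags to a mode name (hypothetical helper)."""
    args = build_parser().parse_args(argv)
    if args.process_only:
        return "process"
    return "preview" if args.dry_run else "queue"


if __name__ == "__main__":
    import sys
    print(plan(sys.argv[1:]))
```

Here --process-only takes precedence over --dry-run, on the assumption that processing completed sessions never queues new debates.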

    Commit: c5448c049 — [Agora] Add dataset_audit_scheduler: provenance/bias panel for top datasets [task:191d0752-9274-4b86-9a1e-c9408caada8a]

    Acceptance Criteria

    ☑ CLI: python3 -m scidex.agora.dataset_audit_scheduler --top-by usage --top-n 8 --dry-run.
    ☑ Selection joins artifacts with artifact_links (derives_from, cites) to count usage.
    ☑ Excludes datasets audited in last 90 days (subquery on debate_sessions).
    ☑ Debates use three persona slugs, num_rounds=4, debate_type=methodology_challenge.
    ☑ Synthesis round assessment persisted to dataset_audit_quality_profiles table.
    ☐ Cron: weekly Wednesdays 06:00 UTC, capped at 8 datasets per run. (Senate wiring — separate task)
    ☐ Smoke: 3 datasets audited end-to-end; 3 quality profile rows updated. (Debates run async — will update when agent completes)
