agr-ad-01-TARG registered three dataset-specific personas
(provenance_auditor, bias_detector, schema_validator) but no driver
schedules debates against datasets. Datasets are uniquely high-leverage
artifacts — every downstream notebook inherits their flaws. Build the
scheduler that selects datasets by usage (count of notebooks/hypotheses
that cite them) and queues methodology_challenge debates with the
three-persona panel.
- python3 -m scidex.agora.dataset_audit_scheduler --top-by usage
- Joins artifacts WHERE artifact_type IN (…) with artifact_links WHERE link_type IN ('derives_from','cites') to count usage.
- Excludes datasets inside the 90-day cooldown (debate_sessions WHERE target_artifact_type IN ('dataset','tabular_dataset')).
- Debates are queued with debate_type='methodology_challenge'.
- Structured assessment {provenance_grade, bias_findings[], schema_issues[]}, persisted to artifact_quality_profile (table from agr-ad-05-PROF).
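The selection criteria above can be sketched against an in-memory SQLite database. This is a minimal illustration, not the module's actual query: the use of SQLite, the column names (id, source_id, target_id, target_artifact_id, created_at), and the artifact_type filter values are all assumptions made for the example.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class DatasetCandidate:
    artifact_id: str
    usage_count: int


def select_top_datasets(conn, top_n=8, cooldown_days=90):
    """Rank datasets by inbound derives_from/cites links, skipping recently debated ones."""
    rows = conn.execute(
        """
        SELECT a.id, COUNT(l.source_id) AS usage_count
        FROM artifacts a
        LEFT JOIN artifact_links l
               ON l.target_id = a.id
              AND l.link_type IN ('derives_from', 'cites')
        -- Assumption: the same two type values used for target_artifact_type.
        WHERE a.artifact_type IN ('dataset', 'tabular_dataset')
          AND NOT EXISTS (
              SELECT 1
              FROM debate_sessions s
              WHERE s.target_artifact_id = a.id
                AND s.target_artifact_type IN ('dataset', 'tabular_dataset')
                AND s.created_at >= datetime('now', ?)
          )
        GROUP BY a.id
        ORDER BY usage_count DESC, a.id
        LIMIT ?
        """,
        (f"-{cooldown_days} days", top_n),
    ).fetchall()
    return [DatasetCandidate(artifact_id=r[0], usage_count=r[1]) for r in rows]


# Hypothetical fixture: three datasets, one debated 10 days ago (inside cooldown).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE artifacts (id TEXT PRIMARY KEY, artifact_type TEXT);
    CREATE TABLE artifact_links (source_id TEXT, target_id TEXT, link_type TEXT);
    CREATE TABLE debate_sessions (
        target_artifact_id TEXT, target_artifact_type TEXT, created_at TEXT
    );
    INSERT INTO artifacts VALUES
        ('ds-a', 'dataset'), ('ds-b', 'tabular_dataset'),
        ('ds-c', 'dataset'), ('nb-1', 'notebook');
    INSERT INTO artifact_links VALUES
        ('nb-1', 'ds-a', 'cites'), ('nb-2', 'ds-a', 'derives_from'),
        ('nb-1', 'ds-b', 'cites'), ('nb-1', 'ds-b', 'supports'),
        ('nb-3', 'ds-c', 'cites'), ('nb-4', 'ds-c', 'cites'), ('nb-5', 'ds-c', 'cites');
    INSERT INTO debate_sessions VALUES
        ('ds-c', 'dataset', datetime('now', '-10 days')),
        ('ds-a', 'dataset', datetime('now', '-200 days'));
    """
)
candidates = select_top_datasets(conn, top_n=8)
```

In this fixture ds-c has the most citations but is skipped because its last debate falls inside the cooldown window, and the 'supports' link on ds-b does not count toward usage.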
- New module: scidex/agora/dataset_audit_scheduler.py.
- Reuse queue_debate for triggering.
- Extend the synthesis_engine parser if needed to capture the structured assessment.
- Depends on agr-ad-01-TARG and agr-ad-05-PROF (artifact_quality_profile_dashboard).
Started: Read AGENTS.md, spec, and related code (notebook_debate_scheduler.py, queue_debate, synthesis_engine)
Investigation findings:
- universal_artifacts has no persona rows (0 found), so personas are referenced by slug name in debate_sessions.personas_used JSONB
- artifact_quality_profile table does not exist; created dataset_audit_quality_profiles table inline in the module
- artifact_links.link_type includes derives_from, cites, supports, produces for usage counting
- queue_debate() from scidex_orchestrator.py handles session pre-recording with status='scheduled'
scidex/agora/dataset_audit_scheduler.py (560 lines):
- DatasetCandidate dataclass
- select_top_datasets(): joins artifacts with artifact_links, excludes 90-day cooldown via subquery
- queue_dataset_audits(): calls queue_debate() with [provenance_auditor, bias_detector, schema_validator], num_rounds=4, debate_type=methodology_challenge
- ensure_quality_profile_table(): creates dataset_audit_quality_profiles if absent
- upsert_quality_profile(): writes structured assessment {provenance_grade, bias_findings[], schema_issues[]}
- process_completed_audits(): scans completed sessions, parses synthesis round, upserts profiles
- _parse_synthesis_assessment(): extracts assessment from transcript JSON or rounds
- CLI flags: --top-by usage, --top-n, --dry-run, --process-only
Testing:
- --dry-run --top-n 8: 8 candidates shown, sorted by usage → quality_score
- --top-n 3: 3 debates queued, 3 debate_sessions rows created (status=scheduled), 3 knowledge_gap rows created
- --process-only: 0 profiles written (debates not yet complete)
scidex/agora/dataset_audit_scheduler.py (new, 560 lines)
Commit: c5448c049 - [Agora] Add dataset_audit_scheduler: provenance/bias panel for top datasets [task:191d0752-9274-4b86-9a1e-c9408caada8a]
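The assessment extraction described in the work log could look roughly like the following. It is a sketch under stated assumptions, not the code in the commit: the round shape (a list of dicts with a "content" field), the idea that the synthesis round embeds a JSON object in its text, and the function name parse_synthesis_assessment_sketch are all hypothetical.

```python
import json
import re


def parse_synthesis_assessment_sketch(rounds):
    """Return the structured assessment from the latest round that carries one, else None."""
    for rnd in reversed(rounds):
        content = rnd.get("content", "")
        if isinstance(content, dict):
            candidate = content
        else:
            # Grab the span from the first '{' to the last '}' and try to decode it.
            match = re.search(r"\{.*\}", content, flags=re.DOTALL)
            if not match:
                continue
            try:
                candidate = json.loads(match.group(0))
            except json.JSONDecodeError:
                continue
        if not isinstance(candidate, dict) or "provenance_grade" not in candidate:
            continue
        # Normalize so downstream code can rely on all three keys being present.
        return {
            "provenance_grade": candidate["provenance_grade"],
            "bias_findings": list(candidate.get("bias_findings") or []),
            "schema_issues": list(candidate.get("schema_issues") or []),
        }
    return None


# Hypothetical transcript: only the final round carries the structured object.
rounds = [
    {"persona": "provenance_auditor", "content": "Source lineage is partially documented."},
    {
        "persona": "synthesis",
        "content": 'Final assessment:\n{"provenance_grade": "B", '
                   '"bias_findings": ["site imbalance"], "schema_issues": []}',
    },
]
assessment = parse_synthesis_assessment_sketch(rounds)
```

Returning None when no round carries an assessment matches the --process-only result above, where incomplete debates yield no profiles.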
- python3 -m scidex.agora.dataset_audit_scheduler --top-by usage --top-n 8 --dry-run.
- Joins artifacts with artifact_links (derives_from, cites) to count usage.
- Writes to the dataset_audit_quality_profiles table.
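For illustration, creating the profile table if it is absent and upserting into it might be sketched as below. The table name comes from the work log; the column layout, the use of SQLite with JSON stored as text, and the function bodies are assumptions, and the real module may differ (for example, by using JSONB columns).

```python
import json
import sqlite3


def ensure_quality_profile_table(conn):
    # Idempotent: safe to call on every scheduler run.
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS dataset_audit_quality_profiles (
            artifact_id      TEXT PRIMARY KEY,
            provenance_grade TEXT,
            bias_findings    TEXT NOT NULL,  -- JSON-encoded list
            schema_issues    TEXT NOT NULL,  -- JSON-encoded list
            updated_at       TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
        )
        """
    )


def upsert_quality_profile(conn, artifact_id, assessment):
    # One row per dataset: a later audit overwrites the earlier profile.
    conn.execute(
        """
        INSERT INTO dataset_audit_quality_profiles
            (artifact_id, provenance_grade, bias_findings, schema_issues)
        VALUES (?, ?, ?, ?)
        ON CONFLICT(artifact_id) DO UPDATE SET
            provenance_grade = excluded.provenance_grade,
            bias_findings    = excluded.bias_findings,
            schema_issues    = excluded.schema_issues,
            updated_at       = CURRENT_TIMESTAMP
        """,
        (
            artifact_id,
            assessment["provenance_grade"],
            json.dumps(assessment["bias_findings"]),
            json.dumps(assessment["schema_issues"]),
        ),
    )
    conn.commit()


profile_conn = sqlite3.connect(":memory:")
ensure_quality_profile_table(profile_conn)
upsert_quality_profile(
    profile_conn,
    "ds-a",
    {"provenance_grade": "C", "bias_findings": ["site imbalance"], "schema_issues": ["untyped date column"]},
)
# A second audit of the same dataset replaces the first profile.
upsert_quality_profile(
    profile_conn,
    "ds-a",
    {"provenance_grade": "A", "bias_findings": [], "schema_issues": []},
)
stored = profile_conn.execute(
    "SELECT provenance_grade, bias_findings, schema_issues FROM dataset_audit_quality_profiles"
).fetchall()
```

Keying the table on the artifact keeps a single current profile per dataset; keeping a history of audits would need a different key, which this sketch does not attempt.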