SciDEX — Task: [Senate] Triage: broken debate

Triggered by recurring orphan/data-integrity check bb654176 on 2026-04-29T04:54:22Z. scripts/orphan_checker.py --once reported data_quality.broken_debate_fk=15, which breaches the > 10 threshold. Coverage remains healthy at 99.98%; price_history broken_fk=0; report_url/artifact_path auto-fixes=0. The 15 rows are completed SRB-adv3 debate_sessions, created 2026-04-28 21:26:30-07:00, whose analysis_id values point to non-existent analyses: 10 SRB-2026-04-29-hyp* IDs, 3 SDA-2026-04-16-gap-pubmed* IDs, and 2 paper-* IDs. Investigate whether to null analysis_id (treating the rows as provenance-only debates), relink to real analyses if evidence exists, or add a proper typed source column. Do not create stub analyses.

Completion Notes

Auto-completed by supervisor after successful deploy to main

Git Commits (1)

[Senate] Triage: null 15 broken debate_sessions FK (SRB-adv3 provenance orphans) [task:1fc400b5-b4be-4337-81f0-4efb29a1c0f6] (#1325) 2026-04-28

Spec File

Goal

Recurring Senate CI task that fires every 30 minutes to:

Run scripts/orphan_checker.py and produce logs/orphan-check-latest.json

Auto-fix report_url/artifact_path where disk evidence exists (handled inside the script)

Parse the JSON report and open Senate triage tasks for any problem-class that exceeds threshold

Acceptance Criteria

☐ orphan_checker.py runs and produces a fresh JSON report each cycle

☐ report_url / artifact_path auto-fixes applied where disk evidence exists

☐ Coverage metrics ≥ 95% (critical threshold)

☐ Senate triage tasks opened for problem-classes that exceed threshold

☐ Orphaned SQLite-era price_history records removed on detection

Thresholds (trigger triage task creation)

Problem class	Threshold	Action
`broken_debate_fk`	> 10 unfixed	Open `[Senate] Triage: broken debate_sessions FK` task
`price_history broken_fk`	> 50	Open `[Exchange] Triage: stale price_history records` task
`orphan_hypotheses`	> 20	Open `[Agora] Triage: orphaned hypotheses` task
`overall_coverage`	< 95%	Open `[Senate] ALERT: coverage below 95%` task
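
The threshold table can be read as a list of (metric, predicate, task title) rules. The sketch below is one way to encode it, not the logic inside scripts/orphan_checker.py. The flat metrics dict and the key name price_history_broken_fk are assumptions made for illustration; the real report layout is not shown in this spec.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Threshold:
    metric: str                         # key in the (assumed) flat metrics dict
    breached: Callable[[float], bool]   # predicate taken from the table above
    task_title: str                     # triage task to open on breach


THRESHOLDS = [
    Threshold("broken_debate_fk", lambda v: v > 10,
              "[Senate] Triage: broken debate_sessions FK"),
    Threshold("price_history_broken_fk", lambda v: v > 50,
              "[Exchange] Triage: stale price_history records"),
    Threshold("orphan_hypotheses", lambda v: v > 20,
              "[Agora] Triage: orphaned hypotheses"),
    Threshold("overall_coverage", lambda v: v < 95.0,
              "[Senate] ALERT: coverage below 95%"),
]


def breached_tasks(metrics: dict[str, float]) -> list[str]:
    """Return the titles of triage tasks to open for the given metrics."""
    return [t.task_title for t in THRESHOLDS
            if t.metric in metrics and t.breached(metrics[t.metric])]


# Figures from the 2026-04-29 trigger: 15 broken debate FKs, coverage 99.98%.
print(breached_tasks({"broken_debate_fk": 15, "price_history_broken_fk": 0,
                      "orphan_hypotheses": 1, "overall_coverage": 99.98}))
```

The comparisons are strict, so a count of exactly 10 broken debate FKs does not open a task.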

Approach

Run python3 scripts/orphan_checker.py (auto-fixes report_url / artifact_path internally)

Read logs/orphan-check-latest.json

Compare metrics against thresholds above

Create triage tasks for breached classes (skip if matching open task exists)

Log summary in this Work Log
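
The steps above could be wired together roughly as follows. This is a minimal sketch under stated assumptions: the helper names are hypothetical, and "matching open task" is taken to mean an open task with the same title, which the spec does not define.

```python
import json
import subprocess
from pathlib import Path

REPORT_PATH = Path("logs/orphan-check-latest.json")  # path named in the spec


def run_checker() -> None:
    """Step 1: run the scanner, which applies its own auto-fixes."""
    subprocess.run(["python3", "scripts/orphan_checker.py"], check=True)


def read_report(path: Path = REPORT_PATH) -> dict:
    """Step 2: load the JSON report written by the scanner."""
    return json.loads(path.read_text(encoding="utf-8"))


def titles_to_open(breached: list[str], open_titles: list[str]) -> list[str]:
    """Step 4: skip any breached class that already has an open task.

    Assumption: a task "matches" when its title is identical.
    """
    already_open = set(open_titles)
    return [title for title in breached if title not in already_open]


# Illustration only: two breached classes, one of which is already tracked.
breached = ["[Senate] Triage: broken debate_sessions FK",
            "[Agora] Triage: orphaned hypotheses"]
open_now = ["[Senate] Triage: broken debate_sessions FK"]
print(titles_to_open(breached, open_now))
```

Deduplicating on title keeps a 30-minute cadence from opening a new task every cycle while an earlier one is still being worked.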

Key files

scripts/orphan_checker.py — the scanner and auto-fixer
logs/orphan-check-latest.json — latest report (not committed; generated at runtime)

Work Log

2026-04-28T20:40Z — Cycle 2 run (bb654176)

Run metrics:

Overall coverage: 99.98% — HEALTHY (above 95% threshold)
Analyses: 0/478 orphaned (100%)
Hypotheses: 1/1974 orphaned (99.95%)
KG Edges: 0/710,066 orphaned (100%)
Papers: 2215 unreferenced (non-critical — PMIDs in evidence not in papers table)
Missing report_url: 0 | auto-fixed: 0
Missing artifact_path: 0 | auto-fixed: 0
Failed (no debate): 0
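
The percentages in this log can be reproduced from the raw counts. In the sketch below, the per-type formula follows directly from the "orphaned/total" figures. Treating overall coverage as the unweighted mean of the three per-type values is an inference that happens to match both logged runs; it is not confirmed by the script.

```python
def coverage_pct(orphaned: int, total: int) -> float:
    """Share of rows that are not orphaned, as a percentage."""
    return 100.0 * (total - orphaned) / total


# Counts logged for the 2026-04-28T20:40Z run.
per_type = {
    "analyses": coverage_pct(0, 478),
    "hypotheses": coverage_pct(1, 1974),
    "kg_edges": coverage_pct(0, 710_066),
}

# Assumption: overall coverage is the unweighted mean of the three figures.
overall = sum(per_type.values()) / len(per_type)

print(f"hypotheses={per_type['hypotheses']:.2f}% overall={overall:.2f}%")
```

Under that assumption a single orphaned hypothesis moves the overall figure by about 0.017 percentage points, so the 95% critical threshold would need several percent of one entity type to be orphaned before it trips.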

Data quality issues:

broken_debate_fk: 13 (threshold: 10) — BREACHED

- 10 have hypothesis IDs prefixed with SRB-2026-04-29-hyp- (no matching analysis)
- 3 have SDA-2026-04-16-gap-pubmed- IDs (no matching analysis)
- Auto-fix attempted (case-insensitive match, prefix strip) — all 13 unfixable

price_history broken_fk: 0 (threshold: 50) — resolved
notebooks_hyp_in_analysis_col: 5 (non-critical)
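
The auto-fix mentioned above (case-insensitive match, then prefix strip) might look something like the following. This is a hypothetical sketch, not the code in scripts/orphan_checker.py: the function name, the prefix list, and the sample IDs are all invented for illustration, since the source does not say which prefixes are stripped.

```python
from typing import Iterable, Optional

# Hypothetical prefixes; the source does not say which ones the script strips.
ASSUMED_PREFIXES = ("analysis-", "analysis_")


def resolve_analysis_id(raw_id: str, known_ids: Iterable[str],
                        prefixes: Iterable[str] = ASSUMED_PREFIXES) -> Optional[str]:
    """Return the canonical analyses.id that raw_id most likely meant, else None."""
    by_lower = {known.lower(): known for known in known_ids}
    candidate = raw_id.strip().lower()
    # Pass 1: case-insensitive exact match.
    if candidate in by_lower:
        return by_lower[candidate]
    # Pass 2: strip one assumed prefix and retry.
    for prefix in prefixes:
        if candidate.startswith(prefix.lower()):
            stripped = candidate[len(prefix):]
            if stripped in by_lower:
                return by_lower[stripped]
    return None


known = ["SDA-demo-0001"]  # made-up ID for illustration
print(resolve_analysis_id("sda-DEMO-0001", known))
print(resolve_analysis_id("analysis-SDA-demo-0001", known))
print(resolve_analysis_id("SRB-2026-04-29-hyp-demo", known))
```

A heuristic of this shape can only repair references that differ cosmetically from a real analyses row. IDs that name a hypothesis, gap, or paper have no counterpart to match, which is consistent with all 13 rows being reported as unfixable.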

Actions taken:

Created Senate triage task f6a96e34-f273-45e7-89ab-8753603441c6 for broken debate FK breach

No DB writes needed (price_history clean, report_url/artifact_path already set)

---

2026-04-28T03:35Z — First run (Task bb654176, created spec)

Run metrics:

Overall coverage: 99.92% — HEALTHY
Analyses: 0/453 orphaned (100%)
Hypotheses: 4/1764 orphaned (99.77%) — 4 with no score or analysis link
KG Edges: 0/696,075 orphaned (100%)
Papers: 1538 unreferenced (non-critical — PMIDs in evidence not in papers table)
Missing report_url: 0 | auto-fixed: 0
Missing artifact_path: 0 | auto-fixed: 0
Failed (no debate): 0

Data quality issues found:

broken_debate_fk: 35 (threshold: 10) — BREACHED

- 20 have hypothesis IDs stored in analysis_id column (structural mismatch)
- 6 have methodologist gap IDs (no matching analysis)
- 3 pubmed gap IDs, 3 feature-expr IDs, 3 paper IDs
- Auto-fix attempted (case-insensitive match, prefix strip) — all 35 unfixable

price_history broken_fk: 99 (threshold: 50) — BREACHED

- All 99 were integer IDs (SQLite-era legacy, pre-PostgreSQL migration)
- Auto-fixed: deleted 99 stale price_history rows — resolved this cycle

notebooks_hyp_in_analysis_col: 5 (non-critical, data quality note)
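
The price_history cleanup above removed rows whose broken reference was a bare integer. A minimal sketch of that selection rule follows, using an in-memory SQLite database as a stand-in. The table layout and the hypothesis_id column name are assumptions; the source only says the stale IDs were integers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hypotheses (id TEXT PRIMARY KEY);
CREATE TABLE price_history (
    id INTEGER PRIMARY KEY, hypothesis_id TEXT, price REAL);
INSERT INTO hypotheses VALUES ('h-demo-1');
INSERT INTO price_history (hypothesis_id, price) VALUES
    ('h-demo-1', 0.61),   -- valid reference, must survive
    ('42', 0.50),         -- legacy integer ID with no parent row
    ('h-missing', 0.40);  -- broken but not integer-shaped, left for triage
""")

# Find rows whose reference has no parent, then keep only integer-shaped ones.
broken = conn.execute("""
    SELECT ph.id, ph.hypothesis_id
    FROM price_history ph
    WHERE NOT EXISTS (SELECT 1 FROM hypotheses h WHERE h.id = ph.hypothesis_id)
""").fetchall()
legacy_ids = [row_id for row_id, ref in broken
              if isinstance(ref, str) and ref.isascii() and ref.isdigit()]

with conn:  # commit on success, roll back on error
    conn.executemany("DELETE FROM price_history WHERE id = ?",
                     [(row_id,) for row_id in legacy_ids])

remaining = [r[0] for r in conn.execute(
    "SELECT hypothesis_id FROM price_history ORDER BY id")]
print(remaining)
```

Restricting the delete to integer-shaped references keeps the auto-fix from discarding broken rows that might still be relinkable.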

Actions taken:

Deleted 99 SQLite-era integer price_history records

Created Orchestra triage task for broken debate_session FKs

---

2026-04-28T12:40Z — Triage: broken debate_sessions FK (13 orphans) [task:f6a96e34-f273-45e7-89ab-8753603441c6]

Investigation findings:

13 debate_sessions rows had analysis_id pointing to non-existent analyses
9 rows: analysis_id stored a hypothesis ID (SRB-2026-04-29-hyp-*) — structural mismatch, debate was created against a hypothesis, not an analysis
4 rows: analysis_id stored a knowledge_gap ID (SDA-2026-04-16-gap-pubmed-*) — same pattern, debate against a gap, not an analysis
None of the 13 had matching analyses rows; none had matching knowledge_gaps rows either — all are true orphans
These 13 were a remnant from an earlier run when the count was 35; 22 were apparently auto-resolved in the interim
The debate sessions themselves are valid (completed, quality_score 0.64-0.76, full question text preserved)
The transcript_json is null for these sessions (they were structured research briefs, not full debates)

Triage action applied:

Option 2 (null out analysis_id) applied to all 13 rows
This preserves the debate sessions with their quality scores and questions, while removing the broken FK
The id (session ID), question, quality_score, status, created_at, and all other columns remain intact

SQL executed:

UPDATE debate_sessions
SET analysis_id = NULL
WHERE analysis_id IS NOT NULL
  AND analysis_id NOT IN (SELECT id FROM analyses);
-- 13 rows updated; 0 orphaned FKs remain

Verification:

SELECT COUNT(*) FROM debate_sessions WHERE analysis_id IS NOT NULL AND analysis_id NOT IN (SELECT id FROM analyses) → 0
13 sessions confirmed nulled via SELECT id, analysis_id, quality_score, status FROM debate_sessions WHERE id LIKE 'SRB-adv%'
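
The update-then-verify pattern above can be rehearsed against an in-memory SQLite database before touching production. The sketch below is illustrative only: the rows are made up and the schema is trimmed to the columns named in this entry. It uses NOT EXISTS rather than NOT IN. The two give the same result when analyses.id is never NULL, but NOT IN matches no rows at all if the subquery returns a NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE analyses (id TEXT PRIMARY KEY);
CREATE TABLE debate_sessions (
    id TEXT PRIMARY KEY, analysis_id TEXT, quality_score REAL, status TEXT);
INSERT INTO analyses VALUES ('analysis-demo-1');
INSERT INTO debate_sessions VALUES
    ('session-demo-1', 'analysis-demo-1', 0.70, 'completed'),
    ('session-demo-2', 'missing-analysis', 0.64, 'completed'),
    ('session-demo-3', NULL, 0.76, 'completed');
""")

# Same predicate for the write and the check, so they cannot drift apart.
ORPHAN_FILTER = """
    analysis_id IS NOT NULL
    AND NOT EXISTS (
        SELECT 1 FROM analyses a WHERE a.id = debate_sessions.analysis_id)
"""

with conn:  # one transaction: commit on success, roll back on error
    updated = conn.execute(
        f"UPDATE debate_sessions SET analysis_id = NULL WHERE {ORPHAN_FILTER}"
    ).rowcount

remaining = conn.execute(
    f"SELECT COUNT(*) FROM debate_sessions WHERE {ORPHAN_FILTER}"
).fetchone()[0]
print(updated, remaining)
```

Only the orphaned reference changes; the session row, its quality_score, and its status are left as they were.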

Files touched: None (DB-only fix)

---

2026-04-29T04:55Z — Triage: broken debate_sessions FK (15 new orphans) [task:1fc400b5-b4be-4337-81f0-4efb29a1c0f6]

Investigation findings:

15 debate_sessions rows (all SRB-adv3 type, created 2026-04-28 21:26:30-07:00) had analysis_id pointing to non-existent analyses
Pattern breakdown:

- 10 rows: analysis_id like SRB-2026-04-29-hyp-* — SRB-debate against hypothesis, stored as analysis_id
- 3 rows: analysis_id like SDA-2026-04-16-gap-pubmed-* — SRB-debate against knowledge gap
- 2 rows: analysis_id like paper-* (PMID-based) — SRB-debate against a paper

All 15 had a valid target_artifact_id with target_artifact_type = 'hypothesis', correctly pointing to the real hypothesis
The analysis_id field was a stale/wrong reference — the actual subject is the hypothesis, not an analysis
Quality scores 0.654–0.790; all completed, all have full question text preserved
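
The pattern breakdown above amounts to bucketing orphaned analysis_id values by their shape. A small sketch of such a classifier follows. The regular expressions generalize the three prefixes listed in the findings, and the sample IDs use placeholder suffixes rather than real values. The labels describe the shape of the stale ID only, not the session's actual subject, which target_artifact_type records.

```python
import re
from collections import Counter

# Patterns generalized from the three ID shapes listed in the findings.
PATTERNS = [
    ("hypothesis", re.compile(r"^SRB-\d{4}-\d{2}-\d{2}-hyp")),
    ("knowledge_gap", re.compile(r"^SDA-\d{4}-\d{2}-\d{2}-gap-")),
    ("paper", re.compile(r"^paper-")),
]


def classify(analysis_id: str) -> str:
    """Label an orphaned analysis_id by the kind of ID it resembles."""
    for label, pattern in PATTERNS:
        if pattern.match(analysis_id):
            return label
    return "unknown"


# Placeholder suffixes; real IDs are not reproduced here.
sample = [
    "SRB-2026-04-29-hyp-EXAMPLE",
    "SDA-2026-04-16-gap-pubmed-EXAMPLE",
    "paper-EXAMPLE",
    "something-else",
]
counts = Counter(classify(value) for value in sample)
print(dict(counts))
```

An "unknown" bucket is worth keeping so that a new ID shape shows up in the breakdown instead of being silently folded into an existing class.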

Root cause:

structured_research_brief (SRB) debates target hypotheses directly, not analyses
The analysis_id column was incorrectly populated with a hypothesis/paper/gap ID that doesn't exist as an analysis
The correct reference is target_artifact_id (which correctly links to the hypothesis)

Triage action applied:

Nulled analysis_id for all 15 sessions
This is the correct fix: SRB debates are provenance-only (debate subject = hypothesis), not analysis-linked
The target_artifact_id column already correctly captures what was debated

SQL executed:

UPDATE debate_sessions
SET analysis_id = NULL
WHERE id IN (
  'SRB-adv3-10-170057-a2f72fd8-20260429042630',
  'SRB-adv3-9-hyp-var-600b3e39-20260429042630',
  -- ... all 15 session IDs
);
-- 15 rows updated

Verification:

scripts/orphan_checker.py --once now reports broken_debate_fk: 0 (was 15)
All 15 sessions confirmed with analysis_id IS NULL and target_artifact_id preserved
SELECT id, analysis_id, target_artifact_id, target_artifact_type FROM debate_sessions WHERE id = 'SRB-adv3-9-hyp-var-600b3e39-20260429042630' → analysis_id=(null), target_artifact_id=h-var-600b3e39aa, target_artifact_type=hypothesis ✓
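
A targeted update of this kind, followed by the spot check, can be sketched with parameter placeholders in place of a hand-written ID list. The example below runs against an in-memory SQLite stand-in. The session ID and target_artifact_id come from the verification line above, while the stale analysis_id value and the second row are placeholders invented for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE debate_sessions (
    id TEXT PRIMARY KEY, analysis_id TEXT,
    target_artifact_id TEXT, target_artifact_type TEXT);
INSERT INTO debate_sessions VALUES
    ('SRB-adv3-9-hyp-var-600b3e39-20260429042630', 'stale-demo-ref',
     'h-var-600b3e39aa', 'hypothesis'),
    ('session-demo-untouched', 'analysis-demo-1', NULL, NULL);
""")

# In practice this list would hold all 15 reviewed session IDs.
session_ids = ["SRB-adv3-9-hyp-var-600b3e39-20260429042630"]
placeholders = ", ".join("?" for _ in session_ids)

with conn:  # commit on success, roll back on error
    cursor = conn.execute(
        f"UPDATE debate_sessions SET analysis_id = NULL "
        f"WHERE id IN ({placeholders})",
        session_ids,
    )
    # Fail loudly (and roll back) if the write touched an unexpected row count.
    if cursor.rowcount != len(session_ids):
        raise RuntimeError(f"expected {len(session_ids)} rows, got {cursor.rowcount}")

row = conn.execute(
    "SELECT analysis_id, target_artifact_id, target_artifact_type "
    "FROM debate_sessions WHERE id = ?",
    (session_ids[0],),
).fetchone()
print(row)
```

Listing the session IDs explicitly limits the write to rows that were actually reviewed, whereas a blanket orphan filter would also null any reference that broke after the investigation.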

Files touched: docs/planning/specs/85075773-3769-482f-b822-058cac64456a_orphan_check_ci_spec.md (spec work log only — DB-only fix, no code changes)