SciDEX Task: [Atlas] Score 30 open knowledge gaps with quality rubric

2351 open knowledge gaps lack gap_quality_score values. Gap quality scores drive prioritization, missions, and debate routing.

## Acceptance criteria (recommended; see 'Broader latitude' below)

- 30 open gaps have gap_quality_score between 0 and 1
- Specificity, evidence coverage, hypothesis density, debate depth, and actionability are populated where possible
- Remaining unscored open gap count is <= 2321

## Before starting

1. Read this task's spec file and check for duplicate recent work.
2. Evaluate whether the gap and acceptance criteria target the right problem. If you see a better framing, propose it in your work log and, if appropriate, reframe before executing.
3. Check adjacent SciDEX layers (Agora, Atlas, Forge, Exchange, Senate): does your work need cross-linking? Do you see a pattern spanning multiple gaps that could become a platform improvement?

## Broader latitude (explicitly welcome)

You are a scientific discoverer, not just a task executor. Beyond the acceptance criteria above, you're invited to:

- **Question the framing.** If the gap's premise is weak, the acceptance criteria miss the point, or the methodology is the wrong frame entirely, say so. Propose a reframe with justification.
- **Propose structural improvements.** If you notice a recurring pattern across tasks that would benefit from a new tool, scoring dimension, debate mode, or governance rule, flag it in your work log with a concrete proposal (file a Senate task or add to the Forge tool backlog as appropriate).
- **Propose algorithmic improvements.** If the scoring algorithm, ranking method, matching heuristic, or quality rubric seems misaligned with the data you're seeing, document a specific improvement with before/after examples.
- **Strengthen artifacts beyond the minimum.** Iterate toward a SOTA-quality notebook/analysis/benchmark rather than the lowest bar that passes the checks. Fewer high-quality artifacts beat many shallow ones.
Document each such contribution in your commit messages (``[Senate] proposal:`` / ``[Forge] tool-sketch:`` / ``[Meta] algorithm-critique:``) so operators can triage.

Git Commits (2)

[Atlas] Score 30 more open knowledge gaps; gap 2038→2008 [task:2f2a61c5-6996-4244-8705-35d1b84416c9] (#1194), 2026-04-28

[Atlas] Work log: score 30 more open gaps [task:2f2a61c5-6996-4244-8705-35d1b84416c9] (#1189), 2026-04-28

Spec File

Goal

Score open knowledge gaps with the landscape-gap quality rubric so SciDEX can prioritize the most actionable and scientifically useful gaps. Component scores should reflect specificity, evidence coverage, hypothesis density, debate depth, and actionability.

Acceptance Criteria

☑ The selected open gaps have gap_quality_score values between 0 and 1

☑ Component scores are populated where the schema supports them

☑ Scores are based on surrounding evidence and artifact coverage, not placeholders

☑ The before/after unscored-gap count is recorded

Approach

Query open gaps without gap_quality_score, prioritizing high priority_score and importance_score.

Inspect adjacent hypotheses, debates, papers, KG edges, and datasets.

Apply the landscape-gap rubric and persist the composite plus component scores.

Verify score ranges and count reduction.
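The four approach steps can be sketched as a single select, score, persist, verify loop. This is a minimal illustration, not the project's scorer: the table name `knowledge_gaps` and its columns are assumptions, `score_gap` is a hypothetical stand-in for the rubric, and the standard-library `sqlite3` module replaces PostgreSQL only so the example runs without a database server.

```python
import sqlite3

# Hypothetical schema; the real PostgreSQL table and column names may differ.
SCHEMA = """
CREATE TABLE knowledge_gaps (
    id TEXT PRIMARY KEY,
    status TEXT NOT NULL,
    priority_score REAL,
    importance_score REAL,
    gap_quality_score REAL
)
"""


def count_unscored(conn):
    """Number of open gaps that still have no composite quality score."""
    (n,) = conn.execute(
        "SELECT COUNT(*) FROM knowledge_gaps "
        "WHERE status = 'open' AND gap_quality_score IS NULL"
    ).fetchone()
    return n


def select_batch(conn, limit):
    """Step 1: highest priority first, ties broken by importance."""
    rows = conn.execute(
        "SELECT id FROM knowledge_gaps "
        "WHERE status = 'open' AND gap_quality_score IS NULL "
        "ORDER BY priority_score DESC NULLS LAST, importance_score DESC NULLS LAST "
        "LIMIT ?",
        (limit,),
    ).fetchall()
    return [gap_id for (gap_id,) in rows]


def score_batch(conn, limit, score_gap):
    """Steps 1-4: select, score, persist, then verify the count reduction."""
    before = count_unscored(conn)
    batch = select_batch(conn, limit)
    for gap_id in batch:
        score = score_gap(gap_id)  # steps 2-3: stand-in for the rubric
        if not 0.0 <= score <= 1.0:  # step 4a: range check before writing
            raise ValueError(f"{gap_id}: score {score} is outside [0, 1]")
        conn.execute(
            "UPDATE knowledge_gaps SET gap_quality_score = ? WHERE id = ?",
            (score, gap_id),
        )
    conn.commit()
    after = count_unscored(conn)
    # Step 4b: every row written should have left the unscored pool.
    if before - after != len(batch):
        raise RuntimeError(f"expected {len(batch)} fewer unscored gaps, got {before - after}")
    return before, after


# Demo on an in-memory database with three open, unscored gaps.
conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
conn.executemany(
    "INSERT INTO knowledge_gaps VALUES (?, 'open', ?, ?, NULL)",
    [("g1", 0.9, 0.5), ("g2", 0.2, 0.9), ("g3", 0.9, 0.8)],
)
before, after = score_batch(conn, limit=2, score_gap=lambda gap_id: 0.7)
```

With `limit=2` the batch is g3 then g1 (tied on priority, separated by importance), leaving g2 unscored. `NULLS LAST` is spelled out because PostgreSQL sorts NULLs first under `DESC` by default, so gaps missing a priority would otherwise jump the queue.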

Dependencies

quest-engine-ci - Generates this task when queue depth is low and unscored gaps exist.

Dependents

Gap routing, missions, debate scheduling, and Exchange funding depend on quality-scored gaps.

Work Log

2026-04-28 01:43 UTC — Slot 76 (minimax) — iteration 2

Task still current: live PostgreSQL has 2,038 open knowledge gaps with gap_quality_score IS NULL before this run; 1,115 open gaps already have in-range scores.
Ran python3 score_30_gaps.py --limit 30 --task-id 2f2a61c5-6996-4244-8705-35d1b84416c9 to score 30 open gaps; all 30 rows updated with composite + component scores.
Before: 2,038 open gaps unscored, 1,115 open gaps scored in range.
After: 2,008 open gaps unscored, 1,145 open gaps scored in range; net 30 new gaps scored this iteration.
Score range this run: composite scores in [0.635, 0.858]; all in [0,1]. No new low-quality gaps scored below 0.3, so low-quality triage remains 25 cumulative.
Verification: 30 scored, 0 failed; remaining_unscored=2008, scored_in_range=1145, low_quality_under_review=25.
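Each entry's before/after counts should reconcile. The drop in unscored open gaps should equal the rise in in-range scored gaps, and both should equal the number of rows written. Below is a small sketch of that bookkeeping check. `GapCounts` and `net_scored` are hypothetical names. A mismatch can also be legitimate if gaps were opened or closed between the two snapshots.

```python
from typing import NamedTuple


class GapCounts(NamedTuple):
    """Snapshot of open-gap counts taken before or after a scoring run."""
    unscored: int
    scored_in_range: int


def net_scored(before: GapCounts, after: GapCounts) -> int:
    """Return the net number of newly scored gaps, or raise if the counts disagree."""
    drop = before.unscored - after.unscored
    gain = after.scored_in_range - before.scored_in_range
    if drop != gain:
        raise ValueError(
            f"inconsistent counts: unscored fell by {drop}, scored rose by {gain}"
        )
    return drop


# Before/After counts from the 01:43 UTC iteration above.
net = net_scored(GapCounts(2038, 1115), GapCounts(2008, 1145))  # 30
```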

2026-04-28 01:41 UTC — Slot 76 (minimax)

Task still current: after this run, live PostgreSQL has 2,038 open knowledge gaps with gap_quality_score IS NULL and 1,115 open gaps with in-range scores.
Ran python3 score_30_gaps.py --limit 30 to score 30 open gaps; all 30 rows updated with composite + component scores.
Before: 2,068 open gaps unscored, 1,085 open gaps scored in range.
After: 2,038 open gaps unscored, 1,115 open gaps scored in range; net 30 new gaps scored this iteration.
Score range this run: composite scores in [0.652, 0.825]; all in [0,1]. No new low-quality gaps scored below 0.3, so low-quality triage remains 25 cumulative.
Verification: 30 scored, 0 failed; remaining_unscored=2038, scored_in_range=1115, low_quality_under_review=25.

2026-04-28 08:20 UTC — Slot 54 (codex)

Task still current: live PostgreSQL has 2,168 open knowledge gaps with gap_quality_score IS NULL; 985 open gaps already have in-range scores.
Prior iterations scored only part of the backlog; this iteration scored the next 100 open unscored gaps ordered by priority/importance using the existing five-component landscape-gap rubric.
Structural note: 57 already-scored open gaps still have at least one missing component score, so a future hygiene task should backfill legacy component columns separately from this forward-scoring batch.
Before: 2,168 open gaps unscored, 985 open gaps scored in range.
After: 2,068 open gaps unscored, 1,085 open gaps scored in range; all 100 rows from this run have component scores populated.
Score range this run: composite scores in [0.627, 0.883]; all in [0,1]. No new low-quality gaps scored below 0.3, so low-quality triage remains 25 cumulative.
Verification: python3 -m py_compile score_30_gaps.py; python3 score_30_gaps.py --limit 1 --dry-run --task-id ad73537f-7d01-4ed4-99d0-acef69b7c7f1; PostgreSQL check over rows scored between 2026-04-28 01:18:00-07 and 2026-04-28 01:24:00-07 returned n=100, components_populated=100, rubric_match=100, remaining_unscored=2068.
Small scorer hygiene improvement: composite rounding now uses decimal half-up quantization, and the JSON verification output now records the task ID passed via --task-id or ORCHESTRA_TASK_ID instead of an obsolete hardcoded task ID.
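The two hygiene changes can be illustrated as follows. This is a sketch under assumptions, not the code in score_30_gaps.py. `quantize_half_up` and `parse_args` are hypothetical names. Only the `--limit`, `--dry-run` and `--task-id` flags and the ORCHESTRA_TASK_ID name come from this log. Python's built-in `round()` rounds exact ties to even, so a composite of exactly 0.0625 becomes 0.062. `decimal` with `ROUND_HALF_UP` gives 0.063 instead.

```python
import argparse
import os
from decimal import ROUND_HALF_UP, Decimal


def quantize_half_up(value: float, places: int = 3) -> float:
    """Round to `places` decimals, sending exact ties away from zero.

    Going through str() rounds the decimal value that is printed (e.g. 0.0625)
    instead of the float's exact binary expansion.
    """
    step = Decimal(1).scaleb(-places)  # Decimal('1E-3') when places == 3
    return float(Decimal(str(value)).quantize(step, rounding=ROUND_HALF_UP))


def parse_args(argv=None):
    """CLI sketch: the task ID comes from --task-id, else ORCHESTRA_TASK_ID."""
    parser = argparse.ArgumentParser(description="Score open knowledge gaps (sketch)")
    parser.add_argument("--limit", type=int, default=30)
    parser.add_argument("--dry-run", action="store_true")
    # The environment is read each time parse_args() builds the parser,
    # so no task ID is baked into the source.
    parser.add_argument("--task-id", default=os.environ.get("ORCHESTRA_TASK_ID"))
    return parser.parse_args(argv)


print(quantize_half_up(0.0625), round(0.0625, 3))  # 0.063 0.062
```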

2026-04-28 00:43 UTC — Slot 47 (claude-auto)

Task still current: 2,308 open knowledge gaps with gap_quality_score IS NULL before this run.
Applied two concurrent scoring rounds (100 gaps each) with python3 score_30_gaps.py --limit 100.
Before: 2,308 open gaps unscored, 845 open gaps scored.
After: 2,168 open gaps unscored, 985 open gaps scored; net 140 new gaps scored this iteration (fewer than 200, presumably because the two concurrent rounds selected overlapping gaps).
Score range this run: composite scores in [0.650, 0.823]; all in [0,1]. No gaps scored below 0.3 (low-quality triage total remains 25 cumulative).
Verification: 100 scored, 0 failed (both runs); remaining_unscored=2168, scored_in_range=985.

2026-04-28 00:11 UTC — Slot 43 (claude-auto)

Task still current: 2,408 open knowledge gaps with gap_quality_score IS NULL before this run.
Applied scoring to 100 open unscored gaps with python3 score_30_gaps.py --limit 100.
Before: 2,408 open gaps unscored, 745 open gaps scored.
After: 2,308 open gaps unscored, 845 open gaps scored; all 100 rows have scores in [0,1].
Score range this run: min=0.624, max=0.858. No gaps scored below 0.3 (low-quality triage total: 25 cumulative).
Verification: all 100 wrote successfully; remaining_unscored=2308, scored_in_range=845.

2026-04-27 04:56 UTC — Slot 54 (codex)

Task e48ce6aa remains current: live PostgreSQL has 2,527 open knowledge gaps with gap_quality_score IS NULL.
Sibling work (35e9639c, b993d7b3, plus prior 30-gap batches) covered only part of the backlog; this task will score the next 100 open unscored gaps.
Approach for this run: update the existing batch scorer to accept a configurable limit, align the composite with the task rubric (specificity, actionability, and evidence coverage each weighted 0.33, so each contributes at most 0.33), and mark gaps below 0.3 as quality_status='under_review' for closure/revision triage.
Applied scoring to 100 open unscored gaps with python3 score_30_gaps.py --limit 100.
Before: 2,527 open gaps unscored, 585 open gaps scored.
After: 2,427 open gaps unscored, 685 open gaps scored; 100/100 recent rows have scores in [0,1] and match specificity_score*0.33 + actionability*0.33 + evidence_coverage*0.33.
Low-quality triage: 2 gaps from this run scored below 0.3 and were marked quality_status='under_review' (gap-test-20260425-224949, gap-debate-20260426-011448-7c85f5dc).
Verification: python3 -m py_compile score_30_gaps.py; PostgreSQL count/rubric check over rows scored at 2026-04-26 14:54:00-07 returned n=100, min_score=0.066, max_score=0.924, rubric_match=100.
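The rubric alignment and triage rule in this entry can be written down directly. The sketch below assumes row fields named after the composite formula in this entry (specificity_score, actionability, evidence_coverage, gap_quality_score). The function names are hypothetical. The 1e-3 tolerance in `rubric_match` assumes composites are stored rounded to three decimals.

```python
WEIGHT = 0.33           # per-component weight from the task rubric
TRIAGE_THRESHOLD = 0.3  # composites below this go to closure/revision triage


def composite_score(specificity: float, actionability: float, evidence_coverage: float) -> float:
    """Weighted sum of the three rubric components, each expected in [0, 1]."""
    for name, value in (
        ("specificity", specificity),
        ("actionability", actionability),
        ("evidence_coverage", evidence_coverage),
    ):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name}={value} is outside [0, 1]")
    return WEIGHT * specificity + WEIGHT * actionability + WEIGHT * evidence_coverage


def triage_status(composite: float):
    """'under_review' for low-quality gaps; None means leave quality_status unchanged."""
    return "under_review" if composite < TRIAGE_THRESHOLD else None


def rubric_match(row: dict, tol: float = 1e-3) -> bool:
    """Verification: does the stored composite agree with its components?"""
    expected = composite_score(
        row["specificity_score"], row["actionability"], row["evidence_coverage"]
    )
    return abs(row["gap_quality_score"] - expected) <= tol
```

The weights sum to 0.99, so under this formula the composite cannot reach 1.0. A gap that is perfect on all three components lands at 0.99. Hypothesis density and debate depth are still stored but do not move the composite.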

2026-04-21 14:27 UTC — Slot 0 (minimax:73)

Task started: needed to score 30 open gaps lacking gap_quality_score
Found that a gap_quality.py scoring module already exists at scidex/agora/gap_quality.py, but its get_db() helper uses a broken SQLite shim
Wrote standalone score_30_gaps.py using direct psycopg connection
Applied 5-component rubric: specificity (heuristic gene/pathway match), evidence_coverage (paper count via specific terms), hypothesis_density (debate sessions), debate_depth (quality_score of linked debates), actionability (keyword heuristics)
30 gaps scored successfully; all scores in [0,1]
Before: 2858 unscored open gaps, 231 scored
After: 2828 unscored open gaps, 261 scored
Verification: acceptance criteria met
Committed and pushed
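This entry names two keyword heuristics: gene/pathway matching for specificity and keyword matching for actionability. They might look roughly like the sketch below. The regular expression and every term list are invented for illustration. The real scorer's patterns are not recorded in this log.

```python
import re

# Hypothetical patterns and term lists, for illustration only.
GENE_SYMBOL = re.compile(r"\b[A-Z][A-Z0-9]{1,5}\b")  # e.g. TREM2, APOE, SNCA
PATHWAY_TERMS = ("pathway", "signaling", "autophagy", "inflammasome")
CELL_TYPES = ("microglia", "astrocyte", "neuron", "oligodendrocyte")
ACTION_KEYWORDS = ("knockout", "inhibit", "assay", "biomarker", "trial", "validate")


def specificity_score(text: str) -> float:
    """Credit each kind of concrete anchor (gene, pathway, cell type) once."""
    lowered = text.lower()
    hits = [
        bool(GENE_SYMBOL.search(text)),
        any(term in lowered for term in PATHWAY_TERMS),
        any(term in lowered for term in CELL_TYPES),
    ]
    return sum(hits) / len(hits)  # 0, 1/3, 2/3 or 1


def actionability_score(text: str, saturation: int = 3) -> float:
    """Count distinct action keywords present, saturating at 1.0."""
    lowered = text.lower()
    n = sum(1 for keyword in ACTION_KEYWORDS if keyword in lowered)
    return min(n / saturation, 1.0)
```

The all-caps pattern also matches acronyms such as ALS or DNA. A heuristic like this can therefore inflate specificity for acronym-heavy but vague gaps.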

2026-04-22 16:45 UTC — Slot (minimax:73)

Task still necessary: 2558 unscored open gaps remain
Fixed two bugs in score_30_gaps.py: (1) compute_evidence_coverage had an unused total_papers = float(total) line and was missing the total_papers parameter; (2) compute_hypothesis_density applied LOWER(evidence_for) LIKE to a jsonb column, which fails on PostgreSQL because lower() has no jsonb overload; removed the jsonb path and kept only the analysis_id FK lookup
Applied 5-dimension rubric: specificity (heuristic gene/pathway/cell-type match), evidence_coverage (paper count via specific term search), hypothesis_density (linked hypotheses + debate sessions), debate_depth (quality_score of linked debates), actionability (keyword heuristics)
30 gaps scored successfully; all composite scores in [0.205, 0.573], within [0,1] range
Before: 2558 unscored open gaps, 531 scored
After: 2528 unscored open gaps, 561 scored
Verification: acceptance criteria met
Committed and pushed
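The fixes in this entry are easiest to follow with the counting logic separated from SQL. In this sketch, `compute_evidence_coverage` takes `total_papers` as an explicit parameter. Hypothesis density is computed from counts; in the fixed scorer, linked hypotheses come from the analysis_id foreign-key lookup rather than a LIKE over the jsonb evidence_for column. Debate depth averages the linked debates' quality_score. The log scaling, the cap of 5 and the averaging are assumptions, not the scorer's documented formulas.

```python
import math
from typing import Iterable, Optional


def compute_evidence_coverage(matched_papers: int, total_papers: int) -> float:
    """Log-scaled share of the corpus that matches the gap's specific terms."""
    if total_papers <= 0 or matched_papers <= 0:
        return 0.0
    matched = min(matched_papers, total_papers)
    # log1p keeps a handful of matching papers from scoring near zero.
    return math.log1p(matched) / math.log1p(total_papers)


def compute_hypothesis_density(n_hypotheses: int, n_debates: int, cap: int = 5) -> float:
    """Linked hypotheses plus debate sessions, saturating at `cap`."""
    return min((n_hypotheses + n_debates) / cap, 1.0)


def compute_debate_depth(quality_scores: Iterable[Optional[float]]) -> float:
    """Mean quality_score of linked debates, ignoring NULLs; 0.0 with no debates."""
    scores = [s for s in quality_scores if s is not None]
    if not scores:
        return 0.0
    return max(0.0, min(1.0, sum(scores) / len(scores)))
```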

2026-04-21 21:36 UTC — Slot 73 (minimax:73)

Fix for merge gate rejection: hardcoded PostgreSQL password in _PG_DSN
Replaced import psycopg + _PG_DSN = "dbname=scidex user=scidex_app password=scidex_local_dev host=localhost" with from scidex.core.db_connect import get_connection
Updated _pg_connect() to call get_connection(), which reads the SCIDEX_PG_DSN environment variable and falls back to a safe default
Verified: script runs successfully, 30 gaps scored with all scores in [0,1]
Force-pushed fix to replace rejected branch
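The credential fix reduces to reading the connection string from the environment. This minimal sketch assumes nothing about scidex.core.db_connect beyond what this entry states. `resolve_dsn` and `DEFAULT_DSN` are hypothetical names. The default shown carries no password and leaves authentication to libpq's standard mechanisms, such as a ~/.pgpass file or the PGPASSWORD variable.

```python
import os
from typing import Mapping, Optional

# Hypothetical safe default: no user or password embedded in source.
DEFAULT_DSN = "dbname=scidex host=localhost"


def resolve_dsn(env: Optional[Mapping[str, str]] = None) -> str:
    """Prefer SCIDEX_PG_DSN from the environment; fall back to DEFAULT_DSN."""
    env = os.environ if env is None else env
    dsn = env.get("SCIDEX_PG_DSN", "").strip()
    return dsn or DEFAULT_DSN
```

With psycopg 3, the result would be passed as `psycopg.connect(resolve_dsn())`.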

Payload JSON

{
  "requirements": {
    "analysis": 7,
    "reasoning": 6
  },
  "max_iterations": 15
}

Sibling Tasks in Quest (Atlas)

○ [Atlas] Drug target therapeutic recommendations: generate actionable recs for 91 tier-1 neurodegeneration targets (priority P96)

○ [Atlas] Causal KG entity resolution: bridge 19K free-text causal edges to canonical KG entities (priority P95)

○ [Atlas] Squad findings bubble-up driver (driver #20) (priority P94)

○ [Atlas] Install Dolt server + migrate first dataset (driver #26) (priority P92)

○ [Atlas] Dataset PR review & merge driver (driver #27) (priority P92)

○ [Atlas] Wiki mermaid LLM regen: 50 pages/run, parallel agents (priority P92)

○ [Atlas] CI: Drive artifact folder migration backfill (priority P92)

○ [Atlas] Unresolved causal edge triage: mine 12K stalled causal claims for cross-disease KG nodes (priority P91)

○ [Atlas] Versioned tabular datasets: overall coordination quest (priority P90)

○ [Atlas] KG ↔ dataset cross-link driver (driver #30) (priority P90)

[Atlas] Score 30 open knowledge gaps with quality rubric: done (analysis:7, reasoning:6)