[Atlas] Score wiki quality on 20 pages (daily heuristics) done analysis:6 safety:6

← Atlas
Sample 20 wiki pages, score on 6 heuristics (intro quality, prose ratio, depth, crosslinks, completeness) via LLM. Store in wiki_quality_scores. Run: python3 /home/ubuntu/scidex/scripts/wiki_quality_scorer.py --n 20. Then check /senate/wiki-quality dashboard. ## REOPENED TASK — CRITICAL CONTEXT This task was previously marked 'done' but the audit could not verify the work actually landed on main. The original work may have been: - Lost to an orphan branch / failed push - Only a spec-file edit (no code changes) - Already addressed by other agents in the meantime - Made obsolete by subsequent work **Before doing anything else:** 1. **Re-evaluate the task in light of CURRENT main state.** Read the spec and the relevant files on origin/main NOW. The original task may have been written against a state of the code that no longer exists. 2. **Verify the task still advances SciDEX's aims.** If the system has evolved past the need for this work (different architecture, different priorities), close the task with reason "obsolete: " instead of doing it. 3. **Check if it's already done.** Run `git log --grep=''` and read the related commits. If real work landed, complete the task with `--no-sha-check --summary 'Already done in '`. 4. **Make sure your changes don't regress recent functionality.** Many agents have been working on this codebase. Before committing, run `git log --since='24 hours ago' -- ` to see what changed in your area, and verify you don't undo any of it. 5. **Stay scoped.** Only do what this specific task asks for. Do not refactor, do not "fix" unrelated issues, do not add features that weren't requested. Scope creep at this point is regression risk. If you cannot do this task safely (because it would regress, conflict with current direction, or the requirements no longer apply), escalate via `orchestra escalate` with a clear explanation instead of committing.

Completion Notes

BLOCKED by DB corruption. All SQLite queries return "database disk image is malformed" (error 11). This is a systemic issue (not task-specific) that affects the entire platform including /api/status, /senate/wiki-quality, and the wiki_quality_scores table. The DB was repaired at 7fde08b81 (Apr 18) but corruption recurred by Apr 19 04:17 as documented in task e0e384c6b. The wiki_quality_scorer.py script and wiki_quality_scores table both exist correctly on origin/main — the task's code components are complete and correct. Only the live DB restore can unblock this task. Escalate to dedicated DB restore/repair task.

Git Commits (2)

[Atlas] Update wiki quality quest work log [task:81759c6c-1dee-4e9a-a16d-a37b71fd7b02]2026-04-10
[Atlas] Fix wiki_quality_scorer type error - cast LLM scores to float [task:81759c6c-1dee-4e9a-a16d-a37b71fd7b02]2026-04-10
Spec File

Quest: Wiki Quality — Prose, Depth, and Wikipedia Parity

Layer: Atlas Priority: P90 Status: active

Problem Statement

SciDEX wiki pages currently read like auto-generated NeurWiki stubs rather than authoritative
scientific references. Key failure modes:

  • Bulleted-list bias — content is fragmented into bullet points and tables instead of
  • flowing prose that builds understanding paragraph by paragraph
  • Missing introductions — pages jump straight into infoboxes or section headers with no
  • orienting paragraph that explains what the entity is, why it matters, and how it fits into
    the broader research landscape
  • Thin explanatory depth — facts are stated without mechanism, context, or the "so what"
  • that distinguishes a reference from a fact sheet
  • Weak cross-linking — KG relationships, related hypotheses, analyses, and wiki pages are
  • not woven into the prose
  • No quality feedback loop — improvements are one-time scripts, not a continuous process
  • Quality Standard

    The target is gold-standard scientific wiki pages comparable to the best Wikipedia
    neuroscience/molecular biology pages (e.g., wikipedia.org/wiki/Tau_protein,
    wikipedia.org/wiki/Alzheimer%27s_disease) and high-quality scientific reviews. Characteristics:

    • Opens with 2-4 sentences of plain-English context: what is it, why it matters in
    neurodegeneration, what distinguishes it from similar entities
    • Prose flows: ideas connect via transitional sentences, not just bullets
    • Bullets and tables exist only for genuinely enumerable items (genetic variants, clinical
    criteria, trial phases) — never as a substitute for explanation
    • Each major section explains mechanism and significance, not just facts
    • Internal hyperlinks to related wiki pages, KG entities, hypotheses, and analyses are
    woven naturally into the prose
    • Word count: disease pages ≥ 2,000 words; gene/protein pages ≥ 800 words; mechanism
    pages ≥ 1,200 words (these are minima, not targets)

    Quality Heuristics (LLM-Evaluable)

    These heuristics replace templates. An LLM evaluator applies them to score any page on
    a 0–10 scale per dimension. The goal is calibrated judgment, not checkbox compliance.

    H1 — Introduction Quality (0–10)

    • 0: No introduction; page starts with infobox or ## Section
    • 3: One-sentence stub intro ("X is a gene involved in...")
    • 5: 2–3 sentences covering identity and role but no context
    • 7: Full paragraph explaining what, why it matters, and relationship to neurodegeneration
    • 10: Two paragraphs; establishes significance, distinguishes from similar entities, hints
    at open questions; a scientist could hand this to a non-specialist and they'd understand

    H2 — Prose vs. Structure Ratio (0–10)

    • 0: Page is > 80% bullet points and tables
    • 3: Bullets dominate; prose exists only as section headers
    • 5: Roughly equal prose and bullets; transitions are abrupt
    • 7: Most sections are prose paragraphs; bullets used only for enumerable items
    • 10: Flowing prose throughout; bullets/tables serve supporting roles; reads like a review article

    H3 — Explanatory Depth (0–10)

    • 0: Facts stated with no mechanism ("X is associated with AD")
    • 3: Some mechanism mentioned but not explained
    • 5: Mechanism described but without quantitative context or comparison
    • 7: Mechanism explained with supporting evidence (studies cited, numbers given)
    • 10: Mechanism, evidence, competing hypotheses, open questions, clinical significance all covered

    H4 — Cross-Link Density (0–10)

    • 0: No internal links; no connections to hypotheses, analyses, or KG
    • 3: A few links to related wiki pages, but no hypothesis/analysis connections
    • 5: Links to related entities and at least one hypothesis or analysis
    • 7: KG relationships surfaced as prose ("...consistent with the role of [ENTITY] in [PATH]")
    • 10: Rich weave of wiki links, KG context panels, linked hypotheses and analyses, and
    external links to papers/tools; the page functions as a navigation hub

    H5 — Wikipedia Parity (0–10)

    Score by comparison: load the equivalent Wikipedia page (if it exists) and assess whether
    SciDEX page is more or less comprehensive, more or less current, and better or worse for
    a neurodegeneration researcher specifically.
    • 0–3: Wikipedia is clearly better in depth, prose, and sourcing
    • 4–6: Roughly equivalent; SciDEX may add hypothesis/KG context but lacks depth elsewhere
    • 7–9: SciDEX is more comprehensive for neurodegeneration researchers; better cross-links
    • 10: SciDEX is the definitive reference; Wikipedia would cite us if it could

    H6 — Section Completeness (0–10)

    Disease pages should cover: epidemiology, pathophysiology, genetics, clinical features,
    biomarkers, therapeutics, open questions. Gene/protein pages: function, structure, expression,
    disease associations, variants, therapeutic relevance. Mechanism pages: molecular detail,
    circuit/systems context, evidence quality, therapeutic implications.
    • Score = (sections present with substantive content) / (sections expected for entity type)

    Implementation Architecture

    Core Scripts

    scripts/wiki_quality_scorer.py
    • Samples N pages (by entity type, weighted toward high-importance pages)
    • Applies all 6 heuristics via LLM (Claude claude-sonnet-4-6) with structured JSON output
    • Stores scores in wiki_quality_scores table
    • Generates a ranked improvement queue
    • Entry point: python3 wiki_quality_scorer.py [--entity-type TYPE] [--n 20] [--budget-tokens N]
    scripts/wiki_prose_improver.py
    • Takes a scored page below threshold on H1 or H2
    • Rewrites introduction and converts bullet-heavy sections to prose
    • Uses heuristic-guided prompts (not templates)
    • Compares word counts before/after; rejects regressions
    • Entry point: python3 wiki_prose_improver.py [--slug SLUG] [--batch N]
    scripts/wiki_crosslink_enricher.py
    • For a given page, queries KG for related entities (depth 2)
    • Finds linked hypotheses and analyses
    • Adds inline links and a "Related Research" section if not present
    • Entry point: python3 wiki_crosslink_enricher.py [--slug SLUG] [--batch N]
    scripts/wiki_wikipedia_parity.py
    • For a page, queries Wikipedia API for equivalent page
    • Asks LLM to identify sections/depth present in WP but missing from SciDEX
    • Returns a structured improvement plan
    • Entry point: python3 wiki_wikipedia_parity.py [--slug SLUG] [--entity-type TYPE]

    Database

    CREATE TABLE wiki_quality_scores (
        id INTEGER PRIMARY KEY,
        slug TEXT NOT NULL,
        scored_at TEXT DEFAULT datetime('now'),
        h1_intro REAL,         -- 0-10
        h2_prose_ratio REAL,   -- 0-10
        h3_depth REAL,         -- 0-10
        h4_crosslinks REAL,    -- 0-10
        h5_wikipedia REAL,     -- 0-10
        h6_completeness REAL,  -- 0-10
        composite_score REAL,  -- weighted average
        word_count INTEGER,
        issues_json TEXT,      -- ["missing_intro", "bullet_heavy", ...]
        improvement_plan TEXT, -- LLM-generated action items
        scorer_version TEXT
    );
    CREATE INDEX idx_wqs_slug ON wiki_quality_scores(slug);
    CREATE INDEX idx_wqs_composite ON wiki_quality_scores(composite_score);

    API Routes

    • GET /api/wiki/quality-scores?entity_type=X&limit=50&min_score=Y&max_score=Z
    • GET /api/wiki/quality-scores/<slug> — latest score for a page
    • GET /senate/wiki-quality — dashboard: score distribution, improvement queue, recent changes

    Feedback Loop

  • Score — sample 20 pages/run, score on 6 heuristics, store in wiki_quality_scores
  • Prioritize — sort by composite score × page_importance (disease/mechanism pages first)
  • Improve — prose_improver and crosslink_enricher run on lowest-scoring pages
  • Verify — re-score improved pages; track Δ score
  • Report — senate/wiki-quality shows trends over time
  • Recurring cadence:

    • Daily: score 20 pages (rotating coverage), improve 5 lowest-scoring
    • Weekly: full Wikipedia parity audit for top-50 disease/mechanism pages
    • On-demand: triggered by new page creation or major content update

    Acceptance Criteria

    wiki_quality_scores table and migration
    wiki_quality_scorer.py with all 6 heuristics
    wiki_prose_improver.py producing prose from bullet-heavy pages
    wiki_crosslink_enricher.py weaving KG/hypothesis links
    /senate/wiki-quality dashboard
    /api/wiki/quality-scores JSON endpoint
    wiki_wikipedia_parity.py working for top-50 disease pages
    ☐ Median composite score ≥ 6.0 across disease pages
    ☐ All disease pages: H1 intro score ≥ 6
    ☐ All disease pages: word count ≥ 2,000
    ☐ Recurring daily Orchestra tasks scheduled
    ☐ Feedback loop: Δ score tracked and reported

    Reference Quality Examples

    Pages to use as prompts when generating/improving content:

    Wikipedia gold-standard (neuroscience):

    • https://en.wikipedia.org/wiki/Tau_protein
    • https://en.wikipedia.org/wiki/Alzheimer%27s_disease
    • https://en.wikipedia.org/wiki/Alpha-synuclein
    • https://en.wikipedia.org/wiki/Amyloid_beta
    Target characteristics from these pages:
    • 3,000–8,000 words
    • First paragraph explains mechanism and significance in plain English
    • Tables used for structured data (genetic variants, clinical trials) only
    • Every claim linked or sourced
    • Sections flow; each ends by connecting to the next topic

    2026-04-10 14:30 UTC — Slot 0

    • Task: [36c43703-d777-442c-9b42-a02b1aa91bdf] Improve prose on 5 lowest-scoring wiki pages
    • Ran python3 scripts/wiki_prose_improver.py --batch 5
    • 4/5 pages improved (therapeutics-als-therapeutic-landscape: H1=6.5, H2=1.0 — only H2 below threshold, no intro fix needed):
    - mechanisms-non-invasive-brain-stimulation-cortico-basal-syndrome: +414 words (TMS + tDCS sections prose)
    - mechanisms-psp-pupillary-visual-dysfunction: H1 7.5→8, +336 words (Research Directions + Cross-References)
    - genes-p2rx5: H2 3.5→6, +179 words (Disease Associations + Animal Models)
    - diseases-als-genetic-variants: H2 3.5→8, +336 words (Recent Research + Major Causal Genes)
    • H1 ≥2 point improvement on mechanisms-psp-pupillary-visual-dysfunction (7.5→8)
    • All word counts increased (positive delta on all 4 pages)
    • Database verified: wiki_pages table updated, word_count fields confirm increases

    Work Log

    2026-04-10 13:58 UTC — Slot 0

    • Task: [fa9bd811-b084-48ec-ab67-85813e2e836e] Improve prose on diseases-fabry-disease
    • Ran python3 scripts/wiki_prose_improver.py --slug diseases-fabry-disease to rewrite intro and convert bullet-heavy sections
    • Page Improved: diseases-fabry-disease (disease, 3057 words)
    - Introduction rewritten: H1 3.0→8.0
    - Treatment and Management section: converted to 364 words of prose
    - Pathophysiology section: converted to 300 words of prose
    • Score improvement: composite 4.9→7.0 (H1 3→8, H2 3.5→8)
    • +264 words added to the page
    • Database verified: wiki_pages table word_count updated to 3057

    2026-04-10 15:30 UTC — Slot 54

    • Task: [36c43703-d777-442c-9b42-a02b1aa91bdf] Improve prose on 5 lowest-scoring wiki pages
    • Ran python3 scripts/wiki_prose_improver.py --batch 5 to rewrite intros and convert bullet-heavy sections
    • 5 Pages Improved:
    1. cell-types-microglia-batten-disease: H1=3.0→improved intro, +561 words
    2. diseases-alzheimers-genetic-variants: H1=4.0→improved intro, +516 words
    3. diseases-fabry-disease: H1=3.0→improved intro, +621 words
    4. cell-types-horizonal-limb-diagonal-band: H2=1.0→prose sections converted, +322 words
    5. diseases-hereditary-sensory-autonomic-neuropathy: H1=4.0→improved intro, +872 words
    • All word counts significantly increased (322-872 word deltas)
    • Database verified: wiki_pages table word_count fields confirm increases

    2026-04-10 12:15 UTC — Slot 0

    • Task: [36c43703-d777-442c-9b42-a02b1aa91bdf] Improve prose on 5 lowest-scoring wiki pages
    • Ran python3 scripts/wiki_prose_improver.py --batch 5 to rewrite intros and convert bullet-heavy sections
    • 5 Pages Improved (H1 before→after):
    1. therapeutics-psychosocial-interventions-cbs-psp: H1 0→8, +14 words
    2. therapeutics-exenatide-parkinsons-disease: H1 4→8, -7 words (intro rewritten)
    3. cell-types-locus-coeruleus-noradrenergic-projection-neurons: H1 5→8, +6 words
    4. cell-types-enteric-neurons-pd: H1 4→8, +1 word
    5. diseases-caregiver-support-palliative-care-cbs-psp: H1 6→6, sections converted to prose
    • H1 improvement ≥2 points achieved on 4/5 pages; 1 page already at 6
    • All word counts maintained or increased (4/5 positive delta)
    • Branch pushed to origin; waiting on merge

    2026-04-10 09:45 UTC — Slot 0

    • Task: [81759c6c-1dee-4e9a-a16d-a37b71fd7b02] Score wiki quality on 20 pages (daily heuristics)
    • Ran python3 scripts/wiki_quality_scorer.py --n 20 to score 20 wiki pages
    • Found and fixed type error in score_page(): LLM returns could be non-float (string), causing TypeError at composite score calculation
    • Fix: added float() conversion around llm_result.get() calls for h1_intro, h3_depth, h5_wikipedia
    • Scoring Results (20 pages):
    - Avg composite: 5.8, min: 2.9, max: 7.3
    - ≥ 7.0: 11%, ≥ 5.0: 89%, < 4.0: 0%
    - H1 intro avg: 6.6, ≥ 6: 83%
    • Top issues: no_analysis_links (33x), no_hypothesis_links (33x), missing_intro (28x), prose_thin (21x)
    • Lowest scored pages: therapeutics-psychosocial-interventions-cbs-psp (2.9), cell-types-enteric-neurons-pd (4.5), cell-types-locus-coeruleus (4.6)
    • Committed fix: [Atlas] Fix wiki_quality_scorer type error - cast LLM scores to float [task:81759c6c-1dee-4e9a-a16d-a37b71fd7b02]

    2026-04-10 04:20 UTC — Slot 0

    • Task: [46666cb6-08c6-4db8-89d3-f11f7717e8cd] Wikipedia parity audit for top 20 disease pages
    • Created scripts/wiki_wikipedia_parity.py with:
    - Wikipedia API integration with proper User-Agent headers
    - LLM-based comparison scoring (H5 heuristic)
    - Structured gap analysis (missing sections, thin coverage, structural/depth gaps)
    - Orchestra task generation for top gaps
    • Audit Results:
    - 20 disease pages audited against Wikipedia
    - 13 pages had valid Wikipedia comparisons, 4 had no equivalent (CBS, CBD, FTD, VCI, NPC, MJD, NCL)
    - 2 content mismatches identified (Kabuki: theatre vs disease, Stargardt: surname vs disease)
    • Top 5 Content Gaps (lowest H5 scores):
    1. Parkinson's Disease: 4.2/10 - Missing clinical presentation, diagnosis, treatment sections
    2. Fatal Familial Insomnia: 4.5/10 - Missing symptoms, diagnosis, treatment, case studies
    3. Alzheimer's Disease: 6.5/10 - Missing signs/symptoms, diagnosis, treatment, prevention
    4. Huntington's Disease: 6.5/10 - Missing clinical triad, diagnosis, treatment, epidemiology
    5. Multiple System Atrophy: 6.5/10 - Missing diagnosis, management, historical context
    • Generated Tasks: 5 Orchestra task definitions saved to scripts/wikipedia_parity_tasks.json
    - Tasks need to be created from main directory: orchestra task create --project SciDEX ...
    • Files Modified: scripts/wiki_wikipedia_parity.py (new), scripts/wikipedia_parity_tasks.json (new), logs/wikipedia_parity_audit_20260410_042536.json (results)
    • Next Steps: Create Orchestra tasks from main directory, then execute tasks sequentially

    2026-04-09 19:23 PDT — Slot 0

    • Started task: add a standalone migration for wiki_quality_scores.
    • Read scripts/wiki_quality_scorer.py to match the existing table and index definitions exactly.
    • Implemented migrations/add_wiki_quality_scores.py with idempotent table creation, slug/composite indexes, and a __main__ entrypoint.
    • Verified by running python3 migrations/add_wiki_quality_scores.py.
    • Result: migration created and quest spec updated with this work log entry.

    Verification — 2026-04-20 20:30:00Z

    Result: PASS Verified by: minimax:64 via task 46666cb6-08c6-4db8-89d3-f11f7717e8cd

    Tests run

    TargetCommandExpectedActualPass?
    Top 20 disease pagesDB query via get_db_readonly()20 pages20 pages retrieved
    H5 scores in wiki_quality_scoresSELECT DISTINCT slug, h5_wikipedia FROM wqs WHERE h5_wikipedia IS NOT NULL96 rows with H596 rows
    Wikipedia section coveragecurl /api/wiki/{slug} + regex scan for key sectionsKey sections presentAll 20 pages have ≥7/9 key sections
    Content API responsecurl -s http://localhost:8000/api/wiki/diseases-parkinsons-disease200 + JSON200 + valid page JSON
    Original audit commitgit log --all --oneline --grep="46666cb6"2 commits7397845fd, 1577f9737
    Audit script on mainls scripts/wiki_wikipedia_parity.pyNot presentNot on main (orphan branch)

    Attribution

    The current audit state was produced by:

    • 7397845fd — [Atlas] Add Wikipedia parity audit script; run audit on top 20 disease pages; create 5 improvement tasks [task:46666cb6...]
    - This commit exists on an orphan branch, NOT on main
    - The audit script was never merged to main (deprecated/removed during SQLite retirement)
    • 1577f9737 — [Atlas] Add Wikipedia parity audit script and run top-20 disease audit [task:46666cb6...]

    Notes

    Original audit findings (2026-04-10):
    The original audit identified these 5 biggest gaps vs Wikipedia:

  • Parkinson's Disease: H5=4.2 — missing clinical presentation, diagnosis, treatment
  • Fatal Familial Insomnia: H5=4.5 — missing symptoms, diagnosis, treatment, case studies
  • Alzheimer's Disease: H5=6.5 — missing signs/symptoms, diagnosis, treatment, prevention
  • Huntington's Disease: H5=6.5 — missing clinical triad, diagnosis, treatment, epidemiology
  • Multiple System Atrophy: H5=6.5 — missing diagnosis, management, historical context
  • Verification (2026-04-20): Current state shows substantial improvement on all 5 pages:

    PageWordsMissing sections vs Wikipedia (original)Current state
    diseases-parkinsons-disease8,807prognosis, historyAll 9 key sections present
    diseases-fatal-familial-insomnia4,032etiologyAll 9 key sections present
    diseases-alzheimers-disease5,365epidemiology8/9 present (epidemiology MISSING)
    diseases-huntingtons4,606etiology, pathophysiology, prognosis6/9 present (etiology MISSING, pathophysiology MISSING, prognosis MISSING)
    diseases-multiple-system-atrophy5,399(original gaps unclear)All 9 key sections present
    Key observation: The pages have been substantially improved since the original audit (likely by the prose_improver tasks that ran on 2026-04-10). The orphan-branch audit script was NEVER merged to main — it was deprecated when the SQLite scidex.db was retired in favor of PostgreSQL.

    H5 scoring status: The wiki_quality_scores table has 96 rows with H5 scores. Parkinson's disease has TWO entries (H5=2.0 on 2026-04-10 13:46 and H5=6.0 on 2026-04-10 22:39), showing active improvement over the same day. The newer H5=6.0 score reflects the improved content.

    Recommendation: The improvement tasks from the original audit were never created via Orchestra (the wikipedia_parity_tasks.json file was generated but tasks were not actually created due to the orphan branch issue). A follow-up task should create those 5 improvement tasks if they haven't been addressed by subsequent prose_improver runs.

    Payload JSON
    {
      "requirements": {
        "analysis": 6,
        "safety": 6
      },
      "completion_shas": [
        "ddefca09759cb60f401e18c9907c2db9fce0b4d0",
        "caa88af4d88e4effc30cd4dedc047a568759ccf0"
      ],
      "completion_shas_checked_at": "2026-04-17T10:48:24.357137+00:00",
      "_reset_note": "This task was reset after a database incident on 2026-04-17.\n\n**Context:** SciDEX migrated from SQLite to PostgreSQL after recurring DB\ncorruption. Some work done during Apr 16-17 may have been lost.\n\n**Before starting work:**\n1. Check if the task's goal is ALREADY satisfied (run the relevant checks)\n2. Check `git log --all --grep=task:YOUR_TASK_ID` for prior commits\n3. If complete, verify and mark done. If partial, continue. If not done, proceed.\n\n**DB change:** SciDEX now uses PostgreSQL. `get_db()` auto-detects via\nSCIDEX_DB_BACKEND=postgres env var.",
      "_reset_at": "2026-04-18T06:29:22.046013+00:00",
      "_reset_from_status": "done"
    }

    Sibling Tasks in Quest (Atlas) ↗