Effort: medium
> SUPERSEDED 2026-05-20. This spec targeted api.py:73829
> (disease_landing_page) and the v1 entity_disease_canonical
> table. v1 was frozen 2026-05-13 (see AGENTS.md §"v1 FROZEN");
> the api.py JOIN refactor described here cannot be implemented.
> See q-disease-canonical-tagging-v2_spec.md for the v2
> substrate equivalent and 2026-05-10-notebook-disease-recap.md
> for the session context.
>
> v1's Phase 1 broadening (PR #1386) remains live in v1's serving
> window and is the known-good interim behavior. The Phase 3
> backfill script at scripts/backfill_entity_disease_canonical.py
> is forensic / reference material only; do not run against v1.

q-synth-disease-landing_spec.md (Apr 2026) called for entity-link joins
to drive the per-disease landing page. The cherry-pick that shipped
(d734ea71d / f2103dbc4) skipped the entity-link audit and used naive
text matching against hypotheses.disease, analyses.domain,
challenges.domain. Result: /disease/<slug> is empty for any disease
whose content was tagged with a broad-area label
("neurodegeneration", "neuroinflammation") rather than the specific
disease name, i.e. ~80% of the catalog including all 250 non-ND disease
landing pages from the 9ca528d3a fan-out.

Phase 1 (PR #1386) broadened the live queries with ontology-synonym
expansion as an interim fix. Phase 3 (separate PR) added
scripts/backfill_entity_disease_canonical.py to populate
entity_disease_canonical from existing text fields. This task closes
the remaining gaps so we never silently lose disease tagging again.

Make disease tagging on new content automatic and machine-checkable, so
the disease landing page works without future backfills.
- entity_disease_canonical is non-empty (≥1000 rows after the [...] /atlas/quality.
- [...] hypotheses, analyses, challenges, knowledge_gaps rows are updated to request a MONDO ID (or [...] disease / domain field. The prompt [...] disease_ontology_catalog and link agents to a [...] /api/diseases/resolve?q=<text>, to build).
- /api/diseases/resolve?q=<text> endpoint returns up to 5 (mondo_id, label, confidence) candidates by reusing the [...] scripts/backfill_entity_disease_canonical.py.
- [...] hypotheses / analyses / challenges / knowledge_gaps, a trigger calls the resolver and writes the (entity_id, mondo_id, confidence) into entity_disease_canonical; single-disease tagging done at the [...]
- disease_landing_page (api.py:73829) switches its WHERE [...] entity_disease_canonical. Synonym-expansion stays as a fallback [...]
- tests/test_disease_landing.py extends coverage with one [...] breast-cancer) and asserts ≥1 row in every [...]
- [...] scripts/backfill_entity_disease_canonical.py into scidex/atlas/disease_resolver.py. Expose resolve(text: str, top_k: int = 5) -> list[(mondo_id, label, conf)].
- [...] /api/diseases/resolve in api.py.
- [...] f_canonicalize_disease() that [...] disease/domain + title (+ description if present).
- [...] orchestra/prompt.md and the [...] scidex/agora/[...] disease to the exact [...] disease_ontology_catalog matching the entity. If [...] /api/diseases/resolve?q=<your-text> and use the [...]
- [...] disease_landing_page, replace each WHERE col ILIKE ANY(...) with WHERE id IN (SELECT entity_id FROM entity_disease_canonical WHERE mondo_id = $1). Pre-resolve the slug → mondo_id at the top of the function.
- [...] entity_disease_canonical_size and entity_disease_canonical_freshness (median age) metrics to build_quality_dashboard() and surface on /atlas/quality.
- [...] /disease/<slug> to /diseases/<slug>; update [...]

(to be filled in by the implementer)
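For reference, the resolve(text, top_k) interface named above could look roughly like the sketch below. It is an illustration only, not the logic in scripts/backfill_entity_disease_canonical.py: the in-memory CATALOG stands in for disease_ontology_catalog, its IDs are placeholders rather than real MONDO identifiers, and the difflib-based scoring and the Candidate tuple are assumptions made for the example.

```python
from difflib import SequenceMatcher
from typing import NamedTuple


class Candidate(NamedTuple):
    """Hypothetical result row: (mondo_id, label, conf)."""
    mondo_id: str
    label: str
    conf: float


# Hypothetical stand-in for disease_ontology_catalog: (id, label, synonyms).
# The IDs below are placeholders, NOT real MONDO identifiers.
CATALOG = [
    ("MONDO:EX-0001", "Alzheimer disease", ["alzheimer's disease", "AD"]),
    ("MONDO:EX-0002", "Parkinson disease", ["parkinson's disease"]),
    ("MONDO:EX-0003", "breast cancer", ["breast carcinoma"]),
]


def _score(query: str, name: str) -> float:
    """Case-insensitive similarity in [0, 1] (assumed scoring, not the spec's)."""
    return SequenceMatcher(None, query.lower().strip(), name.lower()).ratio()


def resolve(text: str, top_k: int = 5) -> list[Candidate]:
    """Return up to top_k catalog entries, best match first."""
    scored = []
    for mondo_id, label, synonyms in CATALOG:
        # An entry scores as well as its best-matching label or synonym.
        best = max(_score(text, name) for name in [label, *synonyms])
        scored.append(Candidate(mondo_id, label, round(best, 3)))
    scored.sort(key=lambda c: c.conf, reverse=True)
    return scored[:top_k]
```

A caller such as the /api/diseases/resolve handler would pass the q parameter straight through, e.g. resolve("breast cancer", top_k=5), and serialize the returned tuples.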