[Agora] Cross-disease analogy engine — port hypotheses across verticals

Effort: extensive

Goal

Mine the SciDEX hypothesis graph for transferable mechanisms:
patterns that worked in one disease vertical (e.g. anti-amyloid clearance
in AD) and could plausibly inform another (e.g. systemic-amyloid
clearance in transthyretin amyloidosis, a cardiovascular disease). The
engine generates "analogy hypotheses" in the target vertical with a
machine-readable provenance pointer back to the source mechanism, then
queues them for a debate where the new vertical's persona pack
(q-vert-vertical-personas-pack) judges feasibility.

Why this matters

Some of the highest-impact translational discoveries are cross-
disease pattern transfers (statins → osteoporosis, GLP-1 agonists →
weight loss, JAK inhibitors → alopecia). SciDEX has the unique data to
do this systematically: every hypothesis has a mechanism graph and a
disease tag. An analogy engine that says "this mechanism in AD looks
structurally identical to an unsolved problem in T2D" generates research
ideas no single-vertical lab would produce, and demonstrates that the
multi-vertical investment pays off.

Acceptance Criteria

☐ New module scidex/agora/cross_disease_analogy.py (≤900 LoC):
- extract_mechanism_signature(hypothesis_id) — builds a
normalized triple-set {(target_class, action, pathway), ...}
from the hypothesis text + KG edges.
- find_analogy_targets(signature, source_vertical) — for every
OTHER vertical, scores diseases whose unsolved gaps
(q-vert-cancer-gap-importer + analogous wave-4 importers)
share ≥2 triples with the signature.
- propose_analogy_hypothesis(source_id, target_disease_id)
templates a Theorist prompt that says "Mechanism X resolved
problem Y in <source_disease>. Apply the same mechanism to
unresolved problem Z in <target_disease>. State the analogy
explicitly, identify the disanalogies, propose a falsifiable
prediction." Returns a draft hypothesis with
parent_hypothesis_id=source_id, analogy_type='cross_vertical'.
☐ Migration cross_vertical_analogy(source_hypothesis_id,
target_hypothesis_id, source_vertical, target_vertical, signature_json,
similarity_score, llm_rationale, generated_at, debate_id NULL).
☐ Daily timer scidex-cross-disease-analogy.timer runs every active
hypothesis with score > 7.0 against the analogy engine, creates
up to 3 analogy hypotheses per source per day (rate-limited).
☐ Each analogy hypothesis is auto-queued into a debate that uses the
target vertical's persona pack (cardio expert/skeptic if target is
cardiovascular). The debate banner shows the analogy provenance:
"This hypothesis was generated from <source_id> in
<source_disease> — judge it on its own merits but acknowledge
the source."
☐ /analogies page lists generated cross-vertical analogies with
a 2D matrix view (source vertical × target vertical heat-cell
colored by count of accepted analogies).
☐ Audit metric: analogy_acceptance_rate (debates ending with
accepted lifecycle) tracked weekly; target ≥15 % after 4 weeks
(chance baseline ~3 %).
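
As a rough illustration of the migration criterion above, the table could be created along these lines. This is a sketch only, assuming SQLite as the database engine (the spec does not name one). The column names come from the spec; the column types, the NOT NULL constraints, and the helper name `create_cross_vertical_analogy_table` are assumptions made for illustration.

```python
import sqlite3

# Column names are taken from the spec; types and constraints are assumptions.
CREATE_SQL = """
CREATE TABLE IF NOT EXISTS cross_vertical_analogy (
    source_hypothesis_id TEXT NOT NULL,
    target_hypothesis_id TEXT NOT NULL,
    source_vertical      TEXT NOT NULL,
    target_vertical      TEXT NOT NULL,
    signature_json       TEXT NOT NULL,
    similarity_score     REAL NOT NULL,
    llm_rationale        TEXT,
    generated_at         TEXT NOT NULL,
    debate_id            TEXT NULL
)
"""


def create_cross_vertical_analogy_table(conn: sqlite3.Connection) -> None:
    """Hypothetical helper: create the analogy table if it does not exist."""
    conn.execute(CREATE_SQL)
    conn.commit()
```

Leaving debate_id nullable lets a row be recorded as soon as the analogy hypothesis is generated, with the debate reference filled in once the debate has been queued.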

Approach

  • Mechanism-signature extraction reuses
    scidex/agora/kg_extraction_utils.py triple parsing; add a
    normalization step that maps (target_class, action, pathway) via
    Reactome IDs so cross-disease matches survive renaming.
  • Similarity = Jaccard on normalized triples + 0.3 bonus when both
    sides share a known therapeutic class (Reactome/CHEMBL).
  • Analogy generator uses the existing theorist persona but with
    the new "analogy" template in scidex/agora/prompts/analogy_v1.md.
  • Schema additions are migrations, not column hacks; reuse the
    parent_hypothesis_id column already on hypotheses.
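
The similarity rule in the list above can be sketched in a few lines. This is a minimal illustration, not the module's actual code: the function name `analogy_score`, its parameters, and the choice to return None below the shared-triple threshold are assumptions. The threshold of 2 shared triples and the 0.3 bonus come from the spec.

```python
from typing import Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (target_class, action, pathway)

MIN_SHARED_TRIPLES = 2  # threshold from the acceptance criteria
CLASS_BONUS = 0.3       # bonus from the Approach section


def analogy_score(
    source: Set[Triple],
    target: Set[Triple],
    source_class: Optional[str] = None,
    target_class: Optional[str] = None,
) -> Optional[float]:
    """Jaccard similarity plus class bonus, or None if too few shared triples."""
    shared = source & target
    if len(shared) < MIN_SHARED_TRIPLES:
        return None
    # The union is non-empty here because at least two triples are shared.
    score = len(shared) / len(source | target)
    # Hypothetical rule: the bonus applies only when both classes are known and equal.
    if source_class is not None and source_class == target_class:
        score += CLASS_BONUS
    return score
```

With the bonus added, the score can exceed 1.0 when the two signatures are nearly identical; whether the real module clamps the value is not stated in this spec.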

    Dependencies

    • q-vert-disease-ontology-catalog — vertical tagging.
    • q-vert-vertical-personas-pack — target-vertical judge personas.
    • q-vert-cancer-gap-importer (and cardio/infectious/metabolic/immuno
    analogues if/when added) — gap pool to match against.

    Work Log

    2026-04-27 — Implementation [task:838755c5-a712-4f56-b082-9f69fb0d2783]

    Implemented all acceptance criteria:

    • scidex/agora/cross_disease_analogy.py (861 LoC) — core module with:
    - extract_mechanism_signature(hypothesis_id) — builds normalized triples from
    hypothesis text (via kg_extraction_utils) + KG edges; normalizes gene symbols
    to therapeutic class labels for cross-disease match survival
    - find_analogy_targets(signature, source_vertical, limit) — Jaccard similarity
    on normalized triple-sets + 0.3 bonus for shared therapeutic class; filters same-
    vertical hypotheses; MIN_SHARED_TRIPLES=2, MIN_JACCARD=0.10
    - propose_analogy_hypothesis(source_id, target) — templates analogy_v1.md
    prompt, calls LLM, persists new hypothesis with parent_hypothesis_id=source_id
    and analogy_type='cross_vertical', records cross_vertical_analogy row,
    queues debate with vertical-specific persona pack
    - run_daily_analogy_cycle() — rate-limited (≤3 per source per day), evaluates
    sources with composite_score >= 0.7
    - get_analogy_matrix(), get_recent_analogies(), get_acceptance_rate()
    analytics used by the /analogies page

    • migrations/cross_vertical_analogy.py — adds parent_hypothesis_id TEXT
    and analogy_type TEXT to hypotheses; creates cross_vertical_analogy table
    with all required columns; migration applied successfully

    • scidex/agora/prompts/analogy_v1.md — Theorist prompt template with explicit
    analogy statement, disanalogies, falsifiable prediction, and provenance banner

    • deploy/scidex-cross-disease-analogy.service + .timer — daily 06:00 UTC
    systemd timer calling scripts/run_cross_disease_analogy.py

    • scripts/run_cross_disease_analogy.py — runner script
    • api.py — added /api/analogies (JSON) and /analogies (HTML) with 2D
    matrix view (source vertical × target vertical heat-cell) and recent analogy list
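
The rate limiting described for run_daily_analogy_cycle() can be pictured with the sketch below. It is an illustration under stated assumptions, not the implemented function: `plan_daily_cycle`, its parameters, and the tuple shapes are hypothetical. The cap of 3 per source per day and the 0.7 score threshold are the values reported in this work log.

```python
from typing import Callable, Dict, Iterable, List, Tuple

MAX_PER_SOURCE_PER_DAY = 3  # cap from the acceptance criteria
MIN_SOURCE_SCORE = 0.7      # threshold reported in the work log


def plan_daily_cycle(
    sources: Iterable[Tuple[str, float]],
    candidates_for: Callable[[str], List[Tuple[str, float]]],
    already_created_today: Dict[str, int],
) -> List[Tuple[str, str]]:
    """Pick (source_id, target_id) pairs to generate today, best matches first."""
    plan: List[Tuple[str, str]] = []
    for source_id, score in sources:
        if score < MIN_SOURCE_SCORE:
            continue  # source hypothesis is not strong enough
        used = already_created_today.get(source_id, 0)
        remaining = MAX_PER_SOURCE_PER_DAY - used
        if remaining <= 0:
            continue  # daily budget for this source is exhausted
        # Candidates are (target_id, similarity) pairs; take the highest first.
        ranked = sorted(candidates_for(source_id), key=lambda c: c[1], reverse=True)
        plan.extend((source_id, target_id) for target_id, _ in ranked[:remaining])
    return plan
```

Passing in the count of analogies already created today keeps the cap intact if the runner is invoked more than once in a day.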

    Acceptance criteria status:

    ☑ Module ≤900 LoC with all 3 required functions
    ☑ Migration with all required columns (applied)
    ☑ Daily timer with rate-limiting at ≤3/source/day
    ☑ Debate queuing with vertical-specific persona packs + provenance banner
    ☑ /analogies page with heat-cell matrix
    ☑ get_acceptance_rate() for audit metric tracking
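
For reference, the audit metric could be computed roughly as follows. This is a sketch under assumptions: the real get_acceptance_rate() signature is not shown in this spec, so the function below uses a different, hypothetical name, and both the "accepted" label string and the exclusion of unfinished debates are guesses. The 15 % target is taken from the acceptance criteria.

```python
from typing import Iterable, Optional

TARGET_RATE = 0.15  # target after 4 weeks, from the acceptance criteria


def acceptance_rate(lifecycles: Iterable[Optional[str]]) -> Optional[float]:
    """Fraction of finished debates whose final lifecycle is 'accepted'.

    None entries stand for debates that have not finished yet; they are
    excluded so open debates do not drag the rate down (an assumption).
    """
    finished = [state for state in lifecycles if state is not None]
    if not finished:
        return None  # no finished debates yet, so the rate is undefined
    accepted = sum(1 for state in finished if state == "accepted")
    return accepted / len(finished)


def meets_target(rate: Optional[float]) -> bool:
    """True when a defined rate reaches the 15 % target."""
    return rate is not None and rate >= TARGET_RATE
```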

    Note on current data: The SciDEX corpus is primarily neurodegeneration-focused.
    Cross-vertical analogies will grow as multi-vertical hypothesis data is added through
    the wave-4 importers. The engine correctly identifies analogies across different
    disease labels even within the neurodegeneration family (AD→ALS, AD→PD).

    File: q-vert-cross-disease-analogy-engine_spec.md
    Modified: 2026-05-01 20:13
    Size: 6.9 KB