[Atlas] Expand Knowledge Graph — Add Edges from PubMed Abstracts

Task ID: ef9b2f76-0cf9-458b-bfed-b1809ee82dc1 | Priority: P90 | Layer: Atlas

Objective

Use NLP co-occurrence extraction to add knowledge graph (KG) edges from paper abstracts, focusing on the entities with the highest connection counts. Target: 2,000+ new edges, each with supporting evidence.

Approach

Run the existing extract_kg_from_abstracts.py script, which uses regex-based co-occurrence extraction to:
  • Detect genes, diseases, pathways, cell types, and drugs in paper abstracts
  • Classify relationship types (activates, inhibits, regulates, etc.)
  • Create evidence-backed edges with PubMed citations
  • Filter out existing edges and insert new ones
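
The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the contents of extract_kg_from_abstracts.py: the entity lexicons, the cue words, the `associated_with` fallback, the edge dictionary layout, and every function name here are assumptions made for the example.

```python
import re
from itertools import combinations

# Hypothetical mini-lexicons; the real script's entity lists are not shown in this spec.
ENTITY_PATTERNS = {
    "gene": re.compile(r"\b(TP53|MDM2|BRCA1|EGFR)\b"),
    "disease": re.compile(r"\b(breast cancer|glioblastoma)\b", re.IGNORECASE),
    "drug": re.compile(r"\b(gefitinib|olaparib)\b", re.IGNORECASE),
}

# Hypothetical cue words for the relationship types named in the approach.
RELATION_CUES = [
    ("inhibits", re.compile(r"\binhibit(s|ed|ion)?\b", re.IGNORECASE)),
    ("activates", re.compile(r"\bactivat(es|ed|ion)\b", re.IGNORECASE)),
    ("regulates", re.compile(r"\bregulat(es|ed|ion)\b", re.IGNORECASE)),
]
DEFAULT_RELATION = "associated_with"  # assumed fallback when no cue word matches


def find_entities(sentence):
    """Return (position, text, entity_type) tuples in order of appearance."""
    found = []
    for etype, pattern in ENTITY_PATTERNS.items():
        for m in pattern.finditer(sentence):
            found.append((m.start(), m.group(0), etype))
    return sorted(found)


def classify_relation(sentence):
    """Pick the first relationship type whose cue word occurs in the sentence."""
    for relation, cue in RELATION_CUES:
        if cue.search(sentence):
            return relation
    return DEFAULT_RELATION


def extract_edges(pmid, abstract, existing_edges):
    """Emit one edge per co-occurring entity pair, skipping known edges."""
    new_edges = {}
    for sentence in re.split(r"(?<=[.!?])\s+", abstract):
        entities = find_entities(sentence)
        relation = classify_relation(sentence)
        for (_, src, src_type), (_, dst, dst_type) in combinations(entities, 2):
            if src.lower() == dst.lower():
                continue  # same entity mentioned twice
            key = (src, relation, dst)
            if key in existing_edges or key in new_edges:
                continue  # already in the KG or already extracted
            new_edges[key] = {
                "source": src, "source_type": src_type,
                "target": dst, "target_type": dst_type,
                "relation": relation,
                "evidence": {"pmid": pmid, "sentence": sentence},
            }
    return list(new_edges.values())


# Usage with a made-up abstract and a placeholder PMID.
sample = "MDM2 inhibits TP53 in glioblastoma. Gefitinib was studied in patients."
for edge in extract_edges("00000000", sample, existing_edges=set()):
    print(edge["source"], edge["relation"], edge["target"], edge["evidence"]["pmid"])
```

Sentence-level co-occurrence is used here because it keeps the evidence snippet short; a real pipeline might instead window over the whole abstract or normalize entity names before comparing against existing edges.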

Results

  • Processed 10,514 papers with abstracts
  • Extracted 46,908 unique edge candidates
  • Inserted 6,083 new edges (3x the 2,000 target)
  • KG grew from 696,022 → 703,302 edges
  • Edge types: gene→gene (3,550), gene→pathway (1,076), gene→disease (909), gene→cell_type (420), drug→gene (56), disease→pathway (49), drug→disease (23)
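
The reported figures can be cross-checked with a short calculation. The numbers below are copied from the results above; the variable names are illustrative only.

```python
TARGET = 2_000

# Per-type counts of inserted edges, as listed in the results.
edge_counts = {
    "gene->gene": 3_550,
    "gene->pathway": 1_076,
    "gene->disease": 909,
    "gene->cell_type": 420,
    "drug->gene": 56,
    "disease->pathway": 49,
    "drug->disease": 23,
}

inserted = sum(edge_counts.values())   # total across all edge types
ratio = inserted / TARGET              # multiple of the 2,000-edge target
net_growth = 703_302 - 696_022         # change in total KG edge count

print(inserted)         # 6083
print(round(ratio, 2))  # 3.04
print(net_growth)       # 7280
```

The per-type counts sum to the 6,083 inserted edges. The net growth in total edge count is 7,280, which is 1,197 more than the inserted count; the spec does not say what accounts for the difference.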

Work Log

  • 2026-04-03T00:00 - Started. Ran extract_kg_from_abstracts.py against 10,514 papers
  • 2026-04-03T00:05 - Completed. 6,083 new edges inserted successfully

File: ef9b2f76_atlas_expand_kg_edges_spec.md
Modified: 2026-05-01 20:13
Size: 1.2 KB