[Atlas] Expand Knowledge Graph — Add Edges from PubMed Abstracts
Task ID: ef9b2f76-0cf9-458b-bfed-b1809ee82dc1
Priority: P90
Layer: Atlas
Objective
Use NLP co-occurrence extraction to add KG edges from paper abstracts. Focus on top entities by connection count. Target: 2000+ new edges with evidence.
Approach
Run existing extract_kg_from_abstracts.py, which uses regex-based co-occurrence NLP to:
- Detect genes, diseases, pathways, cell types, and drugs in paper abstracts
- Classify relationship types (activates, inhibits, regulates, etc.)
- Create evidence-backed edges with PubMed citations
- Filter out existing edges and insert new ones
Results
- Processed 10,514 papers with abstracts
- Extracted 46,908 unique edge candidates
- Inserted 6,083 new edges (about 3x the 2,000 target)
- KG grew from 696,022 to 703,302 edges (net increase of 7,280)
- Edge types: gene→gene (3,550), gene→pathway (1,076), gene→disease (909), gene→cell_type (420), drug→gene (56), disease→pathway (49), drug→disease (23)
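The figures above can be cross-checked with a few lines of arithmetic. The sketch below only re-adds the numbers reported in this list; the constant and function names are illustrative and are not taken from extract_kg_from_abstracts.py. It confirms that the per-type breakdown sums to the inserted count, and it also reports the gap between the KG's net growth and the inserted count, which this log does not explain.

```python
# Sanity check on the reported figures. All numbers are copied from the
# Results list; all names here are illustrative, not from the real script.
TARGET = 2_000
INSERTED = 6_083
KG_BEFORE = 696_022
KG_AFTER = 703_302

EDGE_TYPE_COUNTS = {
    ("gene", "gene"): 3_550,
    ("gene", "pathway"): 1_076,
    ("gene", "disease"): 909,
    ("gene", "cell_type"): 420,
    ("drug", "gene"): 56,
    ("disease", "pathway"): 49,
    ("drug", "disease"): 23,
}


def summarize():
    """Return derived totals so they can be compared with the reported ones."""
    breakdown_total = sum(EDGE_TYPE_COUNTS.values())
    net_growth = KG_AFTER - KG_BEFORE
    return {
        "breakdown_total": breakdown_total,
        "breakdown_matches_inserted": breakdown_total == INSERTED,
        "target_multiple": round(INSERTED / TARGET, 2),
        "kg_net_growth": net_growth,
        # Growth not explained by this run's inserts (cause not stated in the log).
        "growth_minus_inserted": net_growth - INSERTED,
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value}")
```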
Work Log
- 2026-04-03T00:00 — Started. Ran extract_kg_from_abstracts.py against 10,514 papers
- 2026-04-03T00:05 — Completed. 6,083 new edges inserted successfully
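
For reference, the four steps listed under Approach can be sketched roughly as follows. This is a minimal illustration of regex-based co-occurrence extraction in general, not the code in extract_kg_from_abstracts.py: the entity lexicon, the relation cue words, the fallback label `associated_with`, the edge dictionary layout, and every function name are assumptions made for the example.

```python
import re
from itertools import combinations

# Hypothetical mini-lexicon; a real run would use much larger entity lists.
ENTITY_PATTERNS = {
    "gene": re.compile(r"\b(TP53|MDM2|BRCA1)\b"),
    "disease": re.compile(r"\b(breast cancer|glioblastoma)\b", re.IGNORECASE),
    "drug": re.compile(r"\b(olaparib)\b", re.IGNORECASE),
}

# Hypothetical cue words, checked in order; the first match wins.
RELATION_CUES = [
    ("inhibits", re.compile(r"\binhibit\w*", re.IGNORECASE)),
    ("activates", re.compile(r"\bactivat\w*", re.IGNORECASE)),
    ("regulates", re.compile(r"\bregulat\w*", re.IGNORECASE)),
]
DEFAULT_RELATION = "associated_with"  # assumed fallback when no cue is found


def find_entities(sentence):
    """Step 1: detect entity mentions, returned as sorted (name, type) pairs."""
    found = set()
    for etype, pattern in ENTITY_PATTERNS.items():
        for match in pattern.finditer(sentence):
            text = match.group(0)
            # Keep gene symbols as written; lowercase the other entity types.
            found.add((text if etype == "gene" else text.lower(), etype))
    return sorted(found)


def classify_relation(sentence):
    """Step 2: pick a relation label from cue words in the sentence."""
    for label, cue in RELATION_CUES:
        if cue.search(sentence):
            return label
    return DEFAULT_RELATION


def extract_edges(pmid, abstract):
    """Step 3: pair co-occurring entities per sentence, keeping the evidence."""
    edges = []
    for sentence in re.split(r"(?<=[.!?])\s+", abstract.strip()):
        relation = classify_relation(sentence)
        # Co-occurrence carries no direction; source/target follow sort order.
        for (a, a_type), (b, b_type) in combinations(find_entities(sentence), 2):
            edges.append({
                "source": a, "source_type": a_type,
                "target": b, "target_type": b_type,
                "relation": relation, "pmid": pmid, "evidence": sentence,
            })
    return edges


def filter_new(candidates, existing_keys):
    """Step 4: drop candidates already in the KG, plus in-batch duplicates."""
    seen = set(existing_keys)
    new_edges = []
    for edge in candidates:
        key = (edge["source"], edge["relation"], edge["target"])
        if key not in seen:
            seen.add(key)
            new_edges.append(edge)
    return new_edges


if __name__ == "__main__":
    text = "MDM2 inhibits TP53 in glioblastoma. Olaparib was evaluated in breast cancer."
    candidates = extract_edges("PMID_PLACEHOLDER", text)
    for edge in filter_new(candidates, {("MDM2", "inhibits", "TP53")}):
        print(edge["source"], edge["relation"], edge["target"])
```

Because the relation label is assigned per sentence, every entity pair in a sentence shares that label. This is a known weakness of plain co-occurrence and one reason to keep the evidence sentence and PubMed ID on each edge for later review.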