[Forge] Build automated PubMed update pipeline for hypothesis evidence
Task ID: eba164da-60eb-48ba-99b3-7246ec365465
Goal
Create and maintain a recurring pipeline that searches PubMed for new papers related to top hypotheses and updates their evidence_for/evidence_against fields with classified citations. The pipeline runs as a systemd daemon, uses Claude Haiku for evidence classification, and tracks all updates in a monitoring log.
Current State
The pipeline (pubmed_update_pipeline.py) is already built and running as the scidex-pubmed-update systemd service. So far it has processed 176 hypotheses and added 509 papers. API endpoints exist at /api/pubmed-pipeline-status (status) and /api/pubmed-pipeline/trigger (manual trigger).
Acceptance Criteria
☑ Pipeline script with PubMed search, Claude Haiku classification, DB update
☑ Daemon mode with configurable interval (default 6 hours)
☑ systemd service running (scidex-pubmed-update)
☑ API endpoints for status and manual trigger
☑ Monitoring log table (pubmed_update_log)
☑ Deduplication of existing PMIDs
☑ Rate limiting for NCBI API compliance
☐ Evidence items include abstract snippets for richer display
☐ Classifier returns strength ratings (strong/moderate/weak)
☐ Search queries use hypothesis description keywords for broader coverage
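One way the last unchecked criterion might be approached is sketched below: pull the most frequent non-stopword terms out of the hypothesis description and add them as a second, broader query alongside the title. This is only an illustration under stated assumptions. The real signature of build_queries() is not shown in this document, so the function name build_queries_sketch, its parameters, the stopword list, and the helper extract_keywords are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical, deliberately small stopword list; a real pipeline would
# likely need a larger, domain-aware one.
STOPWORDS = frozenset({
    "the", "and", "for", "with", "that", "this", "from", "are", "was",
    "were", "which", "have", "has", "been", "their", "into", "may", "can",
    "via", "not", "but", "its", "between", "these", "those", "through",
})


def extract_keywords(description: str, max_keywords: int = 5) -> list[str]:
    """Return the most frequent non-stopword terms in the description.

    Terms with equal counts keep their order of first appearance, because
    Counter.most_common orders ties by insertion order.
    """
    # Lowercased tokens of 3+ characters that start with a letter
    # (keeps gene-like names such as "trem2" and hyphenated terms).
    words = re.findall(r"[a-z][a-z0-9-]{2,}", description.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    return [word for word, _ in counts.most_common(max_keywords)]


def build_queries_sketch(title: str, description: str,
                         max_keywords: int = 5) -> list[str]:
    """Build a title query plus one keyword query (assumed query shape)."""
    queries = [title]
    keywords = extract_keywords(description, max_keywords)
    if keywords:
        # Joining with AND narrows results; OR would broaden them further.
        queries.append(" AND ".join(keywords))
    return queries


if __name__ == "__main__":
    desc = ("TREM2 signaling in microglia modulates amyloid clearance; "
            "TREM2 loss impairs microglia.")
    print(build_queries_sketch("TREM2 in AD", desc, max_keywords=3))
```

Frequency counting is a simple stand-in here; whether it gives better coverage than the current title-based queries would need checking in the dry run.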
Approach
Enhance evidence item format to include abstract snippet (200 chars) and strength rating
Update Claude Haiku prompt to include strength classification
Improve build_queries() to extract keywords from description text
Test with dry run, verify evidence quality
Commit and deploy
Work Log
2026-04-02 — Slot 6
- Reviewed existing pipeline: fully operational, 176 hypotheses tracked, 509 papers added
- Identified quality gaps: evidence items lack abstract snippets and strength ratings
- Enhancing classifier prompt and evidence item format
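
The evidence item change described in the Approach section could look roughly like the sketch below: the abstract is trimmed to a 200-character snippet and the classifier reply is validated for a strength of strong, moderate, or weak. Everything beyond those two requirements is an assumption made for illustration: the JSON reply shape, the item keys (pmid, title, snippet, strength), the fallback to "weak", and the function names are hypothetical and not taken from pubmed_update_pipeline.py.

```python
import json

VALID_DIRECTIONS = {"for", "against"}
VALID_STRENGTHS = {"strong", "moderate", "weak"}
SNIPPET_CHARS = 200  # snippet length named in the Approach section


def make_snippet(abstract: str, limit: int = SNIPPET_CHARS) -> str:
    """Collapse whitespace and trim to at most `limit` characters."""
    text = " ".join(abstract.split())
    if len(text) <= limit:
        return text
    # Reserve 3 characters for the ellipsis and cut at a word boundary.
    return text[: limit - 3].rsplit(" ", 1)[0] + "..."


def parse_classification(raw: str) -> dict:
    """Parse a classifier reply, assumed to be JSON such as
    {"direction": "for", "strength": "moderate"} (hypothetical format)."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("classifier reply is not a JSON object")
    direction = str(data.get("direction", "")).lower()
    strength = str(data.get("strength", "")).lower()
    if direction not in VALID_DIRECTIONS:
        raise ValueError(f"unexpected direction: {direction!r}")
    if strength not in VALID_STRENGTHS:
        strength = "weak"  # assumed conservative fallback
    return {"direction": direction, "strength": strength}


def build_evidence_item(pmid: str, title: str, abstract: str,
                        raw_classification: str) -> tuple[str, dict]:
    """Return the target field name and the evidence item to append."""
    result = parse_classification(raw_classification)
    field = "evidence_for" if result["direction"] == "for" else "evidence_against"
    item = {
        "pmid": pmid,
        "title": title,
        "snippet": make_snippet(abstract),
        "strength": result["strength"],
    }
    return field, item


if __name__ == "__main__":
    reply = '{"direction": "against", "strength": "moderate"}'
    print(build_evidence_item("12345", "Example title", "Short abstract.", reply))
```

Rejecting an unknown direction while defaulting an unknown strength is a design choice in this sketch: a wrong direction would file the citation under the wrong field, whereas a missing strength only affects display.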