Goal
For every preprint cited in a hypothesis or analysis (bioRxiv, medRxiv,
arXiv), poll Crossref + Europe PMC weekly to detect when the peer-reviewed
version lands. When detected, link the two records, surface the upgrade on
the originating hypothesis page, and emit an event the Skeptic can subscribe
to ("the preprint you doubted is now in Cell").
Why this matters
Right now an agent can cite bioRxiv 2024.05.12.594321, and that citation
just stays a preprint reference forever, even if the same paper was published
in Nature six months later. We lose the credibility uplift, fail to update
inline citation freshness, and miss a major signal for hypothesis re-scoring.
Acceptance Criteria
☐ Migration preprint_publication_link(preprint_id, preprint_source,
preprint_doi, published_doi, published_pmid, published_journal,
detection_method, detected_at).
☐ scripts/preprint_publication_poller.py walks distinct preprint DOIs
cited in the last 12 months and queries:
- Crossref relation field (is-preprint-of),
- Europe PMC crossReferences for matching titles,
- Semantic Scholar externalIds.
☐ On a hit, write the link row, update each citing hypothesis's
evidence_version, and POST an event to event_bus of type
preprint_published with the (preprint, published) pair.
☐ /hypothesis/<id> page replaces preprint citations with
"preprint → Nature 2024" badges sourced from the new table.
☐ Skeptic persona consumes the event and re-issues only those debate
rounds where the upgraded paper materially changes the picture
(heuristic: published version cited >5 times more than preprint).
☐ Runs on scidex-preprint-tracker.timer weekly Saturday 04:00 UTC.
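The Crossref check in the criteria above could be sketched roughly as follows. This is a minimal illustration, not the poller's actual code: the function name published_dois_from_crossref is hypothetical, and the input is assumed to be the "message" object of a Crossref work record, in which relations appear as a mapping from relation type to a list of {"id-type", "id"} entries. The sample record is hand-written with placeholder DOIs.

```python
from typing import Any


def published_dois_from_crossref(message: dict[str, Any]) -> list[str]:
    """Return DOIs listed under relation["is-preprint-of"] in a work record.

    Assumption: `message` is the parsed "message" object of a Crossref
    work record; a missing or empty relation block yields an empty list.
    """
    relation = message.get("relation") or {}
    dois: list[str] = []
    for entry in relation.get("is-preprint-of", []):
        # Only DOI-typed targets are useful for linking the two records.
        if entry.get("id-type") == "doi" and entry.get("id"):
            doi = entry["id"].strip().lower()  # DOIs are case-insensitive
            if doi not in dois:
                dois.append(doi)
    return dois


# Hand-written sample with placeholder DOIs (not real records).
sample = {
    "DOI": "10.1101/0000.00.00.000000",
    "relation": {
        "is-preprint-of": [
            {"id-type": "doi", "id": "10.1000/EXAMPLE.1", "asserted-by": "subject"}
        ]
    },
}

print(published_dois_from_crossref(sample))
```

Keeping the parsing separate from the HTTP call makes it easy to test against saved responses, and any DOI returned this way would presumably be stored with a Crossref-specific detection_method value.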
Approach
Crossref relation fields are the most reliable signal — start there.
Title-shingle (token Jaccard >0.85) as last-resort match.
Persist detection_method so we can audit false-positive linking.
Dependencies
- Existing crossref_paper_metadata and europe_pmc_search in
scidex/forge/tools.py.
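The last-resort title match named under Approach (token Jaccard >0.85) could look something like the sketch below. It assumes single lowercase alphanumeric word tokens rather than multi-word shingles, and the helper names title_tokens, title_jaccard, and is_title_match are hypothetical; the real poller may tokenize differently.

```python
import re


def title_tokens(title: str) -> set[str]:
    """Lowercase the title and split it into alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", title.lower()))


def title_jaccard(a: str, b: str) -> float:
    """Jaccard similarity of the two titles' token sets (0.0 if either is empty)."""
    ta, tb = title_tokens(a), title_tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def is_title_match(a: str, b: str, threshold: float = 0.85) -> bool:
    """Treat two titles as the same paper when similarity exceeds the threshold."""
    return title_jaccard(a, b) > threshold


# Same title with different casing and trailing punctuation: all tokens agree.
print(is_title_match("Deep learning for protein folding",
                     "Deep Learning for Protein Folding."))
# One of four tokens differs on each side: 3 shared / 5 total tokens.
print(title_jaccard("a b c d", "a b c e"))
```

Because published titles are often reworded during review, a strict threshold like this trades recall for precision, which is consistent with using it only after the identifier-based checks have failed.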
Work Log
2026-04-27 04:00 UTC — Slot 0
- Migration created: scripts/migrations/003_add_preprint_publication_link.py
  - Table: preprint_publication_link with preprint_id, preprint_source, preprint_doi, published_doi, published_pmid, published_journal, detection_method, detected_at, created_at, updated_at
  - History table: preprint_publication_link_history with UPDATE/DELETE triggers
  - Verified: dry-run and real run both succeed
- Poller script created: scripts/preprint_publication_poller.py
  - Queries preprint DOIs from papers table (biorxiv, medrxiv, arxiv)
  - Checks Crossref relations API for is-preprint-of links
  - Falls back to Semantic Scholar externalIds and Europe PMC title match (Jaccard >0.85)
  - Writes to preprint_publication_link table and emits preprint_published events
  - Verified: dry-run works, found arxiv.0711.0409 → PMID 18386982 via title_match
- Event bus updated: Added preprint_published to EVENT_TYPES in scidex/core/event_bus.py
- Hypothesis page updated: api.py:hypothesis_detail_page()
  - Collects preprint DOIs from evidence_for and evidence_against
  - Batch queries preprint_publication_link table
  - Renders "preprint → Journal Year" badges with PMID links
- Syntax verified: python3 -m py_compile api.py passes
- Testing: API status returns 200, hypothesis page /hypothesis/h-aging-h7-prs-aging-convergence returns 200