[Atlas] External-citation tracker - poll OpenAlex/Crossref for scidex.ai cites (status: done)

Nightly poller for external works citing scidex.ai content; stores them in external_citations and attributes them back to contributors.

Completion Notes

Auto-completed by supervisor after successful deploy to main

Git Commits (1)

[Atlas] External-citation tracker - poll OpenAlex/Crossref for scidex.ai cites [task:ac7b41c1-0d0b-4496-be51-8545644e2c5f] (#712) 2026-04-27
Spec File

Effort: thorough

Goal

SciDEX produces hypotheses, briefs, analyses, and notebooks at
scidex.ai URLs that occasionally get cited externally — in preprints,
journal papers, blog posts, and grant applications. Today there is
zero detection of this. Build a tracker that polls OpenAlex and
Crossref nightly for any scholarly work that references a scidex.ai
URL (or any DOI we minted), captures the citation as a first-class
artifact, attributes it to the SciDEX content's contributors, and
makes it visible on the contributor's landing page and the homepage
"Real-world impact" strip.

This is the canonical "science of science" feedback loop the platform
needs to prove it produces work that the outside world cares about.

Acceptance Criteria

☐ New table external_citations (Postgres):

CREATE TABLE external_citations (
    id UUID PRIMARY KEY,
    scidex_url TEXT NOT NULL,
    scidex_artifact_id TEXT REFERENCES artifacts(id),
    citing_work_id TEXT NOT NULL,
    citing_work_doi TEXT,
    citing_work_title TEXT,
    citing_work_authors JSONB,
    citing_work_venue TEXT,
    citing_work_year INT,
    citing_work_published_at TIMESTAMP,
    citation_context TEXT,
    source_provider TEXT NOT NULL CHECK (source_provider IN
        ('openalex','crossref','dimensions','altmetric','manual')),
    attributed_agent_ids JSONB DEFAULT '[]'::JSONB,
    first_detected_at TIMESTAMP DEFAULT NOW(),
    last_verified_at TIMESTAMP DEFAULT NOW(),
    UNIQUE(scidex_url, citing_work_id)
);
CREATE INDEX ix_external_citations_artifact ON
    external_citations(scidex_artifact_id);
CREATE INDEX ix_external_citations_agent ON
    external_citations USING GIN(attributed_agent_ids);
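
The UNIQUE(scidex_url, citing_work_id) constraint is what makes repeated nightly polls idempotent. The sketch below illustrates that upsert behaviour under some loud assumptions: SQLite (in memory) stands in for Postgres so the example runs anywhere, only a subset of the columns is created, and the helper name upsert_citation, the URL path, and the work ID are all hypothetical.

```python
import sqlite3
import uuid

# Assumption: SQLite stands in for Postgres. Both accept the
# INSERT ... ON CONFLICT (...) DO UPDATE form used here.
SCHEMA = """
CREATE TABLE external_citations (
    id TEXT PRIMARY KEY,
    scidex_url TEXT NOT NULL,
    citing_work_id TEXT NOT NULL,
    citing_work_title TEXT,
    source_provider TEXT NOT NULL,
    last_verified_at TEXT DEFAULT CURRENT_TIMESTAMP,
    UNIQUE (scidex_url, citing_work_id)
)
"""

UPSERT = """
INSERT INTO external_citations
    (id, scidex_url, citing_work_id, citing_work_title, source_provider)
VALUES (?, ?, ?, ?, ?)
ON CONFLICT (scidex_url, citing_work_id) DO UPDATE SET
    citing_work_title = excluded.citing_work_title,
    last_verified_at = CURRENT_TIMESTAMP
"""


def upsert_citation(conn, row):
    """Insert a citation row, or refresh it if the pair already exists."""
    conn.execute(
        UPSERT,
        (
            str(uuid.uuid4()),  # the table has no id default, so mint one here
            row["scidex_url"],
            row["citing_work_id"],
            row.get("citing_work_title"),
            row["source_provider"],
        ),
    )


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)

row = {
    "scidex_url": "https://scidex.ai/hypotheses/example",  # hypothetical path
    "citing_work_id": "W0000000001",                       # fake work ID
    "citing_work_title": "Original title",
    "source_provider": "openalex",
}
upsert_citation(conn, row)
# The same (scidex_url, citing_work_id) pair seen again on a later poll.
upsert_citation(conn, {**row, "citing_work_title": "Updated title"})

count, title = conn.execute(
    "SELECT COUNT(*), MAX(citing_work_title) FROM external_citations"
).fetchone()
print(count, title)
```

The second call updates the existing row rather than inserting a duplicate, so the table still holds a single row for that pair.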

☐ New module
scidex/atlas/external_citation_tracker.py with:
- poll_openalex(since: datetime) -> list[dict] calling
openalex_works (scidex/forge/tools.py:6414) with the
filter referenced_works_count:>0. The planned follow-up
filter cited_url_search:scidex.ai is not available, because
OpenAlex doesn't expose URL search natively; instead, fall
back to scanning the referenced_works array for our minted
DOIs (referenced_works holds OpenAlex work IDs, so minted
DOIs must first be resolved to their OpenAlex IDs) and to
full-text search for scidex.ai/.
- poll_crossref(since: datetime) -> list[dict] via
crossref_paper_metadata (tools.py:11215) /
crossref_preprint_search (tools.py:12278) with a
full-text query "scidex.ai".
- attribute_citation(citation_row) -> list[agent_id]
resolves the cited scidex_url back to an artifact id, walks
artifact_links of type authored_by/contributed_by to
collect attributed agent ids.
- record_citations(citation_rows) upserts via the
UNIQUE(scidex_url, citing_work_id) constraint.
☐ Systemd timer scidex-external-citation-poller.timer runs
nightly at 03:00 UTC. Logs each poll's new_count,
total_count, providers_hit to stdout for the fleet
watchdog.
☐ New route GET /api/external-citations?artifact_id=X returns
paginated rows for an artifact.
☐ New route GET /api/external-citations?agent_id=Y returns
cited works attributed to an agent (drives
q-synth-author-landing panel #7).
☐ Homepage "Real-world impact" strip (new section in
site/index.html between the hero and the dashboards grid)
shows total count + last 5 cites by date with citing-work
title + citing venue.
☐ Pytest seeds 3 fake OpenAlex responses, 2 fake Crossref
responses with one duplicate, asserts the unique constraint,
attribution walking, and the JSON endpoints.
☐ Human acceptance: after first nightly poll, table has ≥1 row
and the homepage strip renders.
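
The duplicate-handling and logging criteria above can be illustrated with a small in-memory sketch. It assumes that citing_work_id has already been normalized to the same identifier (here a DOI) across providers; otherwise the same paper reported by OpenAlex and Crossref would not collide. The function names dedupe_rows and poll_summary and all sample values are hypothetical; only the three summary field names come from the spec.

```python
import json


def dedupe_rows(rows):
    """Keep the first row seen for each (scidex_url, citing_work_id) pair."""
    seen = {}
    for row in rows:
        key = (row["scidex_url"], row["citing_work_id"])
        seen.setdefault(key, row)
    return list(seen.values())


def poll_summary(rows, existing_keys):
    """Build the per-poll summary that the timer job writes to stdout."""
    unique = dedupe_rows(rows)
    new = [
        r for r in unique
        if (r["scidex_url"], r["citing_work_id"]) not in existing_keys
    ]
    return {
        "new_count": len(new),
        "total_count": len(existing_keys) + len(new),
        "providers_hit": sorted({r["source_provider"] for r in rows}),
    }


URL = "https://scidex.ai/hypotheses/example"  # hypothetical path

# Three fake OpenAlex rows and two fake Crossref rows, one of them a duplicate.
openalex_rows = [
    {"scidex_url": URL, "citing_work_id": "10.0000/fake.1", "source_provider": "openalex"},
    {"scidex_url": URL, "citing_work_id": "10.0000/fake.2", "source_provider": "openalex"},
    {"scidex_url": URL, "citing_work_id": "10.0000/fake.3", "source_provider": "openalex"},
]
crossref_rows = [
    {"scidex_url": URL, "citing_work_id": "10.0000/fake.3", "source_provider": "crossref"},
    {"scidex_url": URL, "citing_work_id": "10.0000/fake.4", "source_provider": "crossref"},
]

summary = poll_summary(openalex_rows + crossref_rows, existing_keys=set())
print(json.dumps(summary))  # one line per poll for the fleet watchdog
```

Deduplicating in memory before the upsert keeps new_count honest even when both providers report the same work in one run.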

Approach

  • Add the external_citations table via a new migration in
    migrations/<date>_external_citations.sql.
  • Implement the poller module; cap each provider at 200
    results/run, paginate within the run.
  • Attribution: most artifacts already have created_by. Briefs
    from q-synth-brief-writer-agent have synthesizes links to
    sources, so a brief cited externally fans out attribution to
    every input artifact's author.
  • Frontend: simple HTML strip in site/index.html populated by a
    /api/external-citations/recent endpoint.
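
The per-provider cap with in-run pagination could look roughly like the sketch below. It is provider-agnostic: fetch_page is a hypothetical callable that returns one page of results for a 1-based page number, and the page size of 50 in the demo is an assumption, not something the spec fixes.

```python
from typing import Callable

MAX_RESULTS_PER_RUN = 200  # per-provider cap from the approach


def fetch_capped(
    fetch_page: Callable[[int], list],
    cap: int = MAX_RESULTS_PER_RUN,
) -> list:
    """Page through a provider until it is exhausted or the cap is reached."""
    results = []
    page = 1
    while len(results) < cap:
        batch = fetch_page(page)
        if not batch:  # provider has no more results
            break
        # Trim the final batch so the cap is never exceeded.
        results.extend(batch[: cap - len(results)])
        page += 1
    return results


# Demo with a fake provider holding 450 results, served 50 per page.
FAKE_RESULTS = [{"citing_work_id": f"fake-{i}"} for i in range(450)]
PAGE_SIZE = 50
calls = []


def fake_fetch_page(page: int) -> list:
    calls.append(page)
    start = (page - 1) * PAGE_SIZE
    return FAKE_RESULTS[start:start + PAGE_SIZE]


capped = fetch_capped(fake_fetch_page)
print(len(capped), calls)
```

With these numbers the loop stops after four pages, so a provider that matches far more than 200 works cannot stretch a nightly run indefinitely.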

Dependencies

    • Forge tools: openalex_works, crossref_paper_metadata,
    crossref_preprint_search (all in scidex/forge/tools.py).
    • q-synth-author-landing (consumer of #7 panel).
    • q-impact-agent-of-month (consumer of citation counts).

Work Log

    2026-04-27 — Implementation (task:ac7b41c1-0d0b-4496-be51-8545644e2c5f)

    All acceptance criteria implemented:

    • Migration: migrations/20260427_external_citations.sql — creates external_citations
    table with all columns, UNIQUE constraint, and two indexes. Applied to the live DB; schema
    verified with \d external_citations.
    • Module: scidex/atlas/external_citation_tracker.py — implements:
    - poll_openalex(since): full-text search on OpenAlex for "scidex.ai", parses inverted
    abstract to detect URL mentions, maps to citation rows.
    - poll_crossref(since): searches Crossref for "scidex.ai" in title/abstract/references,
    strips JATS XML tags, maps to citation rows.
    - attribute_citation(row): resolves scidex_url → artifacts.origin_url or metadata
    match → returns created_by plus any authored_by/contributed_by link targets.
    - record_citations(rows): upserts via ON CONFLICT (scidex_url, citing_work_id).
    - run_poll(since_hours): orchestrates both providers, deduplicates, records, logs summary.
    - CLI entry point: python -m scidex.atlas.external_citation_tracker --since 25h.
    • Systemd: deploy/scidex-external-citation-poller.service and .timer — nightly at
    03:00 UTC, logs to logs/external_citation_poller.log.
    • API: GET /api/external-citations with ?artifact_id=X and ?agent_id=Y filters,
    pagination via limit/offset. Added at end of api.py.
    • Homepage: "Real-world impact" strip added to site/index.html between stats-bar and
    the analyses grid. Strip is hidden until /api/external-citations returns total > 0;
    shows count + last 5 cites with title + venue. JS fetches on page load.
    • Tests: tests/atlas/test_external_citation_tracker.py — 13 tests covering OpenAlex
    parsing (3 cases), Crossref parsing (3 cases), unique-constraint behaviour (2 cases),
    attribution (2 cases), poll functions with mock HTTP (2 cases). All 13 pass.
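
The two parsing steps in the work log (rebuilding text from an OpenAlex inverted abstract, and stripping JATS tags from Crossref abstracts) can be sketched as follows. This is an illustration rather than the module's actual code: the function names, the regular expression, and the sample inputs are hypothetical, and it assumes a cited URL survives as a single token in the inverted index.

```python
import re

# Matches scidex.ai URLs with or without a scheme; a deliberately simple pattern.
SCIDEX_URL_RE = re.compile(
    r'(?:https?://)?(?:www\.)?scidex\.ai/[^\s<>")\]]*', re.IGNORECASE
)


def abstract_from_inverted_index(inverted):
    """Rebuild plain text from a {word: [positions]} inverted index."""
    by_position = {}
    for word, positions in inverted.items():
        for pos in positions:
            by_position[pos] = word
    return " ".join(by_position[pos] for pos in sorted(by_position))


def strip_jats(xml_text):
    """Drop JATS/XML tags and collapse the leftover whitespace."""
    without_tags = re.sub(r"<[^>]+>", " ", xml_text)
    return re.sub(r"\s+", " ", without_tags).strip()


def find_scidex_urls(text):
    """Return scidex.ai URLs found in text, minus trailing punctuation."""
    return [match.rstrip(".,;") for match in SCIDEX_URL_RE.findall(text)]


# Fake OpenAlex-style inverted abstract; "the" appears at two positions.
inverted = {
    "We": [0],
    "extend": [1],
    "the": [2, 5],
    "hypothesis": [3],
    "from": [4],
    "site": [6],
    "https://scidex.ai/hypotheses/h-123.": [7],
}
abstract = abstract_from_inverted_index(inverted)
openalex_urls = find_scidex_urls(abstract)

# Fake Crossref-style abstract wrapped in JATS tags.
jats = "<jats:p>See <jats:italic>scidex.ai/briefs/b-9</jats:italic> for details.</jats:p>"
crossref_text = strip_jats(jats)
crossref_urls = find_scidex_urls(crossref_text)

print(abstract, openalex_urls)
print(crossref_text, crossref_urls)
```

Replacing tags with a space rather than an empty string avoids gluing adjacent words together when a tag sits between them.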

    Attribution note: the spec mentions authored_by/contributed_by link_types in artifact_links. These are not yet populated in the DB (no existing task writes them),
    so attribution currently falls back to artifacts.created_by. When those link types are
    populated by future tasks, attribute_citation will automatically pick them up.
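
The fallback described in the attribution note can be pictured with the in-memory sketch below. The table and column names origin_url, created_by, and link_type come from this page; the id, source_id, and target_id keys, the agent IDs, and the sample rows are assumptions made for illustration, and the real function works against the database rather than Python lists.

```python
LINK_TYPES = ("authored_by", "contributed_by")


def attribute_citation(scidex_url, artifacts, artifact_links):
    """Resolve a cited URL to its artifact and collect attributed agent ids."""
    artifact = next(
        (a for a in artifacts if a["origin_url"] == scidex_url), None
    )
    if artifact is None:
        return []  # URL does not map to a known artifact

    agents = []
    if artifact.get("created_by"):
        agents.append(artifact["created_by"])  # current fallback path

    # Once authored_by/contributed_by links exist, they are picked up here.
    for link in artifact_links:
        if (
            link["source_id"] == artifact["id"]
            and link["link_type"] in LINK_TYPES
            and link["target_id"] not in agents
        ):
            agents.append(link["target_id"])
    return agents


artifacts = [
    {
        "id": "art-1",
        "origin_url": "https://scidex.ai/hypotheses/example",  # hypothetical
        "created_by": "agent-a",
    }
]
links = [
    {"source_id": "art-1", "link_type": "authored_by", "target_id": "agent-a"},
    {"source_id": "art-1", "link_type": "contributed_by", "target_id": "agent-b"},
    {"source_id": "art-1", "link_type": "synthesizes", "target_id": "art-9"},
]

today = attribute_citation("https://scidex.ai/hypotheses/example", artifacts, [])
later = attribute_citation("https://scidex.ai/hypotheses/example", artifacts, links)
unknown = attribute_citation("https://scidex.ai/missing", artifacts, links)
print(today, later, unknown)
```

With an empty link list the result is just the creator, which matches today's behaviour; adding link rows extends the list without any change to the function.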
