Effort: thorough
SciDEX produces hypotheses, briefs, analyses, and notebooks at
scidex.ai URLs that occasionally get cited externally — in preprints,
journal papers, blog posts, and grant applications. Today there is
zero detection of this. Build a tracker that polls OpenAlex and
Crossref nightly for any scholarly work that references a scidex.ai
URL (or any DOI we minted), captures the citation as a first-class
artifact, attributes it to the SciDEX content's contributors, and
makes it visible on the contributor's landing page and the homepage
"Real-world impact" strip.
This is the canonical "science of science" feedback loop the platform
needs to prove it produces work that the outside world cares about.
external_citations (Postgres):
CREATE TABLE external_citations (
id UUID PRIMARY KEY,
scidex_url TEXT NOT NULL,
scidex_artifact_id TEXT REFERENCES artifacts(id),
citing_work_id TEXT NOT NULL,
citing_work_doi TEXT,
citing_work_title TEXT,
citing_work_authors JSONB,
citing_work_venue TEXT,
citing_work_year INT,
citing_work_published_at TIMESTAMP,
citation_context TEXT,
source_provider TEXT NOT NULL CHECK (source_provider IN
('openalex','crossref','dimensions','altmetric','manual')),
attributed_agent_ids JSONB DEFAULT '[]'::JSONB,
first_detected_at TIMESTAMP DEFAULT NOW(),
last_verified_at TIMESTAMP DEFAULT NOW(),
UNIQUE(scidex_url, citing_work_id)
);
CREATE INDEX ix_external_citations_artifact ON
external_citations(scidex_artifact_id);
CREATE INDEX ix_external_citations_agent ON
external_citations USING GIN(attributed_agent_ids);

scidex/atlas/external_citation_tracker.py with:
poll_openalex(since: datetime) -> list[dict] calling
openalex_works (scidex/forge/tools.py:6414) with the
referenced_works_count:>0 + a follow-up
cited_url_search:scidex.ai. (OpenAlex doesn't expose URL [...]
referenced_works array for our minted DOIs and the [...] scidex.ai/).
poll_crossref(since: datetime) -> list[dict] via
crossref_paper_metadata (tools.py:11215) /
crossref_preprint_search (tools.py:12278) with a [...] "scidex.ai".
attribute_citation(citation_row) -> list[agent_id]: [...]
artifact_links of type authored_by/contributed_by to [...]
record_citations(citation_rows) upserts via the
UNIQUE(scidex_url, citing_work_id) constraint.
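The upsert behaviour that record_citations relies on can be sketched as follows. This is a minimal illustration, not the tracker's actual code: SQLite stands in for Postgres so the example runs without a server, the table is trimmed to a few columns, and the choice to refresh last_verified_at while leaving first_detected_at untouched is an assumption about the intended semantics.

```python
import sqlite3
from datetime import datetime, timezone

# SQLite is used here only as a runnable stand-in for Postgres; the table
# is a trimmed version of external_citations (assumption for illustration).
DDL = """
CREATE TABLE external_citations (
    scidex_url        TEXT NOT NULL,
    citing_work_id    TEXT NOT NULL,
    citing_work_title TEXT,
    source_provider   TEXT NOT NULL,
    first_detected_at TEXT NOT NULL,
    last_verified_at  TEXT NOT NULL,
    UNIQUE (scidex_url, citing_work_id)
)
"""

UPSERT = """
INSERT INTO external_citations
    (scidex_url, citing_work_id, citing_work_title, source_provider,
     first_detected_at, last_verified_at)
VALUES (:scidex_url, :citing_work_id, :citing_work_title, :source_provider,
        :now, :now)
ON CONFLICT (scidex_url, citing_work_id) DO UPDATE SET
    citing_work_title = excluded.citing_work_title,
    last_verified_at  = excluded.last_verified_at
"""


def record_citations(conn, citation_rows, now=None):
    """Insert new citations and refresh rows that were already recorded.

    Returns the number of rows that did not exist before this call.
    """
    stamp = (now or datetime.now(timezone.utc)).isoformat()
    count_sql = "SELECT COUNT(*) FROM external_citations"
    before = conn.execute(count_sql).fetchone()[0]
    with conn:  # commit on success, roll back on error
        for row in citation_rows:
            conn.execute(UPSERT, {
                "scidex_url": row["scidex_url"],
                "citing_work_id": row["citing_work_id"],
                "citing_work_title": row.get("citing_work_title"),
                "source_provider": row["source_provider"],
                "now": stamp,
            })
    after = conn.execute(count_sql).fetchone()[0]
    return after - before
```

Because the conflict target matches the UNIQUE(scidex_url, citing_work_id) constraint, re-running the poller over an overlapping time window should not create duplicate rows.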
scidex-external-citation-poller.timer runs [...] new_count,
total_count, providers_hit to stdout for the fleet [...]
GET /api/external-citations?artifact_id=X returns [...]
GET /api/external-citations?agent_id=Y returns [...]
q-synth-author-landing panel #7).
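One way the two filters could map onto the schema is sketched below. The function name, the psycopg-style %s placeholders, the ordering, and the page-size cap of 200 are all assumptions for illustration; only the column names and the GIN index come from the schema above.

```python
import json


def build_citation_query(artifact_id=None, agent_id=None, limit=50, offset=0):
    """Return (sql, params) for listing external citations.

    Hypothetical helper: placeholders follow the psycopg %s style and the
    cap of 200 rows per page is an illustrative choice, not a spec value.
    """
    clauses, params = [], []
    if artifact_id is not None:
        clauses.append("scidex_artifact_id = %s")
        params.append(artifact_id)
    if agent_id is not None:
        # JSONB containment (@>) is an operator the GIN index on
        # attributed_agent_ids can serve.
        clauses.append("attributed_agent_ids @> %s::jsonb")
        params.append(json.dumps([agent_id]))
    where = (" WHERE " + " AND ".join(clauses)) if clauses else ""
    sql = (
        "SELECT * FROM external_citations"
        + where
        + " ORDER BY first_detected_at DESC LIMIT %s OFFSET %s"
    )
    params.extend([max(1, min(int(limit), 200)), max(0, int(offset))])
    return sql, params
```

Passing the agent id as a one-element JSON array keeps the lookup on the indexed containment operator rather than scanning and unpacking the array per row.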
[...] site/index.html between the hero and the dashboards grid)
[...] external_citations table via a new migration in
migrations/<date>_external_citations.sql.
[...] created_by. Briefs [...] q-synth-brief-writer-agent have
synthesizes links to [...]
[...] site/index.html populated by a
/api/external-citations/recent endpoint.
openalex_works, crossref_paper_metadata, crossref_preprint_search
(all in scidex/forge/tools.py).
q-synth-author-landing (consumer of #7 panel).
q-impact-agent-of-month (consumer of citation counts).

All acceptance criteria implemented:
migrations/20260427_external_citations.sql: creates
external_citations [...] \d external_citations.
scidex/atlas/external_citation_tracker.py implements:
poll_openalex(since): full-text search on OpenAlex for "scidex.ai",
parses inverted [...]
poll_crossref(since): searches Crossref for "scidex.ai" in
title/abstract/references, [...]
attribute_citation(row): resolves scidex_url → artifacts.origin_url
or metadata [...] created_by plus any authored_by/contributed_by
link targets.
record_citations(rows): upserts via
ON CONFLICT (scidex_url, citing_work_id).
run_poll(since_hours): orchestrates both providers, deduplicates,
records, logs summary.
[...] python -m scidex.atlas.external_citation_tracker --since 25h.
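The parsing and deduplication steps can be sketched as below, under some assumptions. OpenAlex returns abstracts as an abstract_inverted_index (a mapping from each word to the positions where it occurs), so the text has to be rebuilt before it can be scanned for URLs. The helper names, the URL regex, and the keep-first-seen dedup policy are illustrative choices, not taken from the tracker's source.

```python
import re

# Illustrative pattern: any http(s) URL on scidex.ai, stopping at
# whitespace, quotes, or closing brackets.
SCIDEX_URL_RE = re.compile(
    r"https?://(?:www\.)?scidex\.ai/[^\s\"'<>)\]]*", re.IGNORECASE
)


def abstract_from_inverted_index(inverted_index):
    """Rebuild plain text from an OpenAlex-style {word: [positions]} map."""
    if not inverted_index:
        return ""
    positioned = [
        (pos, word)
        for word, positions in inverted_index.items()
        for pos in positions
    ]
    return " ".join(word for _, word in sorted(positioned))


def find_scidex_urls(text):
    """Return the distinct scidex.ai URLs in text, minus trailing punctuation."""
    urls = {m.group(0).rstrip(".,;:") for m in SCIDEX_URL_RE.finditer(text)}
    return sorted(urls)


def dedupe(rows):
    """Keep the first row seen for each (scidex_url, citing_work_id) pair."""
    seen = {}
    for row in rows:
        seen.setdefault((row["scidex_url"], row["citing_work_id"]), row)
    return list(seen.values())
```

Deduplicating on the same pair as the table's UNIQUE constraint means a work reported by both OpenAlex and Crossref reaches record_citations only once per run.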
deploy/scidex-external-citation-poller.service and .timer: nightly
at [...] logs/external_citation_poller.log.
GET /api/external-citations with ?artifact_id=X and ?agent_id=Y
filters, limit/offset. Added at end of api.py.
[...] site/index.html between stats-bar and [...]
/api/external-citations returns total > 0; [...]
tests/atlas/test_external_citation_tracker.py: 13 tests covering
OpenAlex [...]

Attribution note: the spec mentions authored_by/contributed_by link_types in
artifact_links. These are not yet populated in the DB (no existing task writes them),
so attribution currently falls back to artifacts.created_by. When those link types are
populated by future tasks, attribute_citation will automatically pick them up.
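
The fallback described in the attribution note can be illustrated with a small sketch. The dictionary shapes used for artifacts and artifact_links rows (id, created_by, source_id, link_type, target_id) are hypothetical, since the real column names are not given here. The point is only that created_by is always included and link targets are added when they exist.

```python
# Link types named in the spec; everything else about the row shapes
# below is an assumption made for illustration.
ATTRIBUTION_LINK_TYPES = {"authored_by", "contributed_by"}


def attribute_citation(artifact, artifact_links):
    """Return the agent ids credited for a cited artifact.

    Starts from artifact["created_by"] and appends the targets of any
    authored_by/contributed_by links, preserving order without duplicates.
    """
    agent_ids = []
    if artifact.get("created_by"):
        agent_ids.append(artifact["created_by"])
    for link in artifact_links:
        if (
            link["source_id"] == artifact["id"]
            and link["link_type"] in ATTRIBUTION_LINK_TYPES
        ):
            agent_ids.append(link["target_id"])
    # dict.fromkeys keeps first-seen order while dropping repeats.
    return list(dict.fromkeys(agent_ids))
```

With an empty link table this returns just the creator, which matches the current behaviour. Once links are written, the same call returns the wider contributor list without any change to callers.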