target_gene to a live CELLxGENE Census
cellxgene_gene_expression() in scidex/forge/tools.py:6102 only hits the Discover HTTP API (collections
cellxgene-census Python package and exposes a hypothesis_id-keyed cache.
mechanistic_plausibility and cell_type_specificity
scidex/forge/census_expression.py (≤300 LoC) with expression_for_gene(gene, organism="Homo sapiens") returning per cell_type mean/median/n_cells from cellxgene_census.get_anndata(...). Pin Census version with census_version="2024-07-01" (most recent stable LTS at task time).
data/cellxgene_cache/<gene>_<census_version>.parquet
scripts/backfill_hypothesis_census.py walks the hypotheses table where target_gene IS NOT NULL, calls the new
hypothesis_cell_type_expression table (migration included).
scidex/forge/tools.py:cellxgene_gene_expression re-exports the new
SCIDEX_DISABLE_CENSUS=1.
synthesis_engine.py) reads the new table when
cell_type_specificity; log a metric census_hits_total so
/metrics.
"source": "fallback" for genes Census
Add cellxgene-census>=1.13 to requirements.txt and verify install in the forge-bio conda env (docker/forge-bio/environment.yml).
census_expression.py with a thread-safe LRU around the census.open_soma() handle (Census recommends one open SOMA per process).
migrations/add_hypothesis_cell_type_expression.py: table keyed (hypothesis_id, cell_type) with mean_log_norm, median_log_norm, n_cells, census_version, fetched_at.
synthesis_engine.score_hypothesis() to JOIN against the new table
cell_type_specificity; add a Prometheus counter.
scidex-census-refresh.timer that re-runs backfill for
census_version is older than the pinned LTS.
data/scidex-artifacts: caches written under data/scidex-artifacts/cellxgene/; must use scidex.atlas.artifact_commit.
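The one-open-SOMA-per-process guidance above can be sketched as a lock-guarded holder. This is a minimal illustration, not the code in census_expression.py: it swaps the thread-safe LRU for a simpler single-slot holder, and the names SomaHandle and open_pinned_census are hypothetical. It assumes cellxgene_census.open_soma is called with the pinned census_version.

```python
import threading
from typing import Any, Callable, Optional


class SomaHandle:
    """Single-slot, lock-guarded holder: the opener runs at most once per process."""

    def __init__(self, opener: Callable[[], Any]) -> None:
        self._opener = opener          # hypothetical injection point, eases testing
        self._lock = threading.Lock()
        self._handle: Optional[Any] = None

    def get(self) -> Any:
        # The lock ensures concurrent callers share one handle instead of racing
        # to open several.
        with self._lock:
            if self._handle is None:
                self._handle = self._opener()
            return self._handle

    def close(self) -> None:
        # Close and forget the handle; the next get() opens a fresh one.
        with self._lock:
            if self._handle is not None:
                closer = getattr(self._handle, "close", None)
                if callable(closer):
                    closer()
                self._handle = None


def open_pinned_census(census_version: str = "2024-07-01") -> Any:
    """Open the Census at the pinned version (imported lazily so the module
    still loads where cellxgene-census is not installed)."""
    import cellxgene_census

    return cellxgene_census.open_soma(census_version=census_version)
```

A module would typically create one holder at import time, for example `CENSUS = SomaHandle(open_pinned_census)`, and call `CENSUS.get()` from every worker thread.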
q-555b6bea3848 task "Integrate real Allen data into the
Approach taken:
cellxgene-census (requires <3.13). Implemented
Files created/modified:
migrations/20260427_add_hypothesis_cell_type_expression.sql: PG migration
hypothesis_cell_type_expression(hypothesis_id, cell_type, mean_log_norm, median_log_norm, n_cells, census_version, fetched_at).
scidex/forge/census_expression.py: new module (~275 LoC). Public API: expression_for_gene(gene, organism). Thread-safe SOMA handle, Parquet +
census_hits_total counter, WMG v2 fallback.
scripts/backfill_hypothesis_census.py: walks hypotheses WHERE target_gene IS NOT NULL, upserts to hypothesis_cell_type_expression.
--limit, --dry-run, --stale-only.
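The backfill flow described above (walk hypotheses with a target gene, upsert one row per cell type) can be sketched as follows. This is an illustration under stated assumptions rather than the actual script: it uses an in-memory-capable sqlite3 connection as a stand-in for PostgreSQL, the hypotheses primary-key column name and the `lookup` callable are hypothetical, and the meaning given to --stale-only (skip hypotheses that already have rows at the pinned Census version) is a guess based on the refresh-timer description.

```python
import argparse
import sqlite3
from datetime import datetime, timezone
from typing import Callable, Dict, List, Optional

PINNED_CENSUS_VERSION = "2024-07-01"

# Stand-in schema; column names follow the migration described in the text.
SCHEMA = """
CREATE TABLE IF NOT EXISTS hypotheses (
    hypothesis_id TEXT PRIMARY KEY,   -- assumed key column name
    target_gene   TEXT
);
CREATE TABLE IF NOT EXISTS hypothesis_cell_type_expression (
    hypothesis_id   TEXT NOT NULL,
    cell_type       TEXT NOT NULL,
    mean_log_norm   REAL,
    median_log_norm REAL,
    n_cells         INTEGER,
    census_version  TEXT,
    fetched_at      TEXT,
    PRIMARY KEY (hypothesis_id, cell_type)
);
"""

UPSERT = """
INSERT INTO hypothesis_cell_type_expression
    (hypothesis_id, cell_type, mean_log_norm, median_log_norm,
     n_cells, census_version, fetched_at)
VALUES (?, ?, ?, ?, ?, ?, ?)
ON CONFLICT (hypothesis_id, cell_type) DO UPDATE SET
    mean_log_norm   = excluded.mean_log_norm,
    median_log_norm = excluded.median_log_norm,
    n_cells         = excluded.n_cells,
    census_version  = excluded.census_version,
    fetched_at      = excluded.fetched_at
"""


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Backfill per-cell-type expression")
    parser.add_argument("--limit", type=int, default=None)
    parser.add_argument("--dry-run", action="store_true")
    parser.add_argument("--stale-only", action="store_true")
    return parser.parse_args(argv)


def backfill(
    conn: sqlite3.Connection,
    lookup: Callable[[str], List[Dict]],  # hypothetical: gene -> per-cell-type records
    limit: Optional[int] = None,
    dry_run: bool = False,
    stale_only: bool = False,
) -> int:
    """Return the number of rows written (or that would be written in a dry run)."""
    sql = "SELECT hypothesis_id, target_gene FROM hypotheses WHERE target_gene IS NOT NULL"
    params: list = []
    if stale_only:
        # ISO dates compare correctly as strings.
        sql += (
            " AND hypothesis_id NOT IN (SELECT hypothesis_id"
            " FROM hypothesis_cell_type_expression WHERE census_version >= ?)"
        )
        params.append(PINNED_CENSUS_VERSION)
    sql += " ORDER BY hypothesis_id"
    if limit is not None:
        sql += " LIMIT ?"
        params.append(limit)

    rows = conn.execute(sql, params).fetchall()  # read fully before writing
    now = datetime.now(timezone.utc).isoformat()
    written = 0
    for hypothesis_id, gene in rows:
        for rec in lookup(gene):
            if not dry_run:
                conn.execute(UPSERT, (
                    hypothesis_id, rec["cell_type"], rec["mean_log_norm"],
                    rec["median_log_norm"], rec["n_cells"],
                    PINNED_CENSUS_VERSION, now,
                ))
            written += 1
    if not dry_run:
        conn.commit()
    return written
```

Because the upsert is keyed on (hypothesis_id, cell_type), re-running the backfill refreshes existing rows instead of duplicating them. PostgreSQL accepts the same INSERT ... ON CONFLICT ... DO UPDATE form, with driver-specific placeholders instead of `?`.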
scidex/forge/tools.py: cellxgene_gene_expression now calls census_expression.expression_for_gene; legacy dataset-index HTTP path is
scidex/agora/synthesis_engine.py: added get_census_cell_type_context(
hypothesis_cell_type_expression, returns
census_hits_total.
requirements.txt: added cellxgene-census>=1.13 (with Python <3.13 note).
expression_for_gene("TREM2") returns 712 cell types via WMG v2; source field is "wmg_http" (not "fallback").