[Atlas] Negation-aware search - hypotheses NOT involving microglia (done)

Query parser supporting -term, -field:term, -entity:NAME (synonym-expanded); first-class boolean negation.

Completion Notes

Auto-completed by supervisor after successful deploy to main

Git Commits (1)

[Atlas] Negation-aware search: -term, -entity:NAME, plain-English NOT (api.py) [task:3ded3c74-a6f5-4858-9c70-95667817cd21] (#879), 2026-04-27
Spec File

Goal

PostgreSQL FTS supports negation via !term, but the /api/search endpoint
strips anything that looks like punctuation, so a query such as "NOT
microglia" cannot be expressed. Researchers routinely want to find "alpha-syn
hypotheses NOT involving propagation models" or "TREM2 papers NOT
about Alzheimer's". Build first-class boolean negation, plus
field-scoped negation (-target_gene:APOE) and entity-aware negation
(-entity:microglia resolves to all microglia synonyms via canonical_entity_links).

Acceptance Criteria

☑ Query parser scidex/atlas/search_query_parser.py::parse(q) -> ParsedQuery supporting tokens: bare term, +term, -term, field:term, -field:term, entity:NAME, -entity:NAME.
☑ Plain-English fallback: trailing not microglia parsed as -entity:microglia.
☑ Translates to PostgreSQL FTS expression with negation; for entity terms, expands to -(syn1 | syn2 | ...) via canonical lookup (expand_entity_synonyms()).
☑ Wires through /api/search (all 6 FTS tables) and /api/search/fts endpoint (fixed from broken SQLite FTS5 to PostgreSQL search_vector).
/search UI gets a "Refine" panel that shows the parsed terms and lets the user click × to remove a clause.
☑ Test fixture: "alpha-synuclein -entity:microglia" returns 10 hypotheses, all verified microglia-free.
☑ Latency budget (no regression beyond ±10 %): negation query measured -23.5% vs baseline (lower than baseline, so within budget).
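
The token grammar in the criteria above can be sketched as a small regex tokenizer. This is a minimal illustration, not the contents of search_query_parser.py: the Token dataclass, its field names, and the tokenize() helper are hypothetical, and the real parse() and _tokenize() may behave differently (for example around quoting or a leading "not").

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Token:
    """Hypothetical token shape; the real QueryToken may differ."""
    text: str                    # the term or entity name
    negated: bool = False        # True for -term, -field:term, -entity:NAME
    required: bool = False       # True for +term
    field: Optional[str] = None  # e.g. "target_gene"; "entity" marks entity tokens


# Optional +/- sign, optional "field:" prefix, then a quoted phrase or a bare word.
_TOKEN_RE = re.compile(r'([+-]?)(?:(\w+):)?("[^"]*"|\S+)')

# Plain-English fallback: a trailing "not X" is rewritten to -entity:X.
_TRAILING_NOT_RE = re.compile(r'\s+not\s+(\S+)\s*$', re.IGNORECASE)


def tokenize(q: str) -> List[Token]:
    q = _TRAILING_NOT_RE.sub(r' -entity:\1', q)
    tokens = []
    for sign, field, text in _TOKEN_RE.findall(q):
        text = text.strip('"')
        if not text:
            continue  # skip empty quoted phrases
        tokens.append(Token(text=text,
                            negated=(sign == "-"),
                            required=(sign == "+"),
                            field=field or None))
    return tokens
```

Only a leading "-" is treated as negation, so a hyphen inside a word (as in "alpha-synuclein") stays part of the term.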

Approach

  • Regex tokenizer in _tokenize() handles all prefix/field/entity variants.
  • Entity expansion uses canonical_entities.canonical_name + aliases (PostgreSQL JSONB).
  • Field-scoped negation generates WHERE field NOT ILIKE %s; exposed via build_field_conditions().
  • Parser is a standalone module; api.py calls it with try/except to fall back to plain plainto_tsquery on any error.
  • New /api/search/parse endpoint for the UI to fetch parsed tokens without running the full search.
  • Refine panel is hidden until a multi-token query is parsed; chips are colour-coded (blue=positive, red=negative).
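
The entity-expansion and field-negation bullets above can be illustrated with the sketch below. It is written under assumptions: build_tsquery() and build_field_not_ilike() are hypothetical names (not the module's build_pg_tsquery_sql() or build_field_conditions()), the synonyms dict stands in for the canonical-entity lookup, and the resulting string is assumed to be bound as a parameter to PostgreSQL's to_tsquery().

```python
import re
from typing import Dict, Iterable, List, Set, Tuple


def _lexeme(term: str) -> str:
    """Reduce a term to tsquery-safe words; multi-word terms become a phrase (a <-> b)."""
    words = re.findall(r"[^\W_]+", term.lower())
    if not words:
        return ""
    joined = " <-> ".join(words)
    return f"({joined})" if len(words) > 1 else joined


def build_tsquery(positive: Iterable[str],
                  negated_entities: Iterable[str],
                  synonyms: Dict[str, List[str]]) -> str:
    """AND the positive terms, then AND a negated OR-group per excluded entity."""
    parts = [lex for lex in (_lexeme(t) for t in positive) if lex]
    for name in negated_entities:
        # The entity name itself plus every known synonym (assumed lookup result).
        alts = [_lexeme(s) for s in [name, *synonyms.get(name, [])]]
        alts = [a for a in alts if a]
        if alts:
            parts.append("!(" + " | ".join(alts) + ")")
    return " & ".join(parts)


def build_field_not_ilike(negated_fields: Iterable[Tuple[str, str]],
                          allowed_columns: Set[str]) -> Tuple[str, List[str]]:
    """Build 'col NOT ILIKE %s' clauses; column names come from an allow-list only."""
    clauses, params = [], []
    for column, value in negated_fields:
        if column not in allowed_columns:
            continue  # never interpolate an unknown column name into SQL
        clauses.append(f"{column} NOT ILIKE %s")
        params.append(f"%{value}%")
    return " AND ".join(clauses), params
```

One point to weigh in a design like this: in SQL, NOT ILIKE evaluates to NULL when the column is NULL, so rows with no value in the negated field are filtered out unless the condition also allows NULLs.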
Dependencies

  • q-srch-hybrid-rerank (rerank operates on the parsed result list).

Work Log

2026-04-27: Implementation (task:3ded3c74)

  • Created scidex/atlas/search_query_parser.py with QueryToken, ParsedQuery, parse(), expand_entity_synonyms(), build_pg_tsquery_sql(), build_field_conditions().
  • Modified api.py (/api/search): replaced hard-coded plainto_tsquery in 6 FTS tables (hypotheses, analyses, wiki_pages, papers, knowledge_gaps, notebooks) with the parser's tsquery expression. Added parsed_tokens to response.
  • Modified api.py (/api/search/fts): rewrote from broken SQLite FTS5 (bm25, MATCH ?) to PostgreSQL search_vector @@ tsquery with parser support.
  • Added /api/search/parse endpoint (no rate limit, read-only).
  • Updated site/search.html: added Refine panel with chip CSS, renderRefinePanel() function, and chip-remove event handler.
  • All changes gracefully degrade: exceptions fall back to plain plainto_tsquery.
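
The graceful-degradation behaviour noted in the work log might look roughly like the sketch below. It is an assumption-laden illustration rather than the code in api.py: fts_condition() and its build_expr argument are hypothetical, and the 'english' text-search configuration is assumed.

```python
from typing import Callable, Tuple

# SQL fragments; the 'english' configuration is an assumption for this sketch.
PARSED_SQL = "search_vector @@ to_tsquery('english', %s)"
FALLBACK_SQL = "search_vector @@ plainto_tsquery('english', %s)"


def fts_condition(q: str, build_expr: Callable[[str], str]) -> Tuple[str, str, bool]:
    """Return (sql_fragment, bound_param, used_parser) for one FTS table query."""
    try:
        expr = build_expr(q)  # hypothetical: parse q and render a tsquery string
        if not expr:
            raise ValueError("parser produced an empty expression")
        return PARSED_SQL, expr, True
    except Exception:
        # Any parser failure degrades to the plain-text search on the raw query.
        return FALLBACK_SQL, q, False
```

A wrapper of this shape only covers failures raised while parsing; a tsquery string that PostgreSQL rejects at execution time would need to be handled where the query is run.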
