[Forge] Weekly automated SciDEX vs Biomni vs K-Dense comparison artifact


Effort: thorough

Goal

Memory "[Cross-cutting] Biotools competitive intel", plus docs/bio_competitive/biomni_profile.md and docs/bio_competitive/k_dense_profile.md, tracks the rivals manually. The
weekly cadence depends entirely on humans, the comparison matrix
(docs/bio_competitive/comparison_matrix.md) goes stale quickly, and nothing
ingests the resulting deltas programmatically back into Forge planning.
Build a recurring automated comparison that scrapes each competitor's
public surface (release notes, arXiv, blog, GitHub), diffs it against the
prior week's snapshot, scores the deltas with an LLM-graded rubric (new
capabilities, claimed performance, integrations), and registers the result
as a competitive_intel artifact every Monday. The artifact embeds in /forge/landscape and feeds prioritisation in q-prop-funding-proposals-….

Acceptance Criteria

☐ scidex/forge/competitive_intel/scanner.py with scan(competitor) -> dict for each of biomni, k_dense, anthropic_skills, futurehouse; each scanner returns {releases: [...], arxiv_papers: [...], capability_changes: [...], headline_metrics: {...}}.
☐ Diff engine diff_against(prev_artifact_id, current_scan) returns {added, removed, changed} lists per competitor.
☐ compare_to_scidex(diff) uses the LLM grader (llm.py provider abstraction) with the rubric in prompts/competitive_rubric.md to score "SciDEX gap exposure" 0-10 per delta.
☐ Cron in scidex/senate/scheduled_tasks.py at Mon 06:00 UTC runs the full pipeline.
☐ Output written via scidex.atlas.artifact_commit.commit_artifact as competitive_intel artifact with structured frontmatter (week_of, competitors_scanned, delta_count, top_5_gaps, top_5_advantages); raw HTML report in artifact body.
☐ /forge/landscape/intel page lists the last 12 weekly artifacts and a sparkline of the "SciDEX gap exposure" trend.
☐ When top_5_gaps[i].score >= 7, the pipeline auto-creates an intel_followup Senate proposal linking the gap to candidate Forge tasks.
☐ Test: mock 2 competitors with deterministic scrape output; run pipeline; verify artifact registered with correct delta count; second run finds 0 deltas; gap-score-7 mock triggers proposal.
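
The diff and threshold criteria above can be sketched as follows. This is a minimal illustration, not the intended implementation: diff_scans, delta_count and gaps_needing_followup are hypothetical names, the previous scan is passed in directly rather than looked up by prev_artifact_id, and list items are assumed to be plain comparable values such as strings.

```python
from typing import Any

# Assumption: these are the list-valued fields of the scan dict named in the spec.
LIST_FIELDS = ("releases", "arxiv_papers", "capability_changes")


def diff_scans(prev_scan: dict[str, Any], current_scan: dict[str, Any]) -> dict[str, list]:
    """Return {added, removed, changed} for one competitor's scan dict."""
    added, removed, changed = [], [], []

    # List fields: an item is either new this week or gone since last week.
    for field in LIST_FIELDS:
        prev_items = prev_scan.get(field, [])
        curr_items = current_scan.get(field, [])
        added += [(field, item) for item in curr_items if item not in prev_items]
        removed += [(field, item) for item in prev_items if item not in curr_items]

    # Metrics: same key with a different value counts as "changed".
    prev_metrics = prev_scan.get("headline_metrics", {})
    curr_metrics = current_scan.get("headline_metrics", {})
    for name in sorted(set(prev_metrics) | set(curr_metrics)):
        if name not in prev_metrics:
            added.append(("headline_metrics", name))
        elif name not in curr_metrics:
            removed.append(("headline_metrics", name))
        elif prev_metrics[name] != curr_metrics[name]:
            changed.append(("headline_metrics", name, prev_metrics[name], curr_metrics[name]))

    return {"added": added, "removed": removed, "changed": changed}


def delta_count(diff: dict[str, list]) -> int:
    """Total number of deltas, as would be recorded in the frontmatter."""
    return sum(len(diff[key]) for key in ("added", "removed", "changed"))


def gaps_needing_followup(top_5_gaps: list[dict], threshold: float = 7) -> list[dict]:
    """Gaps whose score meets the threshold for an intel_followup proposal."""
    return [gap for gap in top_5_gaps if gap["score"] >= threshold]
```

Diffing a scan against itself yields a delta count of 0, which is the behaviour the "second run finds 0 deltas" test relies on.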

Approach

  • Read docs/bio_competitive/biomni_parity_verification.md and forge/biomni_parity/pipeline.py for the existing manual Biomni audit; lift its scoring patterns.
  • Scrapers should be polite — respect robots.txt, rate-limit to 1 req/3 s, log to competitive_intel_scrape_log for debuggability.
  • Use WebSearch/WebFetch (deferred tools) wrapped behind scidex/forge/competitive_intel/web_client.py so swapping fetcher is trivial.
  • LLM grader runs at effort=thorough (Opus, xhigh) per AGENTS.md effort-tier table — this is where novel framing happens.
  • Idempotency: re-running on the same week is a no-op (key on (competitor, week_of)).
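
The rate-limit and idempotency bullets above could look roughly like this. It is a sketch under stated assumptions: MinIntervalLimiter, week_of and idempotency_key are hypothetical names, week_of is assumed to mean the Monday of the week being scanned, and the clock and sleep functions are injectable only so the limiter can be tested without real delays. robots.txt handling and logging are left out.

```python
import time
from datetime import date, timedelta


def week_of(day: date) -> date:
    """Monday of the week containing `day` (date.weekday() is 0 for Monday)."""
    return day - timedelta(days=day.weekday())


def idempotency_key(competitor: str, day: date) -> tuple[str, str]:
    """Key on (competitor, week_of) so a re-run in the same week maps to the same key."""
    return (competitor, week_of(day).isoformat())


class MinIntervalLimiter:
    """Blocks so that successive wait() calls are at least min_interval seconds apart."""

    def __init__(self, min_interval: float = 3.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, None before the first one

    def wait(self) -> None:
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

A fetcher would call wait() before each request and skip the whole run when an artifact with the same idempotency key is already registered.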

Dependencies

  • forge/biomni_parity/pipeline.py: existing parity infrastructure to extend.
  • scidex/atlas/artifact_commit.commit_artifact: artifact write path.
  • scidex/senate/governance.py:159 create_proposal: for intel_followup proposals.

Work Log

    File: q-tools-weekly-comparison-artifact_spec.md
    Modified: 2026-05-01 20:13
    Size: 3.2 KB