[Forge] Weekly automated SciDEX vs Biomni vs K-Dense comparison artifact

Effort: thorough

Goal

Memory [Cross-cutting] Biotools competitive intel, together with docs/bio_competitive/biomni_profile.md and docs/bio_competitive/k_dense_profile.md, tracks the rivals manually.
The weekly cadence depends entirely on human effort, the comparison matrix
(docs/bio_competitive/comparison_matrix.md) goes stale quickly, and the
resulting deltas are not ingested programmatically back into Forge planning.
Build a recurring automated comparison that scrapes each competitor's
public surface (release notes, arXiv, blog, GitHub), diffs against the
prior week's snapshot, scores the deltas with an LLM-graded rubric (new
capabilities, claimed performance, integrations), and registers the result
as a competitive_intel artifact every Monday. The artifact is embedded in /forge/landscape and feeds prioritisation in q-prop-funding-proposals-….

Acceptance Criteria

☐ scidex/forge/competitive_intel/scanner.py with scan(competitor) -> dict for each of biomni, k_dense, anthropic_skills, futurehouse; each scanner returns {releases: [...], arxiv_papers: [...], capability_changes: [...], headline_metrics: {...}}.

☐ Diff engine diff_against(prev_artifact_id, current_scan) returns {added, removed, changed} lists per competitor.

☐ compare_to_scidex(diff) uses the LLM grader (llm.py provider abstraction) with the rubric in prompts/competitive_rubric.md to score "SciDEX gap exposure" from 0 to 10 for each delta.

☐ Cron entry in scidex/senate/scheduled_tasks.py runs the full pipeline every Monday at 06:00 UTC.

☐ Output written via scidex.atlas.artifact_commit.commit_artifact as competitive_intel artifact with structured frontmatter (week_of, competitors_scanned, delta_count, top_5_gaps, top_5_advantages); raw HTML report in artifact body.

☐ /forge/landscape/intel page lists the last 12 weekly artifacts and a sparkline of "SciDEX gap exposure" trend.

☐ When top_5_gaps[i].score >= 7, the pipeline auto-creates an intel_followup Senate proposal linking the gap to candidate Forge tasks.

☐ Test: mock 2 competitors with deterministic scrape output; run pipeline; verify artifact registered with correct delta count; second run finds 0 deltas; gap-score-7 mock triggers proposal.
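The diff and follow-up criteria above can be sketched roughly as follows. This is a minimal illustration, not the intended implementation: it assumes the previous scan has already been loaded from its artifact (the spec's diff_against takes prev_artifact_id instead), that list entries are plain hashable strings, and that each gap is a dict with a "score" key. The names diff_scan, delta_count, and gaps_needing_followup are hypothetical.

```python
from typing import Any

# List-valued fields named in the scan() criterion; entries are assumed to be strings.
LIST_FIELDS = ("releases", "arxiv_papers", "capability_changes")


def diff_scan(prev: dict[str, Any], curr: dict[str, Any]) -> dict[str, list]:
    """Return {added, removed, changed} between two scans of one competitor."""
    added, removed, changed = [], [], []
    # List fields can only gain or lose entries.
    for field in LIST_FIELDS:
        before = set(prev.get(field, []))
        after = set(curr.get(field, []))
        added += [(field, item) for item in sorted(after - before)]
        removed += [(field, item) for item in sorted(before - after)]
    # Metrics can also change value while keeping the same name.
    prev_metrics = prev.get("headline_metrics", {})
    curr_metrics = curr.get("headline_metrics", {})
    for name in sorted(prev_metrics.keys() | curr_metrics.keys()):
        if name not in prev_metrics:
            added.append(("headline_metrics", name))
        elif name not in curr_metrics:
            removed.append(("headline_metrics", name))
        elif prev_metrics[name] != curr_metrics[name]:
            changed.append(
                ("headline_metrics", name, prev_metrics[name], curr_metrics[name])
            )
    return {"added": added, "removed": removed, "changed": changed}


def delta_count(diff: dict[str, list]) -> int:
    """Total number of deltas, as it might be recorded in the frontmatter."""
    return sum(len(items) for items in diff.values())


def gaps_needing_followup(top_5_gaps: list[dict], threshold: int = 7) -> list[dict]:
    """Select gaps whose score meets the intel_followup threshold."""
    return [gap for gap in top_5_gaps if gap["score"] >= threshold]
```

Diffing a scan against itself yields zero deltas, which is the behaviour the test criterion expects from a second run on unchanged input.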

Approach

Read docs/bio_competitive/biomni_parity_verification.md and forge/biomni_parity/pipeline.py for the existing manual Biomni audit; lift its scoring patterns.

Scrapers should be polite — respect robots.txt, rate-limit to 1 req/3 s, log to competitive_intel_scrape_log for debuggability.
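One possible shape for the politeness rules, using only the Python standard library. The class and function names are hypothetical, the robots.txt body is assumed to have been fetched already, and logging to competitive_intel_scrape_log is left out of the sketch.

```python
import time
from urllib.robotparser import RobotFileParser


class RateLimiter:
    """Enforce a minimum interval between requests (3 s, per the spec)."""

    def __init__(self, min_interval=3.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock  # injectable so tests need not really wait
        self._sleep = sleep
        self._last = None

    def wait(self):
        """Block until at least min_interval has passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def allowed_by_robots(robots_txt: str, user_agent: str, url: str) -> bool:
    """Check a URL against an already-downloaded robots.txt body."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

A scraper would call allowed_by_robots before each request and RateLimiter.wait immediately before sending it. Injecting the clock and sleep functions keeps the deterministic test in the acceptance criteria fast.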

Use WebSearch/WebFetch (deferred tools) wrapped behind scidex/forge/competitive_intel/web_client.py so that swapping the fetcher is trivial.
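A minimal sketch of how such a wrapper could keep the fetcher swappable. The Fetcher protocol, StaticFetcher, and WebClient names are assumptions for illustration and do not describe the actual contents of web_client.py.

```python
from typing import Protocol


class Fetcher(Protocol):
    """Anything that can turn a URL into page text (hypothetical interface)."""

    def fetch(self, url: str) -> str: ...


class StaticFetcher:
    """Test double that serves canned pages for deterministic scrape output."""

    def __init__(self, pages: dict[str, str]):
        self._pages = dict(pages)

    def fetch(self, url: str) -> str:
        return self._pages[url]


class WebClient:
    """Thin wrapper; scanners depend on this rather than on a concrete fetcher."""

    def __init__(self, fetcher: Fetcher):
        self._fetcher = fetcher

    def get_text(self, url: str) -> str:
        return self._fetcher.fetch(url)
```

With this shape, a production fetcher backed by WebFetch and the mock used in tests would both satisfy the same protocol, so scanners would not need to change when the backend does.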

LLM grader runs at effort=thorough (Opus, xhigh) per AGENTS.md effort-tier table — this is where novel framing happens.

Idempotency: re-running on the same week is a no-op (key on (competitor, week_of)).
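The idempotency rule could look something like this. It is a sketch under stated assumptions: week_of is taken to be the Monday of the scanned week, and the in-memory WeeklyRegistry stands in for whatever lookup the artifact store actually provides.

```python
from datetime import date, timedelta


def week_of(day: date) -> date:
    """Return the Monday of the week containing `day` (assumed convention)."""
    return day - timedelta(days=day.weekday())


def idempotency_key(competitor: str, day: date) -> tuple[str, str]:
    """Key a weekly scan on (competitor, week_of)."""
    return (competitor, week_of(day).isoformat())


class WeeklyRegistry:
    """In-memory stand-in for the artifact store, for illustration only."""

    def __init__(self):
        self._artifacts = {}

    def register_once(self, competitor: str, day: date, build):
        """Call build() only if no artifact exists yet for this key.

        Returns (artifact, created), where created is False on a re-run.
        """
        key = idempotency_key(competitor, day)
        if key in self._artifacts:
            return self._artifacts[key], False
        artifact = build()
        self._artifacts[key] = artifact
        return artifact, True
```

Any two runs that fall in the same Monday-to-Sunday week map to the same key, so the second run returns the existing artifact without rebuilding it.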

Dependencies

forge/biomni_parity/pipeline.py — existing parity infrastructure to extend.
scidex/atlas/artifact_commit.commit_artifact — artifact write path.
scidex/senate/governance.py:159 create_proposal — for intel_followup proposals.

Work Log

File: q-tools-weekly-comparison-artifact_spec.md

Modified: 2026-05-01 20:13

Size: 3.2 KB