Build a continuously-running anomaly detector that watches the
streaming agent-activity tables (tool_invocations,
agent_skill_invocations, artifact_provenance,
account_model_quota_state, task_events) and emits a
senate_alerts row whenever an actor crosses any of a curated set of
suspicious-pattern thresholds. This is the
defense-in-depth complement to per-write checks: prompt-injection scans
catch toxic content, the circuit breaker caps volume, but neither
catches "this agent is spending 10× normal tokens on identical
queries" or "this skill is succeeding on 100% of analyses where the
fleet average is 65%". Those second-order patterns are where corrupted
prompts and adversarial accounts hide.
Effort: deep
- scidex/senate/suspicious_patterns.py exporting run_once(window_minutes=15) -> dict[str, int] (counts of [...] scidex/senate/governance_metrics_scheduled.py.
- senate_alerts rows of kind suspicious_pattern.<detector>:
  - token_spend_outlier: actor's tokens-per-task > 10× (median over 7d for same skill). Source: agent_skill_invocations.input_tokens + output_tokens.
  - identical_write_repeat: same actor wrote ≥10 rows of artifact_type whose content_hash repeats within [...] scidex.atlas.artifact_registry.compute_content_hash.
  - success_rate_sudden_perfect: skill had ≥50 invocations [...]
  - elo_swing: entity's Elo moved > 200 points in <24h driven by < 5 distinct judges (collusion / Sybil [...] scidex/exchange/elo_ratings.py match log.
  - citation_density_drop: hypothesis or analysis whose [...] q-rt-citation-honeypot for fakes).
  - tool_only_rapid_repeat: same (actor, tool, args_hash) [...]
  - pause_evasion: actor on whom an active senate_pause [...] tool_invocation row since the [...] paused_at. Indicates the in-flight gate was bypassed.
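As an illustration of the token_spend_outlier rule, here is a minimal pure-Python sketch. It assumes the 7-day baseline has already been loaded into memory as a per-skill list of tokens-per-task values. The function name, the argument shapes, and the reading of `delta` as observed minus baseline are assumptions made for this example; they are not taken from the actual module.

```python
from statistics import median


def token_spend_outliers(baseline_by_skill, observed, factor=10.0):
    """Flag (actor, skill) observations whose tokens-per-task exceed
    `factor` times the per-skill median baseline.

    baseline_by_skill: dict mapping skill -> list of tokens-per-task values
                       (hypothetical in-memory stand-in for the 7d window).
    observed:          iterable of (actor, skill, tokens_per_task) tuples.
    """
    findings = []
    for actor, skill, tokens in observed:
        history = baseline_by_skill.get(skill)
        if not history:
            # No baseline for this skill: nothing to compare against.
            continue
        baseline = median(history)
        if baseline > 0 and tokens > factor * baseline:
            findings.append({
                "actor": actor,
                "tokens_observed": tokens,
                "baseline_median": baseline,
                # Assumption: delta is the absolute excess over the baseline.
                "delta": tokens - baseline,
            })
    return findings
```

The returned dictionaries reuse the evidence keys named in the spec (actor, tokens_observed, baseline_median, delta), so each one could serve as the details payload of an alert row.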
- details JSONB with the raw evidence [...] {actor: ..., tokens_observed, baseline_median, delta})
- token_spend_outlier/identical_write_repeat/pause_evasion → critical; tool_only_rapid_repeat/success_rate_sudden_perfect/elo_swing → high; citation_density_drop → medium.
- (detector, actor) only re-alerts after the [...]
- [...] senate.suspicious_patterns_run (mirror the existing governance_metrics_scheduled registration in scidex/senate/governance_metrics_scheduled.py).
- critical rows for [...] q-safety-emergency-pause [...]
- tests/test_suspicious_patterns.py: one positive + [...]
- [...] agent_skill_invocations, tool_invocations, artifact_provenance schemas (\d ... against PG); write each [...]
- [...] run_once as a wrapper that calls each detector, [...] suspicious_pattern_dedup(detector, actor, last_emit_at)), and [...] senate_alerts.
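The wrapper-plus-dedup flow described here can be sketched as follows. This is an in-memory illustration only: the name `run_once_sketch`, the `detectors`/`dedup`/`emit` parameters, and the one-hour default backoff are all hypothetical (the spec's actual backoff window is not recoverable from this text), and a plain dict stands in for the suspicious_pattern_dedup(detector, actor, last_emit_at) table. The severity mapping is the one listed in the spec.

```python
from datetime import datetime, timedelta, timezone

# Severity per detector, as listed in the spec.
SEVERITY = {
    "token_spend_outlier": "critical",
    "identical_write_repeat": "critical",
    "pause_evasion": "critical",
    "tool_only_rapid_repeat": "high",
    "success_rate_sudden_perfect": "high",
    "elo_swing": "high",
    "citation_density_drop": "medium",
}


def run_once_sketch(detectors, dedup, emit, now=None,
                    backoff=timedelta(hours=1)):
    """Call each detector, suppress repeats, emit alerts, return counts.

    detectors: dict name -> zero-arg callable returning finding dicts,
               each with at least an "actor" key (hypothetical shape).
    dedup:     dict (detector, actor) -> last emit time; stands in for
               the dedup table.
    emit:      callable receiving one alert dict; stands in for the
               insert into senate_alerts.
    backoff:   assumed value, not taken from the spec.
    """
    now = now or datetime.now(timezone.utc)
    counts = {}
    for name, detect in detectors.items():
        emitted = 0
        for finding in detect():
            key = (name, finding["actor"])
            last = dedup.get(key)
            if last is not None and now - last < backoff:
                continue  # still inside the backoff window
            dedup[key] = now
            emit({
                "kind": f"suspicious_pattern.{name}",
                "severity": SEVERITY[name],
                "details": finding,
            })
            emitted += 1
        counts[name] = emitted
    return counts
```

Passing `now` explicitly keeps the sketch deterministic under test; a real implementation would presumably read and update the dedup state in the same transaction as the alert insert.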
- q-safety-emergency-pause: supplies the pause cascade target.
- q-safety-runaway-circuit-breaker: shares senate_alerts.

Files created:
- scidex/senate/suspicious_patterns.py: core module with 7 detectors + run_once()
- scidex/senate/suspicious_patterns_scheduled.py: @scheduled_task wrapper (5-min cron)
- scidex/senate/scheduled_tasks.py: added import of suspicious_patterns_scheduled to _register_builtin_tasks()
- api_routes/senate.py: added /api/senate/suspicious_patterns/stats and /ack endpoints
- api.py: added Suspicious Patterns (24h) tile to Senate dashboard (_build_senate_page())
- docs/planning/specs/q-safety-suspicious-pattern-detector_spec.md: this work log

- token_spend_outlier SKIPPED: agent_skill_invocations has no input_tokens/output_tokens columns
- pause_evasion SKIPPED: senate_pause table does not exist in DB
- compute_content_hash not in artifact_registry; replaced with local _content_hash() using SHA256
- detail column is TEXT (not JSONB), so backoff query uses detail::jsonb->>'actor'
- account_model_quota_state and task_events tables do not exist; those detectors not applicable
- run_once() runs cleanly: 0/0/0/0/114/0 alerts (elo_swing finding 114 entities with >200pt swings in last 24h; expected given high match volume)
- [...] senate.suspicious_patterns_run at 5-min interval
- api_routes.senate.api_senate_suspicious_patterns_stats compiles and returns correct structure
- [...] q-safety-emergency-pause): not wired in this cycle
- tests/test_suspicious_patterns.py: deferred (requires setting up test fixtures for each detector)
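The work log mentions a local _content_hash() built on SHA256 in place of compute_content_hash. The sketch below shows one way such a helper could look and how it could feed the identical_write_repeat rule (≥10 repeats). The canonical-JSON serialization, the function names, and the row shape are assumptions for illustration; the actual helper may hash different fields or a different encoding.

```python
import hashlib
import json
from collections import Counter


def content_hash(payload):
    """SHA256 hex digest of a canonical JSON encoding of `payload`.

    Assumption: sorting keys and fixing separators makes logically equal
    payloads hash identically regardless of key order.
    """
    canonical = json.dumps(payload, sort_keys=True,
                           separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def identical_write_repeats(rows, min_repeats=10):
    """Return findings for actors who wrote the same content repeatedly.

    rows: iterable of dicts with "actor", "artifact_type", and "content"
          keys (hypothetical shape, not the real table schema).
    """
    counts = Counter(
        (row["actor"], row["artifact_type"], content_hash(row["content"]))
        for row in rows
    )
    return [
        {"actor": actor, "artifact_type": artifact_type,
         "content_hash": digest, "repeats": n}
        for (actor, artifact_type, digest), n in counts.items()
        if n >= min_repeats
    ]
```

Hashing a canonical form rather than the raw text means cosmetic differences such as key order do not hide a repeat, at the cost of treating such variants as identical writes.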