[Senate] Suspicious-pattern detector: anomaly stream over agent activity (status: done)

Seven detectors (token outlier, identical writes, sudden perfection, Elo swing, citation drop, repeat tool, pause evasion) emit senate_alerts.

Completion Notes

Auto-completed by supervisor after successful deploy to main

Git Commits (1)

[Senate] Suspicious-pattern detector — 7 anomaly detectors + Senate dashboard tile [task:1fa4689b-7323-4489-a059-1b3bf213577d] (#768), 2026-04-27
Spec File

Goal

Build a continuously running anomaly detector that watches the
streaming agent-activity tables (tool_invocations, agent_skill_invocations,
artifact_provenance, account_model_quota_state, task_events) and emits a
senate_alerts row whenever an actor crosses any of a curated set of
suspicious-pattern thresholds. This is the defense-in-depth complement
to per-write checks: prompt-injection scans catch toxic content and the
circuit breaker caps volume, but neither catches "this agent is spending
10× normal tokens on identical queries" or "this skill is succeeding on
100% of analyses where the fleet average is 65%". Those second-order
patterns are where corrupted prompts and adversarial accounts hide.

Effort: deep

Acceptance Criteria

☐ New module scidex/senate/suspicious_patterns.py exporting
run_once(window_minutes=15) -> dict[str, int] (counts of
alerts by detector). Designed to run as a 5-min cron via
scidex/senate/governance_metrics_scheduled.py.
☐ Detector battery (each detector is one method, individually
unit-testable). Each takes the rolling window and emits zero or more
senate_alerts rows of kind suspicious_pattern.<detector>:
- token_spend_outlier: the actor's tokens per task exceed 10× the
7-day median for the same skill. Source:
agent_skill_invocations.input_tokens + output_tokens.
- identical_write_repeat: the same actor wrote ≥10 rows of
the same artifact_type whose content_hash repeats within
the window. Hash via the existing
scidex.atlas.artifact_registry.compute_content_hash.
- success_rate_sudden_perfect: the skill had ≥50 invocations
in the window with success_rate=1.0 when the trailing 7d
mean was ≤0.85. Sudden perfection often means the skill is
no-op'ing or returning canned output.
- elo_swing: an entity's Elo moved >200 points in
<24h, driven by <5 distinct judges (collusion / Sybil
signal). Source: scidex/exchange/elo_ratings.py match log.
- citation_density_drop: a hypothesis or analysis whose
latest version has ≥20% fewer cited PMIDs than its prior
version (see also q-rt-citation-honeypot for fakes).
- tool_only_rapid_repeat: the same (actor, tool, args_hash)
invoked ≥30 times in a 15-min window, i.e. with identical args.
Indicates a stuck loop or scraping pattern.
- pause_evasion: an actor under an active senate_pause who
has emitted any tool_invocations row since the pause's
paused_at timestamp. Indicates the in-flight gate was bypassed.
☐ Each alert row carries details JSONB with the raw evidence
(e.g. {actor: ..., tokens_observed, baseline_median, delta})
so a human reviewer can audit without re-running the query.
☐ Severity rules: token_spend_outlier / identical_write_repeat /
pause_evasion → critical; tool_only_rapid_repeat /
success_rate_sudden_perfect / elo_swing → high;
citation_density_drop → medium.
☐ Backoff: the same (detector, actor) pair only re-alerts after the
operator acks the previous alert OR 6h elapse, to prevent
alert-flood storms.
☐ Cron registration — add a recurring 5-min Orchestra task
senate.suspicious_patterns_run (mirror the existing
governance_metrics_scheduled registration in
scidex/senate/governance_metrics_scheduled.py).
☐ Senate dashboard tile "Suspicious patterns (24h)" listing
detector × severity counts; a "live alerts" stream beneath
showing unacked rows.
☐ Auto-escalation hook: three or more critical rows for the same
actor within 5 min trigger the q-safety-emergency-pause auto-pause
(already specified in that task; this task verifies the integration
with one end-to-end test).
☐ Tests tests/test_suspicious_patterns.py: one positive +
one negative case per detector; backoff window respects 6h.
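The threshold rules in the detector battery, together with the severity mapping, can be sketched as pure predicates over values that the SQL layer has already aggregated. This is an illustrative sketch, not the shipped scidex/senate/suspicious_patterns.py: the `Alert` dataclass, the function signatures, and the assumption that baselines (7-day median, trailing mean, Elo endpoints, PMID counts) arrive precomputed are all hypothetical.

```python
from dataclasses import dataclass, field
from statistics import median
from typing import Optional

# Severity rules from the acceptance criteria.
SEVERITY = {
    "token_spend_outlier": "critical",
    "identical_write_repeat": "critical",
    "pause_evasion": "critical",
    "tool_only_rapid_repeat": "high",
    "success_rate_sudden_perfect": "high",
    "elo_swing": "high",
    "citation_density_drop": "medium",
}


@dataclass
class Alert:
    """Hypothetical in-memory form of one senate_alerts row. The flagged
    entity (agent, skill, or artifact) doubles as `actor` for backoff keying."""
    detector: str
    actor: str
    details: dict = field(default_factory=dict)

    @property
    def kind(self) -> str:
        return f"suspicious_pattern.{self.detector}"

    @property
    def severity(self) -> str:
        return SEVERITY[self.detector]


def token_spend_outlier(actor: str, tokens_per_task: float,
                        history_7d: list[float]) -> Optional[Alert]:
    """Tokens per task above 10x the 7-day median for the same skill."""
    if not history_7d:
        return None  # no baseline yet
    baseline = median(history_7d)
    if baseline > 0 and tokens_per_task > 10 * baseline:
        return Alert("token_spend_outlier", actor, {
            "actor": actor, "tokens_observed": tokens_per_task,
            "baseline_median": baseline, "delta": tokens_per_task - baseline})
    return None


def success_rate_sudden_perfect(skill: str, invocations: int, successes: int,
                                trailing_7d_mean: float) -> Optional[Alert]:
    """>=50 invocations at 100% success when the trailing 7-day mean was <=0.85."""
    if invocations >= 50 and successes == invocations and trailing_7d_mean <= 0.85:
        return Alert("success_rate_sudden_perfect", skill, {
            "actor": skill, "invocations": invocations,
            "trailing_7d_mean": trailing_7d_mean})
    return None


def elo_swing(entity: str, elo_before: float, elo_after: float,
              distinct_judges: int) -> Optional[Alert]:
    """Elo move of >200 points, within a <24h window the caller has already
    selected, driven by fewer than 5 distinct judges."""
    swing = abs(elo_after - elo_before)
    if swing > 200 and distinct_judges < 5:
        return Alert("elo_swing", entity, {
            "actor": entity, "swing": swing, "distinct_judges": distinct_judges})
    return None


def citation_density_drop(artifact_id: str, prior_pmids: int,
                          latest_pmids: int) -> Optional[Alert]:
    """Latest version cites >=20% fewer PMIDs than the prior version."""
    # Integer form of (prior - latest) / prior >= 0.2, avoiding float rounding.
    if prior_pmids > 0 and 5 * (prior_pmids - latest_pmids) >= prior_pmids:
        return Alert("citation_density_drop", artifact_id, {
            "actor": artifact_id, "prior_pmids": prior_pmids,
            "latest_pmids": latest_pmids})
    return None
```

Keeping the predicates free of I/O is one way to make the "one positive + one negative case per detector" tests cheap: each test passes two hand-picked numbers instead of building database fixtures.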

Approach

  • Read the agent_skill_invocations, tool_invocations, and
    artifact_provenance schemas (\d ... against PG); write each
    detector as a self-contained SQL CTE returning the violating rows
    plus the evidence dict.
  • Implement run_once as a wrapper that calls each detector,
    filters the results via the backoff de-dup table (a small table:
    suspicious_pattern_dedup(detector, actor, last_emit_at)), and
    inserts into senate_alerts.
  • Wire the cron via the existing scheduled-runner pattern.
  • Add the dashboard tile and smoke-test it against the dev DB.
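The run_once wrapper described in the Approach can be sketched end to end with a single detector. This sketch uses an in-memory sqlite3 database as a stand-in for PostgreSQL; the table names and the dedup columns come from the spec, while the other columns (actor, tool, args_hash, created_at as epoch seconds), the `DETECTORS` registry, and the `ack()` helper are assumptions for illustration.

```python
import json
import sqlite3

WINDOW_S = 15 * 60        # rolling detection window (15 min)
BACKOFF_S = 6 * 60 * 60   # same (detector, actor) re-alerts after 6 h

SCHEMA = """
CREATE TABLE tool_invocations (actor TEXT, tool TEXT, args_hash TEXT, created_at INTEGER);
CREATE TABLE senate_alerts (kind TEXT, severity TEXT, detail TEXT, created_at INTEGER);
CREATE TABLE suspicious_pattern_dedup (
    detector TEXT, actor TEXT, last_emit_at INTEGER,
    PRIMARY KEY (detector, actor)
);
"""

# One detector as a self-contained CTE: violating rows plus evidence columns.
TOOL_ONLY_RAPID_REPEAT = """
WITH recent AS (
    SELECT actor, tool, args_hash, COUNT(*) AS n
    FROM tool_invocations
    WHERE created_at > :now - :window
    GROUP BY actor, tool, args_hash
)
SELECT actor, tool, args_hash, n FROM recent WHERE n >= 30
"""

# Hypothetical registry: detector name -> (SQL, severity).
DETECTORS = {"tool_only_rapid_repeat": (TOOL_ONLY_RAPID_REPEAT, "high")}


def run_once(conn: sqlite3.Connection, now: int) -> dict[str, int]:
    """Run every detector, skip (detector, actor) pairs still in backoff,
    insert the rest into senate_alerts, and return counts per detector."""
    counts: dict[str, int] = {}
    for detector, (sql, severity) in DETECTORS.items():
        counts[detector] = 0
        rows = conn.execute(sql, {"now": now, "window": WINDOW_S}).fetchall()
        for actor, tool, args_hash, n in rows:
            last = conn.execute(
                "SELECT last_emit_at FROM suspicious_pattern_dedup "
                "WHERE detector = ? AND actor = ?", (detector, actor)).fetchone()
            if last is not None and now - last[0] < BACKOFF_S:
                continue  # still inside the 6 h backoff
            evidence = {"actor": actor, "tool": tool,
                        "args_hash": args_hash, "count": n}
            conn.execute("INSERT INTO senate_alerts VALUES (?, ?, ?, ?)",
                         (f"suspicious_pattern.{detector}", severity,
                          json.dumps(evidence), now))
            conn.execute("INSERT OR REPLACE INTO suspicious_pattern_dedup "
                         "VALUES (?, ?, ?)", (detector, actor, now))
            counts[detector] += 1
    conn.commit()
    return counts


def ack(conn: sqlite3.Connection, detector: str, actor: str) -> None:
    """Operator ack (hypothetical): clearing the dedup row lets the next run re-alert."""
    conn.execute("DELETE FROM suspicious_pattern_dedup "
                 "WHERE detector = ? AND actor = ?", (detector, actor))
    conn.commit()
```

Updating the dedup row only when an alert is actually inserted keeps the 6h clock anchored to the last emitted alert, and deleting it on ack is one simple way to honor the "acks OR 6h elapse" rule.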
Dependencies

  • q-safety-emergency-pause: supplies the pause cascade target.
  • q-safety-runaway-circuit-breaker: shares senate_alerts.

Dependents

  • Future: full SIEM-style alert pipeline (out of scope for this task).
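The auto-escalation hook from the acceptance criteria (three or more critical alerts for the same actor within 5 minutes escalate to q-safety-emergency-pause) reduces to a sliding-window count per actor. A minimal sketch, assuming alerts arrive as (actor, severity, epoch_seconds) tuples and that `trigger_pause` is a hypothetical callback standing in for the real pause cascade:

```python
from collections import defaultdict
from typing import Callable, Iterable

ESCALATE_THRESHOLD = 3      # critical alerts ...
ESCALATE_WINDOW_S = 5 * 60  # ... for the same actor within 5 minutes


def actors_to_escalate(alerts: Iterable[tuple[str, str, int]],
                       threshold: int = ESCALATE_THRESHOLD,
                       window_s: int = ESCALATE_WINDOW_S) -> list[str]:
    """Return actors with >= `threshold` critical alerts inside any window
    of `window_s` seconds."""
    times: dict[str, list[int]] = defaultdict(list)
    for actor, severity, ts in alerts:
        if severity == "critical":
            times[actor].append(ts)
    flagged = []
    for actor, ts_list in times.items():
        ts_list.sort()
        # After sorting, some window holds `threshold` alerts exactly when
        # `threshold` consecutive timestamps span at most `window_s`.
        for i in range(len(ts_list) - threshold + 1):
            if ts_list[i + threshold - 1] - ts_list[i] <= window_s:
                flagged.append(actor)
                break
    return sorted(flagged)


def escalate(alerts: Iterable[tuple[str, str, int]],
             trigger_pause: Callable[[str], None]) -> list[str]:
    """Call the (hypothetical) pause hook once per flagged actor."""
    flagged = actors_to_escalate(alerts)
    for actor in flagged:
        trigger_pause(actor)
    return flagged
```

Checking every window rather than only "the last 5 minutes before now" means a burst that straddles two 5-min cron ticks is still caught.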

Work Log

2026-04-27: Implementation

Files created:

  • scidex/senate/suspicious_patterns.py: core module with 7 detectors + run_once()
  • scidex/senate/suspicious_patterns_scheduled.py: @scheduled_task wrapper (5-min cron)

Files modified:

  • scidex/senate/scheduled_tasks.py: added an import of suspicious_patterns_scheduled to _register_builtin_tasks()
  • api_routes/senate.py: added /api/senate/suspicious_patterns/stats and /ack endpoints
  • api.py: added the Suspicious Patterns (24h) tile to the Senate dashboard (_build_senate_page())
  • docs/planning/specs/q-safety-suspicious-pattern-detector_spec.md: this work log

Schema deviations from spec (documented for future fixers):

  • token_spend_outlier SKIPPED: agent_skill_invocations has no input_tokens/output_tokens columns
  • pause_evasion SKIPPED: the senate_pause table does not exist in the DB
  • compute_content_hash is not in artifact_registry; replaced with a local _content_hash() using SHA-256
  • the detail column is TEXT (not JSONB), so the backoff query uses detail::jsonb->>'actor'
  • the account_model_quota_state and task_events tables do not exist, so detectors that depend on them are not applicable

Tested:

  • run_once() runs cleanly: 0/0/0/0/114/0 alerts (elo_swing found 114 entities with >200-point swings in the last 24h, expected given the high match volume)
  • Scheduled task registered: senate.suspicious_patterns_run at a 5-min interval
  • api_routes.senate.api_senate_suspicious_patterns_stats compiles and returns the correct structure

Not implemented:

  • Auto-escalation hook (3+ critical alerts for the same actor → q-safety-emergency-pause): not wired in this cycle
  • Tests (tests/test_suspicious_patterns.py): deferred (requires setting up test fixtures for each detector)
