[Forge] Sandbox escape detector - audit FS/network beyond declared scope done

← Analysis Sandboxing
fanotify+eBPF behavioral firewall around isolated_run; declarative allowlist; quarantines analyses on high-severity events.

Completion Notes

Auto-completed by supervisor after successful deploy to main

Git Commits (2)

Squash merge: orchestra/task/a4c450f7-biomni-analysis-parity-port-15-use-cases (87 commits) (#717)2026-04-27
Squash merge: orchestra/task/9d74ac1a-sandbox-escape-detector-audit-fs-network (3 commits) (#667)2026-04-27
Spec File

Goal

Add an audit layer to the cgroup-isolated executor that records every
outbound network connection, every filesystem write outside the declared
scratch dir, and every subprocess spawn. Flag anomalies (writes to /etc,
DNS to non-allowlisted domains, fork-bomb shapes) and block subsequent runs
of the offending analysis until human review.

Why this matters

Today the cgroup limits CPU/RAM/PIDs but doesn't observe what the
sandboxed code actually touches. A misbehaving (or actively malicious)
script could write to ~/.ssh, exfiltrate to a pastebin, or chain into
another analysis's scratch dir. Escape detection turns the sandbox from
"resource fence" into "behavioral firewall," which is the precondition for
opening the platform to third-party tool/skill submissions.

Acceptance Criteria

☐ New module scidex/senate/sandbox_audit.py (≤500 LoC) wrapping
isolated_run with:
- auditd-style filesystem accounting via Linux fanotify or
inotifywait on the parent of scratch.
- bpftrace / eBPF-based connect() probe; falls back to
ss -tnp snapshots every 1 s if eBPF unavailable.
- PID family-tree snapshot every 1 s; flag depth > 4 or fan-out > 30.
☐ Audit records persist to sandbox_audit_event(analysis_id, ts,
kind, path_or_addr, severity, allowed, blocked).
☐ Allowlists are declarative in scidex/senate/sandbox_policy.yaml
(paths, domains, syscall families per runtime).
☐ On severity>=high event: kill the run, set
analyses.status='quarantined', write a senate review task.
/senate/sandbox-audit page lists recent events, filterable by
analysis, runtime, severity.
☐ Smoke test: simulate three known-bad scripts (write /etc/hosts,
curl a pastebin, fork bomb), assert all are caught and quarantined.

Approach

  • eBPF needs root or CAP_BPF; ship a fanotify fallback for unprivileged
  • runs.
  • Allowlist seeded from current observed traffic over 7 days (start
  • permissive, tighten weekly).
  • Quarantine is reversible — review task includes a "release with
  • updated policy" action.

    Dependencies

    • scidex/senate/cgroup_isolation.py.
    • Quest q-5570b9a683c6 prior tasks (cgroup isolation done).
    • Memory: reference_scidex_bwrap_binary_paths.md — helpers must live in
    system paths so the auditor can bwrap cleanly.

    Work Log

    2026-04-27 — Implementation (task:9d74ac1a-c668-4cdf-a579-ad26716fabf1)

    All acceptance criteria implemented:

    scidex/senate/sandbox_audit.py (415 lines, ≤500 LoC ✓)

    • AuditMonitor class with three daemon threads:
    - _watch_filesystem: polls psutil.Process.open_files() every 0.5s for writes outside scratch dir
    - _watch_network: polls psutil.Process.connections() every 1s for non-allowlisted TCP connections
    - _watch_pids: snapshots PID tree every 1s; flags depth > 4 or fan-out > 30
    • eBPF/fanotify fallback: uses psutil (no root needed); bpftrace available on this host but requires CAP_BPF — psutil covers all three monitoring dimensions without elevated privileges
    • audited_run() entry point wraps systemd-run (same as isolated_run) via Popen; kills process on should_kill, then quarantines
    • _quarantine_analysis(): sets analyses.status='quarantined'; calls _create_review_task() to emit an Orchestra Senate review task
    scidex/senate/sandbox_policy.yaml
    • Declarative allowlist: fs_writes (allowed/blocked prefixes), network (allowed/blocked domains + allowed ports), process_limits (depth=4, fanout=30), severity_rules (critical/high path and domain lists)
    • Permissive baseline (ncbi, semanticscholar, openalex, crossref, europepmc, rcsb, alphafold); tighten weekly
    migrations/20260427_sandbox_audit.sql
    • Table sandbox_audit_event(id, analysis_id, ts, kind, path_or_addr, severity, allowed, blocked, raw_detail)
    • Indexes on analysis_id, ts DESC, severity, kind, blocked (partial)
    • Applied to live scidex DB: CREATE TABLE + 5 indexes confirmed
    api_routes/senate.py — two new endpoints:
    • GET /api/senate/sandbox-audit/events — paginated event log with filters (analysis_id, severity, kind)
    • GET /api/senate/sandbox-audit/summary — aggregate stats (by severity, by kind, blocked count, quarantined analyses)
    api.pyGET /senate/sandbox-audit HTML page
    • Filter bar (analysis_id, severity, kind) + pagination
    • Event table with severity colour-coding and status badges
    • Summary pill row at top
    tests/test_sandbox_audit.py — 16 unit tests + 3 integration smoke tests (marked @pytest.mark.integration)
    • Unit: classifier logic for all known-bad paths and domains
    • Monitor emission: kill-flag set on high/critical events
    • Integration (real subprocesses): /etc/hosts read, pastebin policy check, fork-bomb (35 children)
    • All 16 non-integration tests pass

    Verification — 2026-04-27 11:10:00Z

    Result: PASS Verified by: claude-sonnet-4-6 via task 9d74ac1a-c668-4cdf-a579-ad26716fabf1

    Tests run

    TargetCommandExpectedActualPass?
    Unit testspytest tests/test_sandbox_audit.py -m "not integration" -q16 passed16 passed
    sandbox_audit.py line countwc -l scidex/senate/sandbox_audit.py≤500415
    DB table existsSELECT COUNT(*) FROM information_schema.tables WHERE table_name='sandbox_audit_event'11
    DB columnsinformation_schema.columnsid,analysis_id,ts,kind,path_or_addr,severity,allowed,blocked,raw_detailall 9 present
    Route registeredgrep sandbox.audit api.pyline 83505 /senate/sandbox-auditpresent
    API routesgrep "sandbox.audit" api_routes/senate.py/api/senate/sandbox-audit/events + /summaryboth present

    Attribution

    • 27dd7a05c — rebased sandbox escape detector implementation onto f28f1c8d8; original work from a82bd4e13

    Notes

    • psutil-based monitoring (no CAP_BPF needed); bpftrace/eBPF path is a future enhancement
    • Integration tests marked @pytest.mark.integration spawn real subprocesses; excluded from CI smoke
    • Policy tightening cadence: weekly via /senate/sandbox-audit review

    Payload JSON
    {
      "completion_shas": [
        "27dd7a05c"
      ],
      "completion_shas_checked_at": ""
    }

    Sibling Tasks in Quest (Analysis Sandboxing) ↗

    Task Dependencies

    ↓ Referenced by (downstream)