Spec: Abandoned-Run Watchdog (every 1h)

Task ID: 2d657cbd-2019-40c6-b825-01a8e12b7b2f
Legacy ID: f29e885e-e216-41f8-9289-7e10a07c63f4
Layer: Senate
Type: recurring (every-1h)
Quest: 58079891-7a5

Goal

Detect tasks stuck in repeated-abandonment loops and take corrective action every hour:

  • Identify stuck tasks — consecutive abandon streak ≥ 2 (configurable via --abandoned-streak)
  • Identify flapping tasks — abandon ratio ≥ 50% over at least 2 runs
  • Auto-requeue stuck tasks — one attempt per watchdog cycle; tasks whose last run was rate-limited are skipped
  • Create Senate triage tasks — for flapping tasks and tasks stuck across 2 consecutive cycles

    Script

    scripts/abandoned_run_audit.py

    python3 scripts/abandoned_run_audit.py \
        --limit 1000 \
        --abandoned-streak 2 \
        --output logs/abandoned-run-audit-latest.json

    The script persists output to logs/abandoned-run-audit-latest.json (gitignored) for
    cross-cycle tracking. It uses the Orchestra REST API at http://localhost:8100 via the
    dashboard runs endpoint — the SQLite orchestra.db symlink is broken on this host, so all
    Orchestra access goes through the HTTP API.
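
As a rough illustration of reading the dashboard runs endpoint page by page, the sketch below accumulates runs until the requested limit is reached. Only the base URL, the path, and the `project` and `limit` query parameters come from this spec. The `page` query parameter, the `{"runs": [...]}` response envelope, and the helper names `http_fetch_page` and `collect_runs` are assumptions made for illustration and may not match the real API or script.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Dict, List

BASE_URL = "http://localhost:8100"  # from the spec


def http_fetch_page(page: int, page_size: int) -> List[Dict]:
    """Fetch one page of runs over HTTP.

    ASSUMPTION: the `page` parameter and the {"runs": [...]} envelope are
    guesses; only the path, `project`, and `limit` appear in the spec.
    """
    query = urllib.parse.urlencode(
        {"project": "SciDEX", "limit": page_size, "page": page}
    )
    url = f"{BASE_URL}/dashboard/runs/page?{query}"
    with urllib.request.urlopen(url, timeout=30) as resp:
        payload = json.load(resp)
    return payload.get("runs", [])


def collect_runs(fetch_page: Callable[[int, int], List[Dict]],
                 limit: int = 1000, page_size: int = 100) -> List[Dict]:
    """Accumulate pages until `limit` runs are collected or a page is empty."""
    runs: List[Dict] = []
    page = 1
    while len(runs) < limit:
        batch = fetch_page(page, page_size)
        if not batch:
            break  # no more data available
        runs.extend(batch)
        page += 1
    return runs[:limit]
```

Passing the fetch function in as an argument keeps the paging loop testable without a live Orchestra instance; in production one would call `collect_runs(http_fetch_page, limit=1000)`.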

    Implementation Notes

    • Data source: GET /dashboard/runs/page?project=SciDEX&limit=100 (paginated).
      Runs are ordered newest-first, so the consecutive-abandon streak is counted from the head of each task's runs.
    • Requeue: POST /api/tasks/{id}/requeue with ORCHESTRA_WEB_TOKEN
    • Triage tasks: POST /api/tasks — Orchestra deduplication prevents repeated creation
    • Cross-cycle detection: previous cycle's stuck_tasks list from the JSON output file
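
The stuck and flapping rules can be sketched as follows. This is a minimal illustration, not the script's actual code: the run field names `task_id` and `status`, the status value `"abandoned"`, and the function names are assumptions. The thresholds (streak ≥ 2, ratio ≥ 0.5 over at least 2 runs) are the ones stated in this spec.

```python
from typing import Dict, List


def abandon_streak(statuses: List[str]) -> int:
    """Count consecutive 'abandoned' statuses from the head (newest first)."""
    streak = 0
    for status in statuses:
        if status != "abandoned":  # ASSUMPTION: status string value
            break
        streak += 1
    return streak


def classify(runs: List[Dict], streak_threshold: int = 2,
             ratio_threshold: float = 0.5,
             min_runs: int = 2) -> Dict[str, List[Dict]]:
    """Group newest-first runs by task and flag stuck and flapping tasks."""
    by_task: Dict[str, List[str]] = {}
    for run in runs:
        # Input is newest-first, so each per-task list stays newest-first.
        by_task.setdefault(run["task_id"], []).append(run["status"])

    stuck, flapping = [], []
    for task_id, statuses in by_task.items():
        streak = abandon_streak(statuses)
        ratio = statuses.count("abandoned") / len(statuses)
        entry = {"task_id": task_id, "streak": streak,
                 "ratio": round(ratio, 2), "runs": len(statuses)}
        if streak >= streak_threshold:
            stuck.append(entry)
        if len(statuses) >= min_runs and ratio >= ratio_threshold:
            flapping.append(entry)
    return {"stuck_tasks": stuck, "flapping_tasks": flapping}
```

In this sketch a task can be both stuck and flapping. The two checks are independent, so a task with a short recent streak but a long healthy history is stuck without being flapping.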

    Output JSON Keys

    Key                      Description
    stuck_tasks              Tasks with consecutive abandon streak ≥ threshold
    flapping_tasks           Tasks with abandon ratio ≥ 0.5 and ≥ 2 total runs
    stuck_across_2_cycles    Tasks present in stuck_tasks in both current and previous cycle
    requeue_results          Per-task requeue outcome
    triage_results           Per-task triage task creation outcome
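
Cross-cycle detection amounts to intersecting the current stuck list with the one saved by the previous cycle. The sketch below assumes each `stuck_tasks` entry is an object with a `task_id` field; that field name and both function names are hypothetical. A missing or unreadable file is treated as "no previous cycle", which is one plausible way to get the "0 stuck across 2 cycles" result the work log reports after worktree resets.

```python
import json
from pathlib import Path
from typing import Dict, List, Set


def load_previous_stuck_ids(output_path: Path) -> Set[str]:
    """Return task IDs from the previous cycle's stuck_tasks, or an empty set."""
    try:
        previous = json.loads(output_path.read_text())
    except (FileNotFoundError, json.JSONDecodeError):
        return set()  # first cycle, or the file did not survive a reset
    if not isinstance(previous, dict):
        return set()
    # ASSUMPTION: each entry is a dict carrying a "task_id" key.
    return {t["task_id"] for t in previous.get("stuck_tasks", [])}


def stuck_across_2_cycles(current_stuck: List[Dict],
                          previous_ids: Set[str]) -> List[Dict]:
    """Keep current stuck tasks that were also stuck in the previous cycle."""
    return [t for t in current_stuck if t["task_id"] in previous_ids]
```

The previous file has to be read before the current cycle overwrites it, since both cycles use the same `--output` path.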

    Acceptance Criteria

    ☑ Script exists at scripts/abandoned_run_audit.py
    ☑ Runs without error on the production host
    ☑ Produces valid JSON at --output path
    ☑ Requeues stuck tasks via Orchestra REST API
    ☑ Creates Senate triage tasks for flapping tasks (deduplication-safe)
    ☑ Tracks cross-cycle persistence for escalation
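
The requeue criterion can be illustrated by a sketch that picks requeue candidates and builds the POST request without sending it. The endpoint path and the use of ORCHESTRA_WEB_TOKEN come from this spec. How the token is transmitted is not stated here, so the Bearer `Authorization` header below is an assumption, as are the `task_id` field, the `rate_limited_ids` input, and both function names.

```python
import urllib.request
from typing import Dict, Iterable, List

BASE_URL = "http://localhost:8100"  # from the spec


def requeue_candidates(stuck_tasks: List[Dict],
                       rate_limited_ids: Iterable[str]) -> List[str]:
    """One requeue attempt per stuck task per cycle.

    Tasks whose last run was rate-limited are skipped.
    """
    skip = set(rate_limited_ids)
    seen = set()
    candidates = []
    for task in stuck_tasks:
        task_id = task["task_id"]  # ASSUMPTION: field name
        if task_id in skip or task_id in seen:
            continue
        seen.add(task_id)
        candidates.append(task_id)
    return candidates


def build_requeue_request(task_id: str, token: str) -> urllib.request.Request:
    """Build (but do not send) POST /api/tasks/{id}/requeue.

    ASSUMPTION: the token is sent as a Bearer header; the real API may
    expect a different header or a cookie.
    """
    return urllib.request.Request(
        f"{BASE_URL}/api/tasks/{task_id}/requeue",
        data=b"",
        headers={"Authorization": f"Bearer {token}"},
        method="POST",
    )
```

A caller would read the token with `os.environ["ORCHESTRA_WEB_TOKEN"]` and pass each request to `urllib.request.urlopen`, recording the outcome per task under `requeue_results`.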

    Work Log

    2026-04-28 — First implementation (task 2d657cbd)

    Created scripts/abandoned_run_audit.py. The Orchestra SQLite database symlink
    (/home/ubuntu/Orchestra/orchestra.db → /data/orchestra/orchestra.db) points to a
    non-existent path, so can_open_db_locally() returns False. The Orchestra REST API at port
    8100 is fully operational and is used for all data access and mutations.

    First cycle results:

    • 1000 runs analysed across 454 tasks
    • 12 stuck tasks (streak ≥ 2) — all 12 successfully requeued
    • 38 flapping tasks (ratio ≥ 50%)
    • 12 stuck across 2 cycles (detected in same run because dry-run preceded real run)
    • 24 new Senate triage tasks created; 14 duplicates skipped by Orchestra deduplication

    Notable chronic failures:
    • d00d502f [Atlas] Per-disease landing page... — streak=10, ratio=91%
    • ae9d446b [Forge] Power-calc service... — streak=6, ratio=86%
    • bcd6601f [Senate] Runaway-agent circuit breaker... — ratio=79% over 14 runs

    2026-04-28 — Cycle 2 (task 2d657cbd, run 10e72c3d)

    Cherry-picked script from prior orphaned commit (834e9539f); branch had been reset to main.

    Cycle 2 results:

    • 1000 runs analysed across 455 tasks
    • 10 stuck tasks (streak ≥ 2) — all 10 successfully requeued
    • 34 flapping tasks (ratio ≥ 50%)
    • 0 stuck across 2 cycles (no previous JSON output survived worktree reset)
    • 20 new Senate triage tasks created; 14 duplicates skipped by Orchestra deduplication

    Chronic failures (streak ≥ 8):
    • bb8579d9 [Agora] Score 20 unscored hypotheses... — streak=12, ratio=100% over 12 runs
    • 61997c83 [Exchange] Create 10 challenges... — streak=11, ratio=100%
    • 103d1b8a [Agora] Run target debates... — streak=11, ratio=100%
    • d3e488e6 [Exchange] Calibrate liquidity bands... — streak=11, ratio=100%
    • ab0ddc65 [Senate] Triage 9 failed quality gate results — streak=11, ratio=100%

    2026-04-28 — Cycle 3 (task 2d657cbd)

    Cycle 3 results:

    • 1000 runs analysed across 456 tasks
    • 10 stuck tasks (streak ≥ 2) — all 10 successfully requeued
    • 34 flapping tasks (ratio ≥ 50%)
    • 10 stuck across 2 cycles
    • 34 triage tasks: all already existed (deduplication by Orchestra; no new triage tasks needed)

    Chronic failures persist from cycle 2 (streak ≥ 8):
    • bb8579d9 [Agora] Score 20 unscored hypotheses... — streak=12, ratio=100%
    • 61997c83 [Exchange] Create 10 challenges... — streak=11, ratio=100%
    • 103d1b8a [Agora] Run target debates... — streak=11, ratio=100%
    • d3e488e6 [Exchange] Calibrate liquidity bands... — streak=11, ratio=100%
    • ab0ddc65 [Senate] Triage 9 failed quality gate results — streak=11, ratio=100%

    2026-04-28 — Cycle 4 (task 2d657cbd, retry after merge fix)

    Cycle 4 results:

    • 1000 runs analysed across 454 tasks
    • 11 stuck tasks (streak ≥ 2) — all 11 successfully requeued
    • 33 flapping tasks (ratio ≥ 50%)
    • 10 stuck across 2 cycles
    • 33 triage tasks: all already existed (duplicates skipped)
    • Branch rebased onto current origin/main to drop unrelated api.py/disease_curated_content.py diff

    Chronic failures persist (streak ≥ 8):
    • bb8579d9 [Agora] Score 20 unscored hypotheses... — streak=12, ratio=100%
    • 61997c83 [Exchange] Create 10 challenges... — streak=11, ratio=100%
    • 103d1b8a [Agora] Run target debates... — streak=11, ratio=100%
    • d3e488e6 [Exchange] Calibrate liquidity bands... — streak=11, ratio=100%
    • ab0ddc65 [Senate] Triage 9 failed quality gate results — streak=11, ratio=100%
    • ae9d446b [Forge] Power-calc service... — streak=8, ratio=89%

    2026-04-28 — Cycle 5 (task 2d657cbd, post-rebase clean run)

    Cycle 5 results:

    • 1000 runs analysed across 455 tasks
    • 10 stuck tasks (streak ≥ 2) — all 10 successfully requeued
    • 27 flapping tasks (ratio ≥ 50%)
    • 10 stuck across 2 cycles
    • 27 triage tasks: all already existed (0 new; deduplication by Orchestra)
    • Branch hard-reset to origin/main (squash merge 10e000f2d already contained cycles 1–4)

    Chronic failures persist (streak ≥ 8):
    • 61997c83 [Exchange] Create 10 challenges... — streak=12, ratio=100%
    • d3e488e6 [Exchange] Calibrate liquidity bands... — streak=12, ratio=100%
    • ab0ddc65 [Senate] Triage 9 failed quality gate results — streak=12, ratio=100%
    • bb8579d9 [Agora] Score 20 unscored hypotheses... — streak=12, ratio=100%
    • 66c2ea22 [Agora] Add PubMed evidence to 20 hypotheses... — streak=11, ratio=100%
    • d5e4edc1 [Senate] Triage 25 pending governance decisions — streak=11, ratio=100%
    • 103d1b8a [Agora] Run target debates... — streak=11, ratio=100%
    • 994f144f [Agora] Run debates for 3 analyses... — streak=10, ratio=100%
    • 1ae2bf61 [Senate] Assign content owners for 50 artifacts... — streak=10, ratio=100%
    • ae9d446b [Forge] Power-calc service... — streak=8, ratio=89%

    2026-04-28 — Cycle 6 (task 2d657cbd)

    Cycle 6 results:

    • 1000 runs analysed across 460 tasks
    • 8 stuck tasks (streak ≥ 2) — all 8 successfully requeued
    • 25 flapping tasks (ratio ≥ 50%)
    • 0 stuck across 2 cycles (no previous JSON from prior worktree cycle)
    • 5 new Senate triage tasks created; 20 duplicates skipped

    New triage tasks (previously untracked):
    • d00d502f [Atlas] Per-disease landing page... — ratio=83% over 12 runs
    • bcd6601f [Senate] Runaway-agent circuit breaker... — ratio=79% over 14 runs
    • f1e2d3c8 [Watchdog] Fix: [Exchange] Calibrate liquidity bands... — ratio=50% over 2 runs
    • b78a51ef [Forge] Rate-limit-aware tool calls inside the sandbox... — ratio=50% over 2 runs
    • 0b0ea75a [Agora] Persona disagreement scoreboard... — ratio=50% over 2 runs

    Chronic failures persist (streak ≥ 9):
    • 61997c83 [Exchange] Create 10 challenges... — streak=12, ratio=100%
    • d3e488e6 [Exchange] Calibrate liquidity bands... — streak=12, ratio=100%
    • ab0ddc65 [Senate] Triage 9 failed quality gate results — streak=12, ratio=100%
    • 66c2ea22 [Agora] Add PubMed evidence to 20 hypotheses... — streak=11, ratio=100%
    • 103d1b8a [Agora] Run target debates... — streak=11, ratio=100%
    • 994f144f [Agora] Run debates for 3 analyses... — streak=10, ratio=100%
    • 1ae2bf61 [Senate] Assign content owners for 50 artifacts... — streak=10, ratio=100%
    • ae9d446b [Forge] Power-calc service... — streak=9, ratio=90%

    2026-04-28 — Cycle 7 (task 2d657cbd)

    Cycle 7 results:

    • 1000 runs analysed across 478 tasks
    • 10 stuck tasks (streak ≥ 2) — all 10 successfully requeued
    • 15 flapping tasks (ratio ≥ 50%)
    • 0 stuck across 2 cycles (no previous JSON from prior worktree)
    • 4 new Senate triage tasks created; 11 duplicates skipped

    New triage tasks:
    • ae9d446b [Forge] Power-calc service... — ratio=85% over 13 runs
    • f1e2d3c8 [Watchdog] Fix: [Exchange] Calibrate liquidity bands... — ratio=50% over 2 runs
    • b09c92f4 [Agora] Hypothesis generation: 10 new hypotheses on lysosomal... — ratio=50% over 2 runs
    • 334f2837 [Atlas] Deep citation enrichment: ai-tool-labdao wiki page... — ratio=50% over 2 runs

    Chronic failures persist (streak ≥ 10):
    • d5e4edc1 [Senate] Triage 25 pending governance decisions — streak=12, ratio=86%
    • 61997c83 [Exchange] Create 10 challenges... — streak=12, ratio=100%
    • d3e488e6 [Exchange] Calibrate liquidity bands... — streak=12, ratio=100%
    • ab0ddc65 [Senate] Triage 9 failed quality gate results — streak=12, ratio=100%
    • 66c2ea22 [Agora] Add PubMed evidence to 20 hypotheses... — streak=11, ratio=100%
    • 103d1b8a [Agora] Run target debates... — streak=11, ratio=100%
    • 994f144f [Agora] Run debates for 3 analyses... — streak=10, ratio=100%
    • 1ae2bf61 [Senate] Assign content owners for 50 artifacts... — streak=10, ratio=100%

    2026-04-29 — Cycle 8 (task 2d657cbd)

    Cycle 8 results:

    • 1000 runs analysed across 451 tasks
    • 2 stuck tasks (streak ≥ 2) — both successfully requeued:
      - 8ebfce57 [Atlas] Complete KG edge extraction... — streak=4, ratio=56%
      - ab0ddc65 [Senate] Triage 9 failed quality gate results — streak=2, ratio=100%
    • 11 flapping tasks (ratio ≥ 50%)
    • 0 stuck across 2 cycles (no previous JSON from prior worktree)
    • 6 new Senate triage tasks created; 5 duplicates skipped:
      - New: ae9d446b, d3e488e6, 2f7e1600, a608f058, 150d896d, 14df1380
      - Duplicates: ab0ddc65, 03094ddf, 8ebfce57, 994f144f, f1e2d3c8

    Notable chronic failures (streak ≥ 2):
    • 8ebfce57 [Atlas] Complete KG edge extraction from remaining ~760 debate sessions — streak=4, ratio=56%
    • ab0ddc65 [Senate] Triage 9 failed quality gate results — streak=2, ratio=100%

    2026-04-29 — Cycle 9 (task 2d657cbd)

    Cycle 9 results:

    • 1000 runs analysed across 433 tasks
    • 2 stuck tasks (streak ≥ 2) — both successfully requeued:
      - 31c93e95 [Forge] Execute: testes-gonadal RNA-seq experiment — streak=6, ratio=86%
      - 8ebfce57 [Atlas] Complete KG edge extraction from remaining ~760 debate sessions — streak=5, ratio=58%
    • 8 flapping tasks (ratio ≥ 50%)
    • 0 stuck across 2 cycles (no previous JSON from prior worktree)
    • 4 new Senate triage tasks created; 4 duplicates skipped

    New triage tasks created:
    • ae9d446b [Forge] Power-calc service... — ratio=67% over 3 runs
    • 2f7e1600 [Agora] Add counter-evidence reviews to 10 hypotheses... — ratio=61% over 18 runs
    • a608f058 [Forge] Triage 50 failed tool calls by skill and error type... — ratio=53% over 19 runs
    • 150d896d [Watchdog] Fix: [Agora] Add PubMed evidence to 20 hypotheses... — ratio=50% over 2 runs

    File: f29e885e-e216-41f8-9289-7e10a07c63f4_abandoned_run_watchdog_spec.md
    Modified: 2026-05-01 20:13
    Size: 11.8 KB