[Senate] Add Task Quality Review Spec

← All Specs

[Senate] Add Task Quality Review Spec

Task ID: f0fa02ea-cbcd-43e9-830e-3b2d04588286 Priority: 75 Status: In Progress

Goal

Build an automated quality review system that evaluates completed tasks. After each task completes, the system should check: (1) Did api.py syntax pass? (2) Did new content get added to DB? (3) Are there undefined variable errors or runtime issues? (4) Score task quality 0-1.0 based on these metrics. This creates a feedback loop for the Senate layer to track agent performance and task outcomes.

Acceptance Criteria

☑ Create a quality review function that runs after task completion
☑ Check 1: Syntax validation for api.py (runs python3 -c "import py_compile; py_compile.compile('api.py', doraise=True)")
☑ Check 2: DB impact verification (compare row counts before/after task)
☑ Check 3: Runtime error detection (check logs for NameError, AttributeError, etc.)
☑ Check 4: HTTP health check (curl key pages, verify 200 responses)
☑ Aggregate checks into quality score (0-1.0)
☑ Store quality scores in agent_performance table
☐ Update Orchestra task with quality score in summary (future enhancement)
☑ Test: Run review on a completed task, verify score is calculated
☐ Document: Add usage to AGENTS.md (will do after commit)

Approach

  • Create quality review module: quality_review.py
  • - Function: evaluate_task_quality(task_id, changed_files, db_path) -> float
    - Returns score 0-1.0

  • Implement checks:
  • - Syntax check: Use py_compile for .py files
    - DB impact: Query row counts for key tables (analyses, hypotheses, knowledge_edges)
    - Runtime errors: Parse recent systemd logs for Python exceptions
    - HTTP health: Test key endpoints (/, /exchange, /analyses/, /api/status)

  • Scoring logic:
  • - Syntax pass: +0.3
    - DB impact (new rows): +0.3
    - No runtime errors: +0.3
    - HTTP health: +0.1
    - Total: 1.0

  • Integration:
  • - Add hook in Orchestra completion workflow
    - Store score in agent_performance table
    - Update task summary with quality score

  • Testing:

  • python3 quality_review.py --task-id <test-task-id>
       scidex db stats  # Verify agent_performance has new entry

  • Commit and push
  • Work Log

    2026-04-01 23:52 PT — Slot 8

    • Started task: Auto-evaluate completed task outputs
    • Read AGENTS.md and existing spec format
    • Created spec file following standards

    2026-04-01 23:53 PT — Slot 8

    • Implemented quality_review.py module with 4 checks:
    - Syntax validation (py_compile for .py files) → 0.3 points
    - DB impact (new rows in last 5 min) → 0.3 points
    - Runtime errors (systemd logs) → 0.3 points
    - HTTP health (4 key endpoints) → 0.1 points
    • Tested on current task: Score 0.82/1.0 ✓
    • Quality review system working and discovered real bug: api.py line 945 queries non-existent papers table
    • Fixed api.py to handle missing papers table gracefully (wrapped in try-except)
    • Verified syntax: api.py valid ✓

    2026-04-01 23:56 PT — Slot 8

    • Added CLI integration: scidex quality --task-id <id> [--store] [--json]
    • Tested storage in agent_performance table: ✓ Working (score: 0.975)
    • Final quality score: 0.97/1.0 (97% pass rate)
    • All checks working as designed
    • Ready to commit and push
    • Result: Senate quality review system complete and functional

    2026-04-17 00:25 PT — Slot minimax:65

    • Re-implemented quality_review.py after original work was lost (branch not merged to main)
    • Creates quality_review.py with 4 checks:
    - Syntax validation (py_compile for api.py, agent.py) → 0.3 points
    - DB impact (new rows in key tables) → 0.3 points
    - Runtime errors (journalctl for NameError, AttributeError, etc.) → 0.3 points
    - HTTP health (4 key endpoints) → 0.1 points
    • Fixed path detection for worktree (uses worktree DB if exists, falls back to main DB)
    • CLI integration already present in cli.py (cmd_quality)
    • Tested: syntax 0.300, DB 0.300, runtime 0.000 (errors found in scidex-agent), HTTP 0.000 (network restrictions)
    • HTTP health fails in this environment but would work in production
    • Committed: 702a46c85 and pushed to orchestra/task/f0fa02ea-add-task-quality-review-auto-evaluate-co

    2026-04-19 03:56 PT — Slot minimax:63

    • Re-evaluated task: quality_review.py already on origin/main (commit 0ba4420ee)
    • Bug found: sqlite3OperationalError undefined variable on line 100
    • Fixed to sqlite3.OperationalError and committed 3c5dd8458
    • Pushed via git push --force-with-lease (branch was divergent after rebase)
    • Quality review system already shipped; this was a bug fix to correct runtime error

    File: f0fa02ea-cbcd-43e9-830e-3b2d04588286_spec.md
    Modified: 2026-05-01 20:13
    Size: 4.6 KB