[Senate] Populate task_runs.last_commit_sha on completion

Goal

Wire up the task_runs.last_commit_sha column so it is populated with the
on-main commit SHA whenever a task branch is squash-merged to origin/main.
This enables the waste_detector's abandonment signal
(last_commit_sha IS NULL OR last_commit_sha = '') to accurately
distinguish "ran but produced no commit" from "never ran".

Background

The last_commit_sha column was added to task_runs (Orchestra migration 020)
but is never populated: every row has last_commit_sha = ''.
waste_detector.scan() already checks for this column and, when the column is
absent, falls back to 1=1 (no commit-SHA filter), which reduces
abandonment-detection precision.
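
The fallback described above could look roughly like the following. This is a minimal sketch, assuming the Orchestra DB is SQLite (suggested by the orchestra.db filename); the helper name no_commit_predicate is hypothetical and is not taken from waste_detector.py.

```python
import sqlite3


def no_commit_predicate(conn: sqlite3.Connection) -> str:
    """Return the SQL fragment used to filter runs with no recorded commit.

    Hypothetical helper: falls back to "1=1" (no commit-SHA filter) when
    task_runs has no last_commit_sha column.
    """
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
    cols = {row[1] for row in conn.execute("PRAGMA table_info(task_runs)")}
    if "last_commit_sha" in cols:
        return "(last_commit_sha IS NULL OR last_commit_sha = '')"
    return "1=1"  # column absent: every run passes, so precision drops
```

With the 1=1 fallback every run matches the commit-SHA part of the filter, which is why populating the column matters for precision.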

The commit SHA is computed by _do_push_main in orchestra/sync.py after a
successful squash-merge or fast-forward merge and stored in the local
variable new_remote. It is not currently returned to the caller or persisted.

Acceptance Criteria

  • orchestra/services.py::ensure_tables adds a last_commit_sha TEXT DEFAULT ''
    column to task_runs (migration 020).
  • orchestra/sync.py::push_main returns main_sha (the new
    refs/remotes/origin/main SHA) in its success dict.
  • orchestra/services.py::process_merge_candidate updates
    task_runs.last_commit_sha with the returned main_sha after a
    successful push_main, using the run_id from the merge_candidates row.
  • scripts/backfill_task_runs_last_commit_sha.py is an idempotent, 30-day
    backfill script that:
    - Detects whether the last_commit_sha column exists; skips gracefully if not.
    - Finds task_runs rows (last 30 days, terminal status) where
      last_commit_sha IS NULL OR last_commit_sha = ''.
    - For each row, resolves the branch name from tasks.git_branch, runs
      git rev-parse origin/<branch> in the project root to get the current
      main SHA, and updates the row.
    - Supports --dry-run (default) and --apply.
☐ Soft-depends on Orchestra PR #223 (adds reason= field to watchdog).
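
The --dry-run (default) / --apply contract for the backfill script could be expressed with argparse as below. This is only a sketch of one way to satisfy that criterion; the function name parse_args and the shared `apply` destination are assumptions, not the script's actual layout.

```python
import argparse
from typing import Optional, Sequence


def parse_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    """Parse backfill flags; dry-run is the default, --apply opts in to writes."""
    parser = argparse.ArgumentParser(
        description="Backfill task_runs.last_commit_sha (last 30 days)."
    )
    mode = parser.add_mutually_exclusive_group()
    # Both flags write to the same destination so callers check one boolean.
    mode.add_argument("--dry-run", dest="apply", action="store_false",
                      help="report planned updates without writing (default)")
    mode.add_argument("--apply", dest="apply", action="store_true",
                      help="write the resolved SHAs to task_runs")
    parser.set_defaults(apply=False)
    return parser.parse_args(argv)
```

Making the two flags mutually exclusive means passing both is rejected by argparse instead of being silently resolved in favor of one of them.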

Implementation Plan

Orchestra Changes (PR to orchestra/orchestra)

File: orchestra/services.py

In ensure_tables(), after the migration 019 block (around line 465), add:

# Migration 020 — last_commit_sha: on-main SHA from squash-merge step.
if "last_commit_sha" not in task_runs_cols:
    cur.execute("ALTER TABLE task_runs ADD COLUMN last_commit_sha TEXT DEFAULT ''")
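
For context, the snippet above relies on task_runs_cols already being available inside ensure_tables(). A self-contained sketch of the same idempotent pattern, assuming SQLite and deriving the column set via PRAGMA table_info, might look like this (the function name ensure_last_commit_sha is hypothetical):

```python
import sqlite3


def ensure_last_commit_sha(conn: sqlite3.Connection) -> bool:
    """Add task_runs.last_commit_sha if it is missing.

    Returns True if the column was added, False if it already existed,
    so running it repeatedly is safe.
    """
    cur = conn.cursor()
    # Column names are the second field of each PRAGMA table_info row.
    task_runs_cols = {row[1] for row in cur.execute("PRAGMA table_info(task_runs)")}
    if "last_commit_sha" in task_runs_cols:
        return False
    cur.execute("ALTER TABLE task_runs ADD COLUMN last_commit_sha TEXT DEFAULT ''")
    conn.commit()
    return True
```

Existing rows read back the column default ('') after the ALTER, which matches the empty-string half of the abandonment predicate.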

File: orchestra/sync.py

In _do_push_main(), the new_remote variable holds the resulting main SHA
after both the PR-merge path (line ~1740) and the direct-push path. Add it
to all success return dicts:

# In the PR merge success return (line ~1743):
return {
    "success": True,
    "commits_merged": commits,
    "main_sha": new_remote,          # ADD THIS
    "merge_strategy": "gh_pr_merge",
    ...
}

# In the direct-push success return (line ~1882):
return {
    "success": True,
    "commits_merged": commits,
    "main_sha": new_remote,          # ADD THIS
    "message": ...
}

Also update the push_main() wrapper so that it passes main_sha through to its caller.
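
The pass-through could be as simple as forwarding the inner result dict unchanged. The sketch below is illustrative only: push_main_sketch and its do_push argument are hypothetical stand-ins, since the real push_main() signature is not shown in this spec.

```python
from typing import Any, Callable, Dict


def push_main_sketch(do_push: Callable[[], Dict[str, Any]]) -> Dict[str, Any]:
    """Hypothetical wrapper: run the inner push and forward its result.

    On success the returned dict always carries a main_sha key (an empty
    string if the inner function did not supply one).
    """
    result = dict(do_push())  # copy so the inner function's dict is not mutated
    if result.get("success"):
        result.setdefault("main_sha", "")
    return result
```

Guaranteeing the key on success lets the caller treat an empty main_sha as "nothing to record" without special-casing older return shapes.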

File: orchestra/services.py::process_merge_candidate

After a successful push_main (around line 6556), add:

main_sha = result.get("main_sha", "")
if main_sha and row.get("run_id"):
    try:
        conn.execute(
            "UPDATE task_runs SET last_commit_sha = ? "
            "WHERE id = ? AND (last_commit_sha IS NULL OR last_commit_sha = '')",
            (main_sha, row["run_id"]),
        )
        conn.commit()
    except Exception as e:
        logger.warning("Failed to record last_commit_sha for run %s: %s",
                       row["run_id"], e)
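
The WHERE guard in the UPDATE above makes the write safe to repeat: a run that already has a SHA is never overwritten. A standalone sketch of that behavior, assuming SQLite and a hypothetical helper name record_main_sha:

```python
import sqlite3


def record_main_sha(conn: sqlite3.Connection, run_id: str, main_sha: str) -> int:
    """Set last_commit_sha for run_id only if it is still empty.

    Returns the number of rows changed (0 or 1).
    """
    if not (main_sha and run_id):
        return 0  # nothing to record
    cur = conn.execute(
        "UPDATE task_runs SET last_commit_sha = ? "
        "WHERE id = ? AND (last_commit_sha IS NULL OR last_commit_sha = '')",
        (main_sha, run_id),
    )
    conn.commit()
    return cur.rowcount
```

Because the guard uses the same predicate as the abandonment signal, a second merge attempt or a later backfill leaves the first recorded SHA in place.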

SciDEX Changes (this task)

File: scripts/backfill_task_runs_last_commit_sha.py (new)

Idempotent backfill script. See the script itself for details.
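
As a rough illustration of the flow listed in the acceptance criteria, the core of such a script might look like the sketch below. Several names are assumptions made for the example and are not confirmed by this spec: the task_runs.task_id and task_runs.ended_at columns, the terminal status values, and the function names. Only tasks.git_branch, the predicate on last_commit_sha, and the git rev-parse origin/<branch> step come from the spec.

```python
import sqlite3
import subprocess
from typing import Callable, List, Optional, Tuple

# Assumed schema: task_runs.task_id, task_runs.ended_at, and these status values.
CANDIDATES_SQL = """
    SELECT r.id, t.git_branch
    FROM task_runs AS r
    JOIN tasks AS t ON t.id = r.task_id
    WHERE r.status IN ('completed', 'failed', 'abandoned')
      AND r.ended_at >= datetime('now', '-30 days')
      AND (r.last_commit_sha IS NULL OR r.last_commit_sha = '')
    ORDER BY r.id
"""


def resolve_branch_sha(project_root: str, branch: str) -> Optional[str]:
    """Return the SHA that origin/<branch> points at, or None if unresolvable."""
    proc = subprocess.run(
        ["git", "rev-parse", "--verify", "--quiet", f"origin/{branch}"],
        cwd=project_root, capture_output=True, text=True, check=False,
    )
    sha = proc.stdout.strip()
    return sha if proc.returncode == 0 and sha else None


def backfill(conn: sqlite3.Connection,
             resolve: Callable[[str], Optional[str]],
             apply: bool = False) -> List[Tuple[str, str]]:
    """Return (run_id, sha) pairs that were written, or would be in dry-run."""
    cols = {row[1] for row in conn.execute("PRAGMA table_info(task_runs)")}
    if "last_commit_sha" not in cols:
        return []  # migration 020 has not run yet: skip gracefully
    planned = []
    for run_id, branch in conn.execute(CANDIDATES_SQL).fetchall():
        sha = resolve(branch) if branch else None
        if not sha:
            continue  # branch missing or deleted on the remote
        planned.append((run_id, sha))
        if apply:
            conn.execute(
                "UPDATE task_runs SET last_commit_sha = ? "
                "WHERE id = ? AND (last_commit_sha IS NULL OR last_commit_sha = '')",
                (sha, run_id),
            )
    if apply:
        conn.commit()
    return planned
```

Passing the resolver as a callable keeps the database logic testable without a git checkout; in real use it could be `lambda b: resolve_branch_sha(project_root, b)`. Note that git rev-parse origin/<branch> resolves the tip of the remote-tracking branch, so whether that equals the commit that landed on main depends on how the branch was merged.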

Dependencies

  • Orchestra PR #223 (watchdog reason= field) — soft dependency
  • orchestra/services.py::ensure_tables migration 020 must have run

Work Log

2026-04-28 — Investigation

  • Confirmed task_runs.last_commit_sha column does not exist in current
Orchestra DB (symlink /home/ubuntu/Orchestra/orchestra.db points to
/data/orchestra/orchestra.db which is not mounted in this environment).
  • waste_detector.py already handles missing column gracefully with
has_commit_sha flag.
  • Orchestra code is read-only in this sandbox; Orchestra changes must be
applied via a separate PR to the Orchestra repo.

2026-04-28 — Implementation (SciDEX side)

Files created:

  • scripts/backfill_task_runs_last_commit_sha.py — idempotent backfill script
that populates last_commit_sha for historical task_runs rows from the
last 30 days by reading the current main SHA for each task's git branch.
It checks that the column exists before attempting updates and is a no-op if
the column doesn't exist yet.
  • This spec file.

Files that need Orchestra-side changes (to be applied separately):

  • orchestra/services.py — add last_commit_sha column in ensure_tables()
  • orchestra/sync.py — return main_sha from push_main()
  • orchestra/services.py::process_merge_candidate — update task_runs after
successful merge

File: ecead4e7-d198-4fd2-9bd1-299abc4de6a3_populate_task_runs_last_commit_sha_spec.md
Modified: 2026-05-01 20:13
Size: 5.2 KB