> ## Continuous-process anchor
>
> This spec describes an instance of one of the retired-script themes
> documented in docs/design/retired_scripts_patterns.md. Before
> implementing, read:
>
> 1. The "Design principles for continuous processes" section of that
>    atlas. Every principle is load-bearing. In particular:
>    - LLMs for semantic judgment; rules for syntactic validation.
>    - Gap-predicate driven, not calendar-driven.
>    - Idempotent + version-stamped + observable.
>    - No hardcoded entity lists, keyword lists, or canonical-name tables.
>    - Three surfaces: FastAPI + orchestra + MCP.
>    - Progressive improvement via outcome-feedback loop.
> 2. The theme entry in the atlas matching this task's capability:
>    EX1 (pick the closest from Atlas A1–A7, Agora AG1–AG5,
>    Exchange EX1–EX4, Forge F1–F2, Senate S1–S8, Cross-cutting X1–X2).
> 3. If the theme is not yet rebuilt as a continuous process, follow
>    docs/planning/specs/rebuild_theme_template_spec.md to scaffold it
>    BEFORE doing the per-instance work.
>
> **Specific scripts named below in this spec are retired and must not
> be rebuilt as one-offs.** Implement (or extend) the corresponding
> continuous process instead.
Quest: Exchange | Priority: P87 | Status: open
Update hypothesis scores only when new debate/evidence/market signals justify it, and make repricing favor hypotheses with strong recent debate support while avoiding churn from tiny no-op recalculations.
This task is part of the Exchange quest (Exchange layer). It contributes to the broader goal of building out SciDEX's exchange capabilities.
Fix + Run: Found _check_last_execution() was reading from calibration_slashing (a market-slashing table with many NOT NULL columns) to gate manual re-runs, but recalibrate() never wrote a record there — the guard always returned False (never skip). Switched to driver_state(driver, key) key-value store which already exists and is used by other drivers. Added _write_run_marker(db) helper called on both exit paths (early return when stale=0, and after normal full run). CI cycle result: 0 Elo-stale hypotheses (all updated by 2026-04-27 17:00 PT run), 71 hypothesis market prices updated from debate quality signals (apply_debate_quality_batch drained full backlog). Post-fix: manual --dry-run correctly shows "Skipping — last recalibration 0.7 min ago (min 50 min)".
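The run-marker guard described in this entry can be sketched as follows. This is a minimal illustration only: an in-memory dict stands in for the driver_state(driver, key) store, and the names `write_run_marker`, `should_skip`, `run_cycle`, the driver name, and the key name are hypothetical. The 50-minute minimum is taken from the "min 50 min" message quoted above.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical stand-in for the driver_state(driver, key) key-value table.
driver_state: dict[tuple[str, str], str] = {}

DRIVER = "elo_recalibration"          # assumed driver name
KEY = "last_run_at"                   # assumed key name
MIN_INTERVAL = timedelta(minutes=50)  # from the "min 50 min" skip message


def write_run_marker(now: datetime) -> None:
    """Record the run time; meant to be called on every exit path."""
    driver_state[(DRIVER, KEY)] = now.isoformat()


def should_skip(now: datetime) -> bool:
    """Return True when the last recorded run is more recent than MIN_INTERVAL."""
    raw = driver_state.get((DRIVER, KEY))
    if raw is None:
        return False  # no marker yet, so never skip
    return now - datetime.fromisoformat(raw) < MIN_INTERVAL


def run_cycle(now: datetime, stale_count: int) -> str:
    if should_skip(now):
        return "skipped"
    if stale_count == 0:
        write_run_marker(now)  # the early-return path still stamps the marker
        return "no-op"
    # ... recalibration work would happen here ...
    write_run_marker(now)
    return "ran"
```

Stamping the marker on the early-return path is the point of the fix: without it, a run that finds nothing stale leaves no trace, and the guard can never trigger.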
Fix + Run: Found find_stale_hypotheses() had an over-strict score boundary check (0.0 < score < 1.0) that silently dropped two categories of valid stale hypotheses: (1) promoted hypotheses with composite_score=1.0 (above the exclusive upper bound, but a valid correction target — clamp to 0.99), and (2) proposed hypotheses with no composite score yet but real Elo tournament data (Elo-derived initial scores via base 0.5 ± adjustment). Fixed the guard to 0.0 <= float(score) <= 1.0 and allowed score=None through; recalibrate() already defaults null scores to 0.5. After fix: dry-run found 14 stale (was 0). Live run: 13 updated (1 skipped below 0.001 noise floor), 13 price adjustments propagated. Top movers: h-var-afbf2d 0.500→0.471 (elo=1044), h-var-cf5d8c 0.500→0.523 (elo=1874), h-var-58e76a 1.000→0.990 (elo=1990), SDA-2026-04- 1.000→0.990. Debate-quality repricing: 0 repriced, 50 skipped (all current). Post-run dry-run: 0 stale. Endpoints /exchange and /api/economy/debate-signals both 200. Tests: 3/3 passed.
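A minimal sketch of the corrected boundary guard, assuming plain float scores. The function names are hypothetical; the inclusive bounds, the None pass-through, the 0.5 default, and the 0.99 upper clamp come from the entry above.

```python
from typing import Optional

DEFAULT_SCORE = 0.5  # default applied to null scores
MAX_SCORE = 0.99     # clamp target for promoted hypotheses sitting at 1.0


def is_candidate(score: Optional[float]) -> bool:
    """Inclusive bounds check; None passes so unscored hypotheses are kept."""
    if score is None:
        return True
    # NaN fails both comparisons, so it is rejected here as well.
    return 0.0 <= float(score) <= 1.0


def starting_score(score: Optional[float]) -> float:
    """Base score used for recalibration when none has been computed yet."""
    return DEFAULT_SCORE if score is None else float(score)


def clamp_upper(score: float) -> float:
    """Keep a recalibrated score at or below MAX_SCORE."""
    return min(MAX_SCORE, score)
```

With the old exclusive check (0.0 < score < 1.0), `is_candidate(1.0)` would have been False, which is how promoted hypotheses dropped out of the stale queue.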
Result: Ran Exchange recalibration cycle. DB state: 1,671 hypotheses, 803 debate sessions (latest 2026-04-27 15:17 PT), 26 Elo-stale hypotheses (Elo last match at 16:14 PT). Elo-driven updates: 16 composite-score updates, 2 skipped (below noise floor). Debate-quality repricing: 30 updated, 70 skipped. Top Elo mover: h-var-3b982e +0.031 (elo=2008). Post-run dry-run found 0 stale hypotheses. Endpoints /exchange and /api/economy/debate-signals both 200.
Plan: Current code already contains the prior continuous-process rebuild, but this run is still actionable: dry-run found 212 hypotheses whose Elo ratings are newer than their score markers. The write run exposed a live null-price bug in Elo price propagation (market_price IS NULL reached compute_price_adjustment). I will add a targeted fallback to use the old composite score when market price is missing, cover it with a focused regression test, rerun the Exchange recalibration cycle, and verify the stale queue drains.
Result: Fixed the null-price Elo propagation bug in scidex/exchange/ci_elo_recalibration.py and added a regression test covering a hypothesis with market_price IS NULL. Ran the live Exchange recalibration cycle: 205 Elo-driven composite-score updates, 7 below-threshold Elo rows marked processed, and 35 debate-quality repricings. Post-run dry-run found 0 Elo-stale hypotheses; price_history now has 461 elo_recalibration events and 405 debate_signal events, both latest at 2026-04-27 05:57 PT. Verification: PYTHONPATH=. pytest tests/test_exchange_recalibration.py -q passed, py_compile passed for the touched modules, /exchange returned 200, and /api/economy/debate-signals returned 200.
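The null-price fallback can be illustrated with a small helper. This is a sketch under stated assumptions: `effective_price` is a hypothetical name, and the actual signature of compute_price_adjustment is not shown in this log.

```python
from typing import Optional


def effective_price(market_price: Optional[float], old_composite_score: float) -> float:
    """Fall back to the old composite score when no market price exists."""
    # Compare against None explicitly: `market_price or old_composite_score`
    # would also replace a legitimate price of 0.0.
    return old_composite_score if market_price is None else market_price
```

A regression test in the same spirit would pass a row with market_price set to None and assert that the adjustment still produces a finite price.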
Plan: Verified the task is still relevant on current origin/main: ci_elo_recalibration dry-run found 60 hypotheses with Elo newer than their score, while price_history shows no debate_signal repricing events since 2026-04-12 despite 714 debate sessions. I will tighten debate-quality repricing so it is gap-predicate driven (last_debate_at > last debate_signal) instead of retrying every 48 hours, keep duplicate-content variants from receiving duplicated debate boosts, add focused regression tests, then run the driver cycle and record actual DB effects.
Result: Implemented and ran the bounded Exchange recalibration cycle. Code changes make apply_debate_quality_batch() process only hypotheses whose latest qualifying debate is newer than their last debate_signal, skip non-canonical duplicate content_hash variants, and cap each run at 50 candidates. Registered elo-recalibration in the scheduled task registry and made below-threshold Elo deltas stamp last_evidence_update so they do not stay forever stale. Live DB effects: 42 Elo-driven composite-score updates with propagated price changes, 18 below-threshold Elo rows marked processed, and 43 fresh debate-quality price updates (7 skipped by noise floor). Verification: rerun dry-run found 0 Elo-stale hypotheses; pytest tests/test_exchange_recalibration.py -q passed; py_compile passed for the touched modules.
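The selection rule in this entry (gap predicate, duplicate skip, per-run cap) could look roughly like the sketch below. The `Hypothesis` row shape, the `is_canonical` flag, and the oldest-first ordering are assumptions made for illustration; only the predicate and the cap of 50 come from the log.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

MAX_CANDIDATES = 50  # per-run cap mentioned in the entry above


@dataclass
class Hypothesis:  # hypothetical row shape, for illustration only
    id: str
    content_hash: str
    is_canonical: bool  # assumed flag marking the canonical variant of a content_hash
    last_debate_at: Optional[datetime]
    last_debate_signal_at: Optional[datetime]


def needs_repricing(h: Hypothesis) -> bool:
    """Gap predicate: a qualifying debate is newer than the last debate_signal."""
    if h.last_debate_at is None:
        return False
    if h.last_debate_signal_at is None:
        return True
    return h.last_debate_at > h.last_debate_signal_at


def select_candidates(rows: list[Hypothesis]) -> list[Hypothesis]:
    """Pick canonical hypotheses with an open gap, capped at MAX_CANDIDATES."""
    picked = [h for h in rows if h.is_canonical and needs_repricing(h)]
    # Assumed ordering: oldest unprocessed debate first, so capped runs drain in order.
    picked.sort(key=lambda h: h.last_debate_at)
    return picked[:MAX_CANDIDATES]
```

Because the predicate compares two stored timestamps, rerunning the selection right after a successful repricing returns nothing, which is what makes the run idempotent rather than calendar-driven.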
Task: Update hypothesis scores from new debate rounds
Actions:
- recalibrate_scores.py: python3 recalibrate_scores.py
Result: ✅ Complete. All hypothesis scores recalculated incorporating latest debate quality, citation evidence, KG connectivity, and convergence metrics. 199 hypotheses updated successfully.
Task: Update hypothesis scores from new debate rounds (re-run)
Actions:
- python3 recalibrate_scores.py
Result: ✅ Complete. Recalibration run completed. Only 7 hypotheses had delta >= 0.001, indicating scores were already well-calibrated from previous run. Average score stable at 0.466.
Task: Update hypothesis scores from new debate rounds (scheduled re-run)
Actions:
- python3 recalibrate_scores.py
Result: ✅ Complete. Recalibration run completed successfully. Zero updates required (all deltas < 0.001 threshold), confirming scores remain stable and well-calibrated. System healthy.
Task: Update hypothesis scores from new debate rounds (recurring CI run)
Actions:
- hypotheses table triggers DatabaseError: database disk image is malformed on complex multi-column queries due to corrupted FTS virtual tables. Simple SELECT with LIMIT works.
- find_stale_hypotheses() in scidex/exchange/ci_elo_recalibration.py:
  - WHERE id IN (...)
  - composite_score outside (0, 1) to prevent inf/nan propagation
- last_evidence_update for all 70 hypotheses
Result: ✅ Complete. 70 hypotheses updated. DB corruption workaround confirmed working. Prices propagated to market_price via LMSR-inspired model.
Task: Update hypothesis scores from new debate rounds (scheduled re-run)
Actions:
- python3 recalibrate_scores.py
Result: ✅ Complete. Recalibration run completed successfully. Zero updates required (all deltas < 0.001 threshold), confirming scores remain stable. No new debates since last run.
Actions:
- python3 recalibrate_scores.py: 0 hypotheses updated (all deltas < 0.001)
Result: ✅ Complete. Scores stable. No new debates since last run.
Task: Update hypothesis scores from new debate rounds (scheduled re-run)
Actions:
- python3 recalibrate_scores.py
Result: ✅ Complete. Recalibration run completed successfully. 278/292 hypotheses updated with significant score changes reflecting updated debate quality, citation evidence, KG connectivity, and convergence metrics. No code changes (database-only update).
Task: Update hypothesis scores from new debate rounds (scheduled re-run)
Actions:
- python3 recalibrate_scores.py
Result: ✅ Complete. Recalibration run completed. 24/292 hypotheses updated. Average score stable at 0.459. All scoring components stable. System healthy.
Task: Update hypothesis scores from new debate rounds (scheduled re-run)
Actions:
- python3 recalibrate_scores.py
Result: ✅ Complete. 18/314 hypotheses updated. Elo ratings incorporated (20 hypotheses with >= 2 matches). Average score stable at 0.474-0.475. Market transactions and price history recorded.
recalibrate_scores.py with negligible deltas.
Task: Update hypothesis scores from new debate rounds (scheduled re-run)
Actions:
- python3 recalibrate_scores.py
Result: ✅ Complete. 315/333 hypotheses updated. Large recalibration due to significant new data: 19 new hypotheses, 117 Elo-rated hypotheses (6x increase), and new debate sessions since last run. Average score shifted from 0.487 to 0.483.
Task: Update hypothesis scores from new debate rounds (scheduled re-run)
Actions:
- python3 recalibrate_scores.py
Result: ✅ Complete. 52/333 hypotheses updated. Incremental adjustments after earlier today's large recalibration. APOE-related hypotheses saw the largest upward movement (+0.034). Average score stable at 0.484.
Task: Update hypothesis scores from new debate rounds (scheduled re-run)
Actions:
- python3 recalibrate_scores.py
Result: ✅ Complete. 161/333 hypotheses updated with significant recalibration. New debates since 2026-04-08 run triggered score updates. Elo ratings now incorporated at 15% weight alongside composite score components (debate quality, citation evidence, KG connectivity, convergence). Average score shifted to 0.521. Market transactions recorded for all updated hypotheses.
Task: Update hypothesis scores from new debate rounds (scheduled re-run)
Actions:
- python3 archive/oneoff_scripts/recalibrate_scores.py
Result: ✅ Complete. 148/343 hypotheses recalibrated. Pool expansion to 343 hypotheses shifted normalization bounds, correctly deflating overscored entries. No new debate rounds since last update; recalibration driven by normalization adjustment. Average score 0.489→0.480.
Task: Update hypothesis scores from new debate rounds (scheduled re-run)
Actions:
- python3 archive/oneoff_scripts/recalibrate_scores.py
Result: ✅ Complete. 21/349 hypotheses updated. Pool grew to 349, shifted normalization causing deflation of overscored entries. Market transactions and price history recorded for all 21 updates.
Task: Update hypothesis scores from new debate rounds
Actions:
- python3 archive/oneoff_scripts/recalibrate_scores.py
Result: ✅ Complete. 158/355 hypotheses updated. 64 new Elo matches drove repricing; variant hypotheses with strong tournament performance boosted up to +0.127. Market transactions and price history recorded.
Task: Update hypothesis scores from new debate rounds (scheduled re-run)
Actions:
- python3 archive/oneoff_scripts/recalibrate_scores.py
Result: ✅ Complete. 79/373 hypotheses updated. New debate sessions and 24 new variant hypotheses drove repricing. Closed-loop tFUS/optogenetic variants boosted significantly due to strong Elo performance. Average score 0.482→0.485.
Task: Update hypothesis scores from new debate rounds (scheduled re-run)
Actions:
Result: ✅ No-op — skipped recalibration (no new debates, Elo matches, or hypotheses since last price update). Scores remain current at avg=0.486.
Task: Update hypothesis scores from new debate rounds (elo recalibration run)
Actions:
- elo last_match_at > last_evidence_update (stale re: Elo)
- recalibrate_scores.py with incremental Elo adjustment approach:
  - elo_adj = 0.05 * clamp((rating - 1500) / 800, -1, 1)
  - market_dynamics.adjust_price_on_new_score BEFORE updating composite_score
- recalibrate_stale_prices: 62 prices synced
Result: ✅ Complete. 66 composite_scores updated with Elo signal; 62 market prices recalibrated. recalibrate_scores.py committed and pushed to main.
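The incremental Elo adjustment formula in this entry is small enough to write out. The sketch below implements it as stated, plus the 0.001 noise floor used throughout this log; the function and constant names are illustrative, not taken from the script.

```python
NOISE_FLOOR = 0.001  # deltas below this are treated as no-ops in this log
BASELINE = 1500      # Elo rating that yields a zero adjustment
SCALE = 800          # rating distance at which the adjustment saturates
MAX_ADJ = 0.05       # largest possible adjustment in either direction


def clamp(x: float, lo: float, hi: float) -> float:
    return max(lo, min(hi, x))


def elo_adjustment(rating: float) -> float:
    """elo_adj = 0.05 * clamp((rating - 1500) / 800, -1, 1)"""
    return MAX_ADJ * clamp((rating - BASELINE) / SCALE, -1.0, 1.0)


def is_material(delta: float) -> bool:
    """True when a delta is large enough to be written rather than skipped."""
    return abs(delta) >= NOISE_FLOOR
```

The movers reported elsewhere in this log are consistent with this formula applied to a 0.5 base: elo=1044 gives roughly -0.0285 (0.500→0.471), and elo=1874 gives roughly +0.0234 (0.500→0.523). A rating within about 15 points of 1500 yields an adjustment under 0.001, so it falls below the noise floor.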
Actions:
- recalibrate_scores.py
Result: No-op. Scores current. New debate's hypotheses will be scored on next CI run once they receive Elo ratings from future tournament rounds.
Task: Update hypothesis scores from new debate rounds (Elo recalibration)
Actions:
- elo last_match_at > last_evidence_update (stale re: Elo)
- python3 recalibrate_scores.py
Result: ✅ Complete. 4/595 hypotheses updated with Elo-driven recalibration. Closed-loop tFUS variant hypotheses repriced based on latest tournament performance. market_price propagated via market_dynamics for all 4 updated hypotheses. Scores remain stable (avg ≈ 0.480).
Task: Update hypothesis scores from new debate rounds (Elo recalibration)
Actions:
- elo last_match_at > last_evidence_update (stale)
- python3 recalibrate_scores.py:
- sqlite3.OperationalError: database is locked in recalibrate_scores.py.
- adjust_price_on_new_score was called inside the UPDATE loop, creating
Task: Port ci_elo_recalibration.py from SQLite to PostgreSQL
Problem: Script was broken — _conn() issued SQLite PRAGMA commands on the PostgreSQL connection, causing psycopg.errors.SyntaxError: syntax error at or near "PRAGMA". The script couldn't run at all.
Actions:
- import sqlite3, DB_PATH, _conn(), MIN_INTERVAL_MINUTES
- _check_last_execution() (interval now controlled by Orchestra scheduler)
- find_stale_hypotheses() batch-workaround (SQLite corruption guard) with a direct PostgreSQL JOIN query
- _best_elo() and recalibrate() to use a cursor rather than a connection, using get_db() as context manager
- --db CLI arg (no SQLite path needed)
Result: Script runs cleanly. Found 5 stale hypotheses (Elo updated after last_evidence_update); all had adjustments < 0.001 (Elo ratings within ±15 pts of 1500 baseline), so no score changes were written this cycle, which is the correct behavior.
Result: ✅ Complete — 42/49 stale hypotheses updated with Elo-driven recalibration. 42 market prices propagated. Elite performers (elo > 2000) received +3-5% composite score boost. Bug fix: DB-locked error resolved by deferring price updates.
Task: Update hypothesis scores from new debate rounds; fix PostgreSQL compatibility bugs in ci_elo_recalibration.py
Actions:
- _conn() issued SQLite PRAGMA busy_timeout and PRAGMA journal_mode=WAL on the PostgreSQL connection, causing psycopg.errors.SyntaxError: syntax error at or near "PRAGMA". The script was completely non-functional.
- scidex/exchange/ci_elo_recalibration.py:
  a. _conn(): Removed SQLite PRAGMA statements and sqlite3.Row row_factory; get_db() already returns a properly configured PGShimConnection with _PgRow row factory set.
  b. _check_last_execution(): Switched sqlite_master → information_schema.tables, adjusted calibration_slashing WHERE clause to use existing reason column instead of non-existent driver column.
  c. find_stale_hypotheses(): last_match (datetime) vs last_evidence_update (datetime, not str) comparison now uses proper tz-aware datetime comparison instead of datetime vs string comparison.
- price_history (was at 49, max id is 73766) and market_transactions (was at 47, max id is 52476) before running recalibration; without this fix, record_price_change() would fail with UniqueViolation.
Result: ✅ Complete. 94/747 hypotheses updated with Elo-driven recalibration. All PostgreSQL compatibility bugs fixed. Commit: 443306e7f. Push blocked by auth failure (GitHub token invalid for push); supervisor notified of auth issue.
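Item c above replaces a datetime-versus-string comparison with a tz-aware one. A minimal sketch of that kind of normalization follows. The helper names are hypothetical, and treating naive timestamps as UTC is an assumption; the actual driver code may differ.

```python
from datetime import datetime, timezone
from typing import Optional, Union


def as_aware_utc(value: Union[datetime, str, None]) -> Optional[datetime]:
    """Normalize a datetime or ISO-8601 string to a tz-aware UTC datetime."""
    if value is None:
        return None
    if isinstance(value, str):
        # datetime.fromisoformat() rejects a trailing "Z" before Python 3.11.
        value = datetime.fromisoformat(value.replace("Z", "+00:00"))
    if value.tzinfo is None:
        # Assumption: naive timestamps are already expressed in UTC.
        return value.replace(tzinfo=timezone.utc)
    return value.astimezone(timezone.utc)


def elo_is_newer(last_match, last_evidence_update) -> bool:
    """Stale check: the latest Elo match postdates the last evidence update."""
    match_at = as_aware_utc(last_match)
    updated_at = as_aware_utc(last_evidence_update)
    if match_at is None:
        return False
    return updated_at is None or match_at > updated_at
```

Normalizing both sides before comparing avoids the TypeError that Python raises when a naive datetime is compared with an aware one, and it lets legacy string rows and native PostgreSQL datetime rows share one code path.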
Task: CI recurring run — Update hypothesis scores from new debate rounds
Actions:
- ci_elo_recalibration --dry-run
- 443306e7f). All stale hypotheses have Elo adjustments below the noise floor.
- iX9Y55jEg8Lg6YRSe8LsdYSEHAyVB4C5OyR322auhAI is failing with "Password authentication is not supported." The remote URL uses HTTPS with the token, but the remote direct-origin uses SSH git@github.com:. The worktree branch is ahead of origin/main by 3 commits (ff2c69c95, c91946253, 443306e7f).
Result: No-op. 12 stale hypotheses found but all Elo adjustments < 0.001 noise floor. Scores current from prior run. Push blocked by GitHub authentication (token invalid for HTTPS push); auth issue needs resolution at infrastructure level.
Task: Review blocked merge attempt and harden the Exchange Elo recalibration driver before retry.
Planned actions:
- scidex/exchange/ci_elo_recalibration.py.
Actions:
- _as_aware_utc() to normalize PostgreSQL datetime rows and legacy ISO string rows before timestamp comparisons.
- _check_last_execution() to use the shared timestamp helper and always close its DB connection.
- .tzinfo exists.
- api.py figure fallback middleware in the working tree and restored unrelated local edits from the prior failed attempt (paper_processing_pipeline.py and unrelated specs) back to this branch's HEAD.
- python3 -m py_compile scidex/exchange/ci_elo_recalibration.py passed.
- python3 -m scidex.exchange.ci_elo_recalibration --dry-run could not reach PostgreSQL in this sandbox: psycopg.OperationalError: connection is bad; scidex status also reported services/database unavailable.
- /home/ubuntu/scidex/.git/worktrees/..., outside the writable roots. git restore failed creating index.lock with Read-only file system, so rebase/stage/commit/push cannot be completed from this worker even though file edits inside the worktree are writable.
- kganjam/*; SciDEX-AI/SciDEX returned 404.
{
"requirements": {
"analysis": 5,
"safety": 9
},
"_stall_skip_providers": [],
"_stall_requeued_by": "codex",
"_stall_requeued_at": "2026-04-11 03:23:03",
"completion_shas": [
"d6e4d69582e1f16c25a89bc2b1ddf5d6e39d0058",
"c368b80d371327e98bdf552eeec059655714b3af",
"6b7e0a09a36c9533741ed14f6c3210893e62208e",
"8b261095c957c22f45ef729fcf3f90ca8a9cb1ce",
"a9a6f4c38868cc573cf6f8e868f6bacc57eca289",
"0e216535c287c5b688470bd3230158eeff20cbf1",
"638dd1b139c7f5567cf5d756639a3d91c23d4aa6",
"c151b0a2f48833aad22360cea1ac25312c2867a4",
"024b048bed2ea49f37ef0ab3a1e5bf6bed4465d4",
"1c01bf952c71491fdb9f2850c8236444aa56a75e",
"276d122e479b9ca6fa237d2654eeb641d77b0ef1",
"d08276b46267377147ec9f5529bad97cfd07d6d2",
"6e06ea7b91064564bc9421c5db7f3251a3ae42c9",
"797f6d245cdd2f1477161bfc6555f30d6f340787",
"61337777aca5e61716126a9c7764f30685dc2a7b",
"14aaff1a7c83892ccb43f259b59cbf31ecf9c649",
"3e6798b48f6a6a36bf1df444cb73a54f3964339e",
"86574aa5895201f9d634ecde4c1f3281c8125b58",
"b1e831f93e33626910bda5085bfd7f25ddf26927",
"3f6391e0786418991b35ec3d2d3f952939302413",
"df50463ef976af011c79431fa17446e40e3d6c0e",
"25173fab7606fefac46007da16c52a4e7c4ff3c2",
"6886e80b175788a4b4e7224c7cf38da40d0d4daa",
"652a193482a975d6261286adbaddb6cbdd86ba0e",
"ce1ec141eeea6e7a6fe9a73f2a805ad033080a05"
],
"completion_shas_checked_at": "2026-04-13T05:47:09.348822+00:00",
"completion_shas_missing": [
"2d2e5233aaefd0e498ae9c36d290ee29ecb109ef",
"e4e43aebac98fe4eb5c1fd34f8b0a3ea695aeac8",
"60a0a1775c2857346603f637f66be8134e18b4a3",
"a7e011ac5c57bbed1a52484b577790d683aed67a",
"6849f29c8ea1e343895fb3c3f719c4cd597d2f47",
"5a04529ae05a8fe829b67208befc06307edbec41",
"863690a39964adeb2b3658471f35190d634d0640"
],
"_stall_skip_at": {},
"_stall_skip_pruned_at": "2026-04-14T10:37:14.022390+00:00"
}