[Atlas] Agent reasoning-trail viewer - step-by-step what the agent did (done)

Per-artifact timeline joining agent_skill_invocations, debate_messages, llm_call_log, and artifact_provenance, with cost transparency and a replay button.

Completion Notes

Auto-completed by supervisor after successful deploy to main

Git Commits (1)

[Atlas] Agent reasoning-trail viewer — step-by-step what the agent did [task:f643faa5-0917-40e3-8655-e150259b53a7] (#743) 2026-04-27
Spec File

Goal

When an agent produces a hypothesis, debate round, or analysis, the
end artifact shows the output but never the trail: which skills it
called (agent_skill_invocations), which evidence it consulted,
which intermediate prompt steps shaped the final text, and which decisions
it made (e.g. picking persona X over Y at round 3). We have all the
raw data (every skill call is logged, every LLM call is traceable),
but no UI surfaces it as a single coherent timeline. Build a per-artifact
"reasoning trail" viewer that reconstructs the chain.

Effort: thorough

Acceptance Criteria

GET /api/atlas/reasoning-trail/{artifact_type}/{artifact_id} returns a structured timeline: {artifact: {...}, steps: [{step_n, kind, timestamp, persona, content_excerpt, skill_invocations, llm_calls, links: [{kind, target_id}]}], total_skill_calls, total_llm_tokens, total_latency_ms}.
kind ∈ {skill_call, llm_call, persona_round, synthesis, decision, evidence_fetch, output_emit}.
☐ Joins across: agent_skill_invocations (skill timing), debate_messages (per-round persona content), llm_call_log (LLM token + latency — create the table if it doesn't exist), artifact_provenance (decision points), evidence_for JSONB.
☐ Migration migrations/20260428_llm_call_log.sql if llm_call_log doesn't exist: (id, artifact_id, artifact_type, persona, prompt_excerpt, response_excerpt, model, prompt_tokens, completion_tokens, latency_ms, called_at, prompt_hash). Add a @traced decorator to scidex/core/llm.py::complete() that writes to it.
☐ HTML view /atlas/reasoning-trail/{artifact_type}/{artifact_id} renders the timeline with: vertical chronology, per-step expandable detail, color-coded persona swimlanes (Theorist=blue, Skeptic=red, Expert=green, Synthesizer=purple), link to skill bundle for each skill_call, link to source paper for each evidence_fetch.
☐ Cost transparency: total LLM cost (USD) computed from prompt_tokens and completion_tokens per model (rates pulled from scidex/core/llm.py::MODEL_PRICING); shown in the header.
☐ "Replay this trail" button: triggers q-repro-rerun-artifact with the same skill+persona+seed inputs to verify reproducibility.
☐ Tests tests/test_reasoning_trail.py: synthetic 5-step debate trail (3 personas + 1 synth + 1 emit) → 5 timeline steps in order; missing tables tolerated (returns partial trail with notice); HTML view renders without error.
☐ Performance: the trail for a typical hypothesis (~12 steps) builds in < 500 ms, using indexes on (artifact_id, called_at).
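
The llm_call_log criterion above can be sketched roughly as follows. This is a minimal illustration, not the shipped code: it assumes SQLite instead of the project's real database, takes only the column list from the criterion (the column types are guesses), and uses a hypothetical `traced(db)` signature plus a stub `complete()` whose return shape is invented for the example.

```python
import functools
import hashlib
import sqlite3
import time
from datetime import datetime, timezone

# Column names follow the acceptance criterion; types and SQLite are assumptions.
DDL = """
CREATE TABLE IF NOT EXISTS llm_call_log (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    artifact_id TEXT, artifact_type TEXT, persona TEXT,
    prompt_excerpt TEXT, response_excerpt TEXT, model TEXT,
    prompt_tokens INTEGER, completion_tokens INTEGER,
    latency_ms INTEGER, called_at TEXT, prompt_hash TEXT
);
CREATE INDEX IF NOT EXISTS idx_llm_call_log_artifact
    ON llm_call_log (artifact_id, called_at);
"""


def traced(db):
    """Hypothetical decorator: log every call, never let logging break the call."""
    def wrap(fn):
        @functools.wraps(fn)
        def inner(prompt, *, model, artifact_id=None, artifact_type=None, persona=None):
            start = time.perf_counter()
            result = fn(prompt, model=model)
            latency_ms = int((time.perf_counter() - start) * 1000)
            try:
                db.execute(
                    "INSERT INTO llm_call_log (artifact_id, artifact_type, persona,"
                    " prompt_excerpt, response_excerpt, model, prompt_tokens,"
                    " completion_tokens, latency_ms, called_at, prompt_hash)"
                    " VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)",
                    (artifact_id, artifact_type, persona,
                     prompt[:200], result["text"][:200], model,
                     result["prompt_tokens"], result["completion_tokens"],
                     latency_ms,
                     datetime.now(timezone.utc).isoformat(),
                     hashlib.sha256(prompt.encode("utf-8")).hexdigest()),
                )
                db.commit()
            except Exception:
                pass  # best-effort logging: a failed insert must not fail the LLM call
            return result
        return inner
    return wrap


db = sqlite3.connect(":memory:")
db.executescript(DDL)


@traced(db)
def complete(prompt, *, model):
    # Stand-in for the real LLM entry point; the return shape is an assumption.
    return {"text": "stub response",
            "prompt_tokens": len(prompt.split()),
            "completion_tokens": 2}


complete("Why might X cause Y?", model="example-model",
         artifact_id="h-1", artifact_type="hypothesis", persona="Theorist")
row = db.execute(
    "SELECT persona, model, prompt_tokens, completion_tokens FROM llm_call_log"
).fetchone()
```

Swallowing the logging exception keeps instrumentation from ever turning into an outage, at the price of silently missing rows; a real implementation might at least count or log such failures.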

Approach

  • Create the llm_call_log table + decorator (≈ 60 LoC); backfill is impossible (no historical data), so trails for older artifacts will be partial — handle gracefully.
  • Build _build_trail(db, artifact_type, artifact_id) that runs 5 parallel queries (one per source table) and merges by timestamp.
  • Render via Jinja template templates/atlas/reasoning_trail.html with a vertical timeline component (CSS in site/style.css).
  • Add the trail link to existing artifact pages (hypothesis, debate, analysis) — single line in each Jinja template.
  • Build the cost calculator module scidex/core/llm_pricing.py (table-driven) for future reuse.
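
The merge step in the approach above might look something like the sketch below. It is an illustration under stated assumptions: SQLite stands in for the real database, only three of the five sources are shown, the queries run sequentially rather than in parallel, and every column name except those listed for llm_call_log is hypothetical. The function name `build_trail` and the `notices` field are likewise made up for the example.

```python
import sqlite3

# Each query yields (timestamp, persona, excerpt). Table names come from the
# spec; column names for the first two tables are assumptions. The real join
# would presumably also filter on artifact_type.
SOURCES = {
    "skill_call": "SELECT invoked_at, persona, skill_name "
                  "FROM agent_skill_invocations WHERE artifact_id = ?",
    "persona_round": "SELECT created_at, persona, content "
                     "FROM debate_messages WHERE artifact_id = ?",
    "llm_call": "SELECT called_at, persona, response_excerpt "
                "FROM llm_call_log WHERE artifact_id = ?",
}


def build_trail(db, artifact_type, artifact_id):
    steps, notices = [], []
    for kind, sql in SOURCES.items():
        try:
            rows = db.execute(sql, (artifact_id,)).fetchall()
        except sqlite3.OperationalError as exc:
            # Missing table: keep going and report a partial trail.
            notices.append(f"{kind}: source unavailable ({exc})")
            continue
        for ts, persona, excerpt in rows:
            steps.append({"kind": kind, "timestamp": ts, "persona": persona,
                          "content_excerpt": (excerpt or "")[:200]})
    # ISO-8601 strings in one timezone sort chronologically as plain text.
    steps.sort(key=lambda s: s["timestamp"])
    for n, step in enumerate(steps, start=1):
        step["step_n"] = n
    return {"artifact": {"type": artifact_type, "id": artifact_id},
            "steps": steps, "notices": notices}


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE debate_messages "
           "(artifact_id TEXT, created_at TEXT, persona TEXT, content TEXT)")
db.execute("CREATE TABLE agent_skill_invocations "
           "(artifact_id TEXT, invoked_at TEXT, persona TEXT, skill_name TEXT)")
db.executemany("INSERT INTO debate_messages VALUES (?, ?, ?, ?)", [
    ("h-1", "2026-04-27T10:00:02", "Skeptic", "Counterpoint"),
    ("h-1", "2026-04-27T10:00:00", "Theorist", "Opening claim"),
])
db.execute("INSERT INTO agent_skill_invocations VALUES (?, ?, ?, ?)",
           ("h-1", "2026-04-27T10:00:01", "Theorist", "literature_search"))

# llm_call_log is deliberately absent here, so the trail comes back partial.
trail = build_trail(db, "hypothesis", "h-1")
```

Collecting notices instead of raising is one way to satisfy the "partial trail with notice" requirement for older artifacts that predate llm_call_log.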

Dependencies

  • agent_skill_invocations table (already shipped).
  • scidex/core/llm.py: central LLM entry point to instrument.
  • artifact_provenance table (existing).

Dependents

  • q-viz-decision-tree-viewer: uses the trail as input for synthesis-decision visualization.
  • q-mem-error-recovery-memory: analyzes failed trails for recoverable patterns.

    Work Log

    2026-04-27 14:40 PT — Slot 0

    • Staleness review: Task created 2026-04-27, branch is fresh. llm_call_log table does not exist on main. reasoning_trail not yet implemented. Task is valid.
    • Confirmed via grep: llm_call_log table referenced only in this spec; reasoning_trail API/HTML not yet present. MODEL_PRICING lives in scidex/senate/daily_budget.py but is also needed in a reusable module.
    • No sibling tasks for this trail viewer on orchestra task list.

    2026-04-27 15:00 PT — Implementation

    • Created migrations/20260428_llm_call_log.sql: llm_call_log table with indexes on (artifact_id, artifact_type), called_at, and prompt_hash. Handles missing-table case gracefully.
    • Created scidex/core/llm_pricing.py: table-driven MODEL_PRICING dict + compute_llm_cost() for future reuse. Mirrors MODEL_PRICING_PER_1M from daily_budget.py.
    • Added _log_llm_call() to scidex/core/llm.py (after _charge_budget()). Silent no-op on failure so LLM logging never crashes LLM calls.
    • Created scidex/atlas/reasoning_trail.py: ReasoningTrail dataclass + _build_trail(), which queries 5 source tables and merges the rows by timestamp into steps whose kind is one of STEP_KINDS. Gracefully handles missing tables.
    • Added API route GET /api/atlas/reasoning-trail/{artifact_type}/{artifact_id} using get_db_ro() + _build_trail().
    • Added HTML route GET /atlas/reasoning-trail/{artifact_type}/{artifact_id} with:
    - Vertical timeline with expandable steps
    - Color-coded persona swimlanes (Theorist=blue, Skeptic=red, Expert=green, Synthesizer=purple)
    - Kind legend (7 kinds)
    - Cost transparency header (LLM cost USD, tokens, latency, skill calls)
    - "Replay this trail" button (POSTs to q-repro-rerun-artifact quest)
    • Created tests/test_reasoning_trail.py: 9 passing tests covering structure, step kinds, persona colors, synthetic 5-step debate trail, partial-trail graceful degradation, LLM pricing, and HTML view.
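
    For illustration, the table-driven cost calculation behind the header figure could be sketched as below. The source does not show the actual signature of compute_llm_cost(), so this version is hypothetical; the per-1M-token unit is inferred from the name MODEL_PRICING_PER_1M, and the model names and rates are invented placeholders, not real prices.

```python
# Placeholder rates in USD per 1M tokens. Model names and prices are invented
# for this sketch and do not reflect any real provider's pricing.
EXAMPLE_PRICING_PER_1M = {
    "example-small": {"prompt": 1.00, "completion": 2.00},
    "example-large": {"prompt": 10.00, "completion": 30.00},
}


def compute_llm_cost(model, prompt_tokens, completion_tokens,
                     pricing=EXAMPLE_PRICING_PER_1M):
    """Return the USD cost of one call, or None if the model has no rate."""
    rates = pricing.get(model)
    if rates is None:
        return None
    return (prompt_tokens * rates["prompt"]
            + completion_tokens * rates["completion"]) / 1_000_000


def total_cost(llm_calls, pricing=EXAMPLE_PRICING_PER_1M):
    """Sum the cost over calls; also count calls that could not be priced."""
    total, unpriced = 0.0, 0
    for call in llm_calls:
        cost = compute_llm_cost(call["model"], call["prompt_tokens"],
                                call["completion_tokens"], pricing)
        if cost is None:
            unpriced += 1
        else:
            total += cost
    return total, unpriced


calls = [
    {"model": "example-small", "prompt_tokens": 1000, "completion_tokens": 500},
    {"model": "example-large", "prompt_tokens": 2000, "completion_tokens": 1000},
    {"model": "unlisted-model", "prompt_tokens": 10, "completion_tokens": 10},
]
usd, unpriced = total_cost(calls)
```

    Returning None for an unlisted model, rather than 0.0, lets the header say that some calls could not be priced instead of quietly under-reporting the total.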

    2026-04-27 15:30 PT — Pre-commit note

    • Pre-existing api.py syntax error at exchange_diversity_sampler_page: an f-string containing color:{'#81c784' if r['avg_utility'] > 0.5 else '#ef9a9a'} (unterminated conditional expression inside a Python f-string within a JavaScript context). This error exists on origin/main at line 35798 and is NOT introduced by this task. The function exchange_diversity_sampler_page is unrelated to reasoning trails.
    • Verified: python3 -m py_compile api.py fails on the same line in both the pre-change main AND in my worktree. No additional errors introduced by this task.
    • All new modules (llm_pricing.py, reasoning_trail.py) compile and import cleanly.
    • All 9 tests pass.

    2026-04-27 15:35 PT — Committing

    • Files committed: api.py (+214 lines), scidex/core/llm.py (+34 lines), migrations/20260428_llm_call_log.sql (new), scidex/atlas/reasoning_trail.py (new), scidex/core/llm_pricing.py (new), tests/test_reasoning_trail.py (new).
    • Pushed via git push origin HEAD for supervisor review gate + auto-merge.
