When an agent produces a hypothesis, debate round, or analysis, the
end artifact shows the output but never the trail: which skills it
called (agent_skill_invocations), which evidence it consulted,
which intermediate prompt steps shaped the final text, and which decisions
it made (e.g. picking persona X over Y at round 3). We have all the
raw data — every skill call is logged, every LLM call is traceable —
but no UI surfaces a single coherent timeline. Build a per-artifact
"reasoning trail" viewer that reconstructs the chain.
Effort: thorough

- GET /api/atlas/reasoning-trail/{artifact_type}/{artifact_id} returns a structured timeline: {artifact: {...}, steps: [{step_n, kind, timestamp, persona, content_excerpt, skill_invocations, llm_calls, links: [{kind, target_id}]}], total_skill_calls, total_llm_tokens, total_latency_ms}.
- kind ∈ {skill_call, llm_call, persona_round, synthesis, decision, evidence_fetch, output_emit}.
- agent_skill_invocations (skill timing), debate_messages (per-round persona content), llm_call_log (LLM token + latency; create the table if it doesn't exist), artifact_provenance (decision points), evidence_for JSONB.
- migrations/20260428_llm_call_log.sql if llm_call_log doesn't exist: (id, artifact_id, artifact_type, persona, prompt_excerpt, response_excerpt, model, prompt_tokens, completion_tokens, latency_ms, called_at, prompt_hash). Add a @traced decorator to scidex/core/llm.py::complete() that writes to it.
- /atlas/reasoning-trail/{artifact_type}/{artifact_id} renders the timeline with: vertical chronology, per-step expandable detail, color-coded persona swimlanes (Theorist=blue, Skeptic=red, Expert=green, Synthesizer=purple), link to skill bundle for each skill_call, link to source paper for each evidence_fetch.
- prompt_tokens and completion_tokens per model (rates pulled from scidex/core/llm.py::MODEL_PRICING); shown in the header.
- q-repro-rerun-artifact with the same skill+persona+seed inputs to verify reproducibility.
- tests/test_reasoning_trail.py: synthetic 5-step debate trail (3 personas + 1 synth + 1 emit) → 5 timeline steps in order; missing tables tolerated (returns partial trail with notice); HTML view renders without error.
- (artifact_id, called_at).

- llm_call_log table + decorator (≈ 60 LoC); backfill is impossible (no historical data), so trails for older artifacts will be partial; handle gracefully.
- _build_trail(db, artifact_type, artifact_id) that runs 5 parallel queries (one per source table) and merges by timestamp.
- templates/atlas/reasoning_trail.html with a vertical timeline component (CSS in site/style.css).
- scidex/core/llm_pricing.py (table-driven) for future reuse.

- agent_skill_invocations table (already shipped).
- scidex/core/llm.py: central LLM entry point to instrument.
- artifact_provenance table (existing).

- q-viz-decision-tree-viewer: uses trail as input for synthesis-decision visualization.
- q-mem-error-recovery-memory: analyzes failed trails for recoverable patterns.

- llm_call_log table does not exist on main. reasoning_trail not yet implemented. Task is valid.
- llm_call_log table referenced only in this spec; reasoning_trail API/HTML not yet present. MODEL_PRICING lives in scidex/senate/daily_budget.py but is also needed in a reusable module.
- orchestra task list.
- migrations/20260428_llm_call_log.sql: llm_call_log table with indexes on (artifact_id, artifact_type), called_at, and prompt_hash. Handles missing-table case gracefully.
- scidex/core/llm_pricing.py: table-driven MODEL_PRICING dict + compute_llm_cost() for future reuse. Mirrors MODEL_PRICING_PER_1M from daily_budget.py.
- _log_llm_call() to scidex/core/llm.py (after _charge_budget()). Silent no-op on failure so LLM logging never crashes LLM calls.
- scidex/atlas/reasoning_trail.py: ReasoningTrail dataclass + _build_trail() that queries 5 source tables and merges by timestamp into STEP_KINDS kinds. Gracefully handles missing tables.
- GET /api/atlas/reasoning-trail/{artifact_type}/{artifact_id} using get_db_ro() + _build_trail().
- GET /atlas/reasoning-trail/{artifact_type}/{artifact_id} with:
- tests/test_reasoning_trail.py: 9 passing tests covering structure, step kinds, persona colors, synthetic 5-step debate trail, partial-trail graceful degradation, LLM pricing, and HTML view.
- exchange_diversity_sampler_page: an f-string containing color:{'#81c784' if r['avg_utility'] > 0.5 else '#ef9a9a'} (unterminated conditional expression inside a Python f-string within a JavaScript context). This error exists on origin/main at line 35798 and is NOT introduced by this task. The function exchange_diversity_sampler_page is unrelated to reasoning trails.
- python3 -m py_compile api.py fails on the same line in both the pre-change main AND in my worktree. No additional errors introduced by this task.
- llm_pricing.py and reasoning_trail.py compile and import cleanly.
- api.py (+214 lines), scidex/core/llm.py (+34 lines), migrations/20260428_llm_call_log.sql (new), scidex/atlas/reasoning_trail.py (new), scidex/core/llm_pricing.py (new), tests/test_reasoning_trail.py (new).
- git push origin HEAD for supervisor review gate + auto-merge.
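
The merge-by-timestamp behaviour described for _build_trail(), including the partial trail with a notice when a source table is missing, might look roughly like the sketch below. It is not the shipped implementation: it uses an in-memory sqlite3 connection as a stand-in for the real database, runs the queries sequentially rather than in parallel, covers only three of the five sources, and the column names (called_at, created_at, skill_name, content) and the function name build_trail are assumptions made for illustration.

```python
import sqlite3
from typing import Any

# Hypothetical layout: step kind -> (table, timestamp column, text column).
# Only the table names come from the spec; the columns are assumed.
SOURCES = {
    "skill_call": ("agent_skill_invocations", "called_at", "skill_name"),
    "persona_round": ("debate_messages", "created_at", "content"),
    "llm_call": ("llm_call_log", "called_at", "response_excerpt"),
}


def build_trail(db: sqlite3.Connection, artifact_id: str) -> dict[str, Any]:
    """Query each source table, tolerate missing ones, merge by timestamp."""
    steps: list[dict[str, Any]] = []
    notices: list[str] = []
    for kind, (table, ts_col, text_col) in SOURCES.items():
        try:
            rows = db.execute(
                f"SELECT {ts_col}, {text_col} FROM {table} WHERE artifact_id = ?",
                (artifact_id,),
            ).fetchall()
        except sqlite3.OperationalError:
            # sqlite3 raises OperationalError for "no such table"; record a
            # notice and keep going so the caller still gets a partial trail.
            notices.append(f"source table {table} unavailable; trail is partial")
            continue
        for ts, text in rows:
            steps.append(
                {"kind": kind, "timestamp": ts, "content_excerpt": (text or "")[:200]}
            )
    # ISO-8601 timestamp strings sort chronologically as plain strings.
    steps.sort(key=lambda s: s["timestamp"])
    for n, step in enumerate(steps, start=1):
        step["step_n"] = n
    return {"steps": steps, "notices": notices}
```

Numbering the steps only after the sort keeps step_n consistent with the rendered chronology even when one source contributes nothing.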
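
The table-driven cost estimate mentioned for scidex/core/llm_pricing.py could be sketched as follows. The names MODEL_PRICING and compute_llm_cost() appear in the work log, but the signature, the per-1M-token unit, and every model name and rate below are placeholders assumed for illustration, not real pricing.

```python
from decimal import Decimal
from typing import Optional

# Placeholder rates in USD per 1M tokens; model names and prices are invented.
MODEL_PRICING = {
    "example-small": {"prompt": Decimal("1.00"), "completion": Decimal("2.00")},
    "example-large": {"prompt": Decimal("10.00"), "completion": Decimal("30.00")},
}

TOKENS_PER_UNIT = Decimal(1_000_000)


def compute_llm_cost(
    model: str, prompt_tokens: int, completion_tokens: int
) -> Optional[Decimal]:
    """Return the estimated cost in USD, or None when the model has no rate."""
    rates = MODEL_PRICING.get(model)
    if rates is None:
        # Unknown model: report "no estimate" rather than guessing a price.
        return None
    total = rates["prompt"] * prompt_tokens + rates["completion"] * completion_tokens
    return total / TOKENS_PER_UNIT
```

Returning None for an unpriced model lets the header show the cost as unavailable instead of silently undercounting; Decimal avoids float rounding drift when many calls are summed.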
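
The "silent no-op on failure" rule for LLM call logging can be illustrated with a small decorator. This is a sketch under assumptions rather than the code in scidex/core/llm.py: the decorator takes a hypothetical log_row callback instead of writing to llm_call_log directly, it assumes the wrapped function takes the prompt as its first positional argument, and it records only a subset of the table's columns.

```python
import functools
import hashlib
import time
from typing import Any, Callable


def traced(log_row: Callable[[dict], None]) -> Callable:
    """Wrap an LLM call so each invocation is logged; logging never raises."""

    def decorator(fn: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(fn)
        def wrapper(prompt: str, *args: Any, **kwargs: Any) -> Any:
            start = time.perf_counter()
            response = fn(prompt, *args, **kwargs)
            latency_ms = int((time.perf_counter() - start) * 1000)
            try:
                log_row(
                    {
                        "prompt_excerpt": prompt[:200],
                        "response_excerpt": str(response)[:200],
                        "prompt_hash": hashlib.sha256(
                            prompt.encode("utf-8")
                        ).hexdigest(),
                        "latency_ms": latency_ms,
                    }
                )
            except Exception:
                # Deliberately swallowed: a logging failure must not
                # turn a successful LLM call into an error.
                pass
            return response

        return wrapper

    return decorator
```

Errors raised by the wrapped call itself still propagate unchanged; only failures inside the logging step are suppressed.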