[Forge] Scan 30 recent skill execution logs for timeout and schema error patterns
Layer: Forge
Priority: P3
Status: done
Goal
Select the 30 most recent tool_invocations rows where success=false. Categorize each error as one of:
timeout / schema_mismatch / auth_failure / rate_limit / upstream_api_change / unknown.
Group by category and skill_id, identify the single most impactful fix, create a prioritized fix
list, and INSERT a summary into tool_health_log.
Acceptance Criteria
☑ 30 most recent failed invocations queried (or all available if fewer than 30)
☑ Each error_message classified into one of 6 categories via regex pattern matching
☑ Results grouped by category and skill_id
☑ Most impactful fix identified using count × impact_weight scoring
☑ Prioritized fix list generated
☑ Summary inserted into tool_health_log
Work Log
2026-04-27 — Completed by agent f5908d0f
Data state found:
tool_invocations total rows: 39 (38 success, 1 failure)
- Only 1 failed invocation exists (task requested 30; working with available data)
tool_health_log had 13 error entries from prior health check (2026-04-22)
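As a minimal sketch of how the failed-invocation lookup could be expressed, the snippet below runs against an in-memory SQLite database. Only the table name and the success, skill_id and error_message columns come from this task; the id and created_at columns, the SQLite backend, and the recent_failures helper are assumptions made for illustration.

```python
import sqlite3

# Hypothetical schema: id and created_at are assumed columns, and SQLite
# stands in for whatever database actually backs tool_invocations.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE tool_invocations (
           id INTEGER PRIMARY KEY,
           skill_id TEXT,
           success INTEGER,
           error_message TEXT,
           created_at TEXT
       )"""
)
conn.executemany(
    "INSERT INTO tool_invocations (skill_id, success, error_message, created_at) "
    "VALUES (?, ?, ?, ?)",
    [
        ("tool_a", 1, None, "2026-04-25T10:00:00"),
        ("tool_research_topic", 0,
         "module 'scidex.forge.tools' has no attribute 'push_resource_context'",
         "2026-04-26T09:00:00"),
        ("tool_b", 1, None, "2026-04-27T08:00:00"),
    ],
)


def recent_failures(conn, limit=30):
    """Return up to `limit` failed invocations, newest first."""
    return conn.execute(
        "SELECT skill_id, error_message FROM tool_invocations "
        "WHERE success = 0 ORDER BY created_at DESC LIMIT ?",
        (limit,),
    ).fetchall()


failures = recent_failures(conn)
print(len(failures), failures[0][0])
```

Because LIMIT only caps the result, the same query covers the "all available if fewer than 30" case without a separate branch.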
Classification results:

| Category | Source | Count | Skills/Tools |
|---|---|---|---|
| schema_mismatch | tool_invocations | 1 | tool_research_topic |
| upstream_api_change | tool_health_log | 11 | UniChem, InterPro, QuickGO, Expression Atlas, CellxGene, KEGG, PubChem, Reactome, ChEMBL, GTEx Portal, Open Targets Platform |
| rate_limit | tool_health_log | 1 | Semantic Scholar |
| timeout | tool_health_log | 1 | AGR |
Error categorization logic:
schema_mismatch: "module 'scidex.forge.tools' has no attribute 'push_resource_context'"
- Pattern matched: "has no attribute"
upstream_api_change: HTTP 404 / HTTP 400 responses from external APIs
timeout: DNS NameResolutionError / Max retries exceeded
rate_limit: HTTP 429
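The categorization rules above could be sketched as an ordered list of regular expressions, with the first match winning. The patterns below are illustrative guesses built from the error strings quoted in this log, not the rules the agent actually ran; the auth_failure pattern in particular is an assumption, since no auth errors were observed.

```python
import re

# Ordered (category, pattern) pairs; earlier rules take precedence.
# All patterns are assumptions derived from the messages in this work log.
RULES = [
    ("rate_limit", re.compile(r"\b429\b|rate limit", re.I)),
    ("auth_failure", re.compile(r"\b40[13]\b|unauthorized|forbidden", re.I)),
    ("timeout", re.compile(r"NameResolutionError|Max retries exceeded|timed? ?out", re.I)),
    ("upstream_api_change", re.compile(r"\bHTTP (404|400)\b", re.I)),
    ("schema_mismatch", re.compile(r"has no attribute|KeyError|validation error", re.I)),
]


def classify(error_message):
    """Map an error message to one of the six categories."""
    text = error_message or ""
    for category, pattern in RULES:
        if pattern.search(text):
            return category
    return "unknown"


print(classify("module 'scidex.forge.tools' has no attribute 'push_resource_context'"))
print(classify("HTTP 404 Not Found"))
```

Rule order matters here: a message that mentions both a retry failure and an HTTP status is resolved by whichever rule comes first, so the list should be reviewed whenever a new pattern is added.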
Impact scoring (count × impact_weight):

| Rank | Category | Count | Weight | Score | Fix |
|---|---|---|---|---|---|
| 1 | upstream_api_change | 11 | 6 | 66 | Audit and update endpoint URLs/params in tools.py |
| 2 | schema_mismatch | 1 | 10 | 10 | Remove/replace push_resource_context in scidex/forge/tools.py |
| 3 | rate_limit | 1 | 7 | 7 | Add exponential backoff + per-tool rate limiting |
| 4 | timeout | 1 | 3 | 3 | Add DNS fallback and retry logic |
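The ranking in the table above can be reproduced with a short count × impact_weight calculation. The four weights are taken from the table; the default weight for auth_failure and unknown is a placeholder assumption, since the log does not state one, and rank_fixes is a hypothetical helper name.

```python
from collections import Counter

# Weights as listed in the scoring table.
IMPACT_WEIGHT = {
    "schema_mismatch": 10,
    "rate_limit": 7,
    "upstream_api_change": 6,
    "timeout": 3,
}
DEFAULT_WEIGHT = 1  # assumption: no weight is given for auth_failure/unknown


def rank_fixes(categories):
    """Return (category, count, weight, score) rows, highest score first."""
    counts = Counter(categories)
    rows = []
    for category, count in counts.items():
        weight = IMPACT_WEIGHT.get(category, DEFAULT_WEIGHT)
        rows.append((category, count, weight, count * weight))
    return sorted(rows, key=lambda row: row[3], reverse=True)


# One entry per classified error, matching the counts in the work log.
observed = ["upstream_api_change"] * 11 + ["schema_mismatch", "rate_limit", "timeout"]
ranking = rank_fixes(observed)
for rank, (category, count, weight, score) in enumerate(ranking, start=1):
    print(rank, category, count, weight, score)
```

With these inputs the volume of upstream_api_change errors outweighs the higher per-error weight of schema_mismatch, which is why it ranks first.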
Most impactful single fix: upstream_api_change. Eleven Forge tools returning HTTP 404/400
point to endpoint URL drift. Updating the affected endpoints in tools.py unblocks the widest
surface area of scientific analysis pipelines.
DB write: Inserted summary into tool_health_log (id=40, tool_name='SKILL_HEALTH_SUMMARY',
status='ok', endpoint='internal://tool_invocations+tool_health_log').
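A write of this kind might look like the sketch below, again using in-memory SQLite as a stand-in. The tool_name, status and endpoint values are the ones recorded above; the details column and its JSON payload are assumptions about how the summary could be stored, and the generated id will differ from the id=40 seen in the real table.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical layout: only tool_name, status and endpoint appear in the
# log entry; the details column is an assumption.
conn.execute(
    """CREATE TABLE tool_health_log (
           id INTEGER PRIMARY KEY,
           tool_name TEXT NOT NULL,
           status TEXT NOT NULL,
           endpoint TEXT,
           details TEXT
       )"""
)

# Per-category counts from the classification results.
summary = {
    "upstream_api_change": 11,
    "schema_mismatch": 1,
    "rate_limit": 1,
    "timeout": 1,
}

with conn:  # commits on success, rolls back if the insert raises
    cur = conn.execute(
        "INSERT INTO tool_health_log (tool_name, status, endpoint, details) "
        "VALUES (?, ?, ?, ?)",
        (
            "SKILL_HEALTH_SUMMARY",
            "ok",
            "internal://tool_invocations+tool_health_log",
            json.dumps(summary),
        ),
    )

new_id = cur.lastrowid
row = conn.execute(
    "SELECT tool_name, status, details FROM tool_health_log WHERE id = ?",
    (new_id,),
).fetchone()
print(new_id, row[0], row[1])
```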