[Forge] Scan 30 recent skill execution logs for timeout and schema error patterns

Layer: Forge | Priority: P3 | Status: done

Goal

Select the 30 most recent tool_invocations rows where success=false. Categorize each error as one of: timeout, schema_mismatch, auth_failure, rate_limit, upstream_api_change, or unknown.
Group the results by category and skill_id, identify the single most impactful fix, create a prioritized fix list, and INSERT a summary into tool_health_log.
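
The selection step could be sketched as follows. This is a minimal illustration, not the query the agent actually ran: it assumes a SQLite database, a hypothetical `created_at` timestamp column for ordering, and `success` stored as 0/1. The demo rows (including the skill `tool_a` and both timestamps) are placeholders so the snippet runs on its own.

```python
import sqlite3

def fetch_recent_failures(conn, limit=30):
    """Return up to `limit` failed invocations, newest first.

    Assumes a `created_at` column exists (hypothetical; not named in this spec).
    If fewer than `limit` failures exist, all of them are returned.
    """
    cur = conn.execute(
        "SELECT skill_id, error_message FROM tool_invocations "
        "WHERE success = 0 ORDER BY created_at DESC LIMIT ?",
        (limit,),
    )
    return cur.fetchall()

# Demo against an in-memory database with placeholder rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tool_invocations ("
    "skill_id TEXT, success INTEGER, error_message TEXT, created_at TEXT)"
)
conn.executemany(
    "INSERT INTO tool_invocations VALUES (?, ?, ?, ?)",
    [
        ("tool_a", 1, None, "2026-04-25T10:00:00"),  # placeholder success row
        (
            "tool_research_topic",
            0,
            "module 'scidex.forge.tools' has no attribute 'push_resource_context'",
            "2026-04-26T10:00:00",  # placeholder timestamp
        ),
    ],
)
failures = fetch_recent_failures(conn)
```

Using `LIMIT ?` with no minimum-row check matches the "or all available if fewer than 30" criterion: the query simply returns whatever failures exist.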

Acceptance Criteria

☑ 30 most recent failed invocations queried (or all available if fewer than 30)
☑ Each error_message classified into one of 6 categories via regex pattern matching
☑ Results grouped by category and skill_id
☑ Most impactful fix identified using count × impact_weight scoring
☑ Prioritized fix list generated
☑ Summary inserted into tool_health_log

Work Log

2026-04-27 — Completed by agent f5908d0f

Data state found:

  • tool_invocations total rows: 39 (38 success, 1 failure)
  • Only 1 failed invocation exists (task requested 30; working with available data)
  • tool_health_log had 13 error entries from a prior health check (2026-04-22)

Classification results:

| Category | Source | Count | Skills/Tools |
|---|---|---|---|
| schema_mismatch | tool_invocations | 1 | tool_research_topic |
| upstream_api_change | tool_health_log | 11 | UniChem, InterPro, QuickGO, Expression Atlas, CellxGene, KEGG, PubChem, Reactome, ChEMBL, GTEx Portal, Open Targets Platform |
| rate_limit | tool_health_log | 1 | Semantic Scholar |
| timeout | tool_health_log | 1 | AGR |

Error categorization logic:

  • schema_mismatch: "module 'scidex.forge.tools' has no attribute 'push_resource_context'" (pattern matched: has no attribute)
  • upstream_api_change: HTTP 404 / HTTP 400 responses from external APIs
  • timeout: DNS NameResolutionError / Max retries exceeded
  • rate_limit: HTTP 429

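
The pattern matching above might look roughly like the sketch below. The four patterns for schema_mismatch, upstream_api_change, timeout, and rate_limit come from the list above; the exact wording of real error messages (for example, that they contain the literal text "HTTP 404") is an assumption. The auth_failure pattern (HTTP 401/403) is also an assumption, since no auth failures appear in this log. `classify` and `group_failures` are illustrative names, not functions from tools.py.

```python
import re
from collections import Counter

# Ordered (category, pattern) pairs; the first pattern that matches wins.
PATTERNS = [
    ("schema_mismatch", re.compile(r"has no attribute")),
    ("rate_limit", re.compile(r"HTTP 429\b")),
    ("auth_failure", re.compile(r"HTTP 40[13]\b")),  # assumption: not stated in this spec
    ("upstream_api_change", re.compile(r"HTTP 40[04]\b")),
    ("timeout", re.compile(r"NameResolutionError|Max retries exceeded")),
]

def classify(error_message):
    """Map an error message to one of the six categories."""
    text = error_message or ""  # treat NULL messages as empty
    for category, pattern in PATTERNS:
        if pattern.search(text):
            return category
    return "unknown"

def group_failures(rows):
    """Count failures per (category, skill_id); rows are (skill_id, error_message)."""
    return Counter((classify(msg), skill_id) for skill_id, msg in rows)

rows = [
    (
        "tool_research_topic",
        "module 'scidex.forge.tools' has no attribute 'push_resource_context'",
    )
]
grouped = group_failures(rows)
```

Keeping the patterns in an ordered list makes precedence explicit when a message could match more than one category, and anything unmatched falls through to unknown.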
Impact scoring (count × impact_weight):

| Rank | Category | Count | Weight | Score | Fix |
|---|---|---|---|---|---|
| 1 | upstream_api_change | 11 | 6 | 66 | Audit and update endpoint URLs/params in tools.py |
| 2 | schema_mismatch | 1 | 10 | 10 | Remove/replace push_resource_context in scidex/forge/tools.py |
| 3 | rate_limit | 1 | 7 | 7 | Add exponential backoff + per-tool rate limiting |
| 4 | timeout | 1 | 3 | 3 | Add DNS fallback and retry logic |

Most impactful single fix: upstream_api_change — 11 Forge tools returning HTTP 404/400
indicate endpoint URL drift. Updating these in tools.py unblocks the widest surface area
of scientific analysis pipelines.
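
As a check on the ranking, the count × impact_weight scoring can be reproduced in a few lines. The counts and weights are the ones reported in this work log; `rank_fixes` is an illustrative helper name, and weights for auth_failure and unknown are omitted because the log does not state them.

```python
# Counts and weights as reported in this work log.
COUNTS = {
    "upstream_api_change": 11,
    "schema_mismatch": 1,
    "rate_limit": 1,
    "timeout": 1,
}
WEIGHTS = {
    "upstream_api_change": 6,
    "schema_mismatch": 10,
    "rate_limit": 7,
    "timeout": 3,
}

def rank_fixes(counts, weights):
    """Return (category, score) pairs sorted by score = count * weight, highest first."""
    scored = [(category, n * weights[category]) for category, n in counts.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

ranked = rank_fixes(COUNTS, WEIGHTS)
# ranked == [("upstream_api_change", 66), ("schema_mismatch", 10),
#            ("rate_limit", 7), ("timeout", 3)]
```

Note that schema_mismatch carries the highest per-incident weight (10), but upstream_api_change still ranks first because 11 affected tools outweigh it.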

DB write: Inserted summary into tool_health_log (id=40, tool_name='SKILL_HEALTH_SUMMARY',
status='ok', endpoint='internal://tool_invocations+tool_health_log').
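
A write of this shape could be sketched as below. The `tool_name`, `status`, and `endpoint` values are those recorded above; the SQLite backend, the `details` column holding the JSON summary, and the `insert_summary` helper are assumptions made for illustration, since the actual tool_health_log schema is not shown here.

```python
import json
import sqlite3

def insert_summary(conn, summary):
    """Insert one summary row into tool_health_log and return its row id.

    The `details` column is hypothetical; adjust to the real schema.
    """
    cur = conn.execute(
        "INSERT INTO tool_health_log (tool_name, status, endpoint, details) "
        "VALUES (?, ?, ?, ?)",
        (
            "SKILL_HEALTH_SUMMARY",
            "ok",
            "internal://tool_invocations+tool_health_log",
            json.dumps(summary),
        ),
    )
    conn.commit()
    return cur.lastrowid

# Demo against an in-memory table with an assumed schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tool_health_log ("
    "id INTEGER PRIMARY KEY, tool_name TEXT, status TEXT, endpoint TEXT, details TEXT)"
)
row_id = insert_summary(conn, {"top_fix": "upstream_api_change", "score": 66})
stored = conn.execute(
    "SELECT tool_name, status, endpoint, details FROM tool_health_log WHERE id = ?",
    (row_id,),
).fetchone()
```

Parameterized placeholders keep the JSON payload from being interpolated into the SQL text.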

File: f5908d0f_forge_scan_skill_execution_logs_spec.md
Modified: 2026-05-01 20:13
Size: 3.1 KB