[Forge] Artifact enrichment quest — evaluation consumer

Task

  • ID: fbb838fb-e5aa-4515-8f30-00959622ce98
  • Type: recurring
  • Frequency: every-6h
  • Layer: Forge

Goal

Keep model, notebook, and benchmark artifacts connected to the evaluation
context they need in order to be trusted: which dataset or benchmark was used
to evaluate them, what other artifacts share their entities, and where their
provenance chain starts. Naked artifacts (no eval dataset, no provenance) are
unusable for downstream claims.

What it does

Operates in three phases per cycle:

  • Evaluation context: For each artifact missing an
    evaluation_dataset_id / evaluation_benchmark_id, attempts to infer
    and attach the correct reference by:
    - scanning the artifact's README / notebook cells for
      dataset: / benchmark: mentions,
    - matching against the dataset_registry and benchmarks tables,
    - writing back the inferred link with confidence_score.
  • Cross-links: For pairs of artifacts that share ≥2 entities (same
    gene, drug, pathway), inserts a row in artifact_links with edge type
    shares_entities and the list of shared entity IDs.
  • Provenance backfill: For each artifact with an empty
    provenance_chain, walks upstream dependencies (input datasets,
    parent notebooks, generating hypotheses) and materialises the chain
    as a JSON column.
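
The three phases above can be sketched as pure functions, shown here only to make the intended behaviour concrete. This is a minimal sketch under stated assumptions, not the driver's implementation: the mention pattern, the 1/n confidence rule, the row keys source_id / target_id / shared_entity_ids, and the shape of the upstream mapping are all hypothetical.

```python
import json
import re
from itertools import combinations

# Hypothetical pattern for "dataset: <id>" / "benchmark: <id>" mentions.
MENTION_RE = re.compile(r"\b(dataset|benchmark):\s*([\w.-]+)", re.IGNORECASE)


def infer_eval_link(text, dataset_ids, benchmark_ids):
    """Phase 1: return (column, reference_id, confidence_score) or None."""
    known = {"dataset": set(dataset_ids), "benchmark": set(benchmark_ids)}
    hits = []
    for kind, ref in MENTION_RE.findall(text):
        kind, ref = kind.lower(), ref.rstrip(".")
        # Keep only mentions that match a registered ID, in order of appearance.
        if ref in known[kind] and (kind, ref) not in hits:
            hits.append((kind, ref))
    if not hits:
        return None
    kind, ref = hits[0]
    # Assumed scoring: one candidate is unambiguous; several split the confidence.
    return (f"evaluation_{kind}_id", ref, round(1.0 / len(hits), 2))


def shared_entity_links(artifact_entities, min_shared=2):
    """Phase 2: one row per artifact pair sharing at least min_shared entities."""
    rows = []
    for (a, ents_a), (b, ents_b) in combinations(sorted(artifact_entities.items()), 2):
        shared = sorted(set(ents_a) & set(ents_b))
        if len(shared) >= min_shared:
            rows.append({
                "source_id": a,
                "target_id": b,
                "edge_type": "shares_entities",
                "shared_entity_ids": shared,
            })
    return rows


def provenance_chain(artifact_id, upstream):
    """Phase 3: breadth-first walk of upstream dependencies, returned as JSON."""
    chain, seen, frontier = [], {artifact_id}, [artifact_id]
    while frontier:
        current = frontier.pop(0)
        for parent in upstream.get(current, []):
            if parent not in seen:  # guards against dependency cycles
                seen.add(parent)
                chain.append(parent)
                frontier.append(parent)
    return json.dumps(chain)  # '[]' when the artifact has no upstream
```

Keeping each phase free of database access makes the inference rules easy to test in isolation; a real driver would wrap these with reads and writes against the tables named above.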

  • Idempotent: each artifact in each phase is skipped if its state is already
    filled for that phase this cycle.
  • Releases as a no-op when all artifacts are fully enriched.
  • Emits agent_contributions (type=artifact_enrichment) per write.
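
The skip-if-filled and no-op behaviour can be illustrated with a small planning step. This is a sketch under assumptions: the cross_links_checked marker is hypothetical, and treating a NULL or empty-string provenance_chain as "unfilled" (with '[]' counting as filled) is one reading of the spec rather than a confirmed rule.

```python
def pending_phases(artifact):
    """Return the phases that still have work to do for one artifact."""
    pending = []
    if not (artifact.get("evaluation_dataset_id") or artifact.get("evaluation_benchmark_id")):
        pending.append("evaluation_context")
    if not artifact.get("cross_links_checked"):  # hypothetical per-artifact marker
        pending.append("cross_links")
    if artifact.get("provenance_chain") in (None, ""):  # '[]' counts as filled
        pending.append("provenance_backfill")
    return pending


def plan_cycle(artifacts):
    """Map each phase to the artifact IDs it must process this cycle."""
    plan = {}
    for artifact in artifacts:
        for phase in pending_phases(artifact):
            plan.setdefault(phase, []).append(artifact["id"])
    return plan  # an empty plan means the cycle can release as a no-op
```

For example, a fully enriched artifact contributes nothing to the plan, so a cycle over only such artifacts yields an empty dict and no writes.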

Success criteria

  • Fraction of artifacts with a non-null evaluation_dataset_id or
    evaluation_benchmark_id trends to 100% for artifacts that claim to be
    evaluated.
  • Cross-link graph is non-empty: every artifact that shares ≥2 entities
    with another artifact has at least one shares_entities row.
  • Every artifact has a populated provenance_chain or is explicitly marked
    as having no upstream (provenance_chain = '[]').
  • Run log: artifacts scanned per phase, links inferred, cross-links created,
    provenance filled, retries.
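
The first three criteria could be checked after each cycle with something like the following. It is a sketch only: the claims_evaluation flag, the entities field, and the source_id / target_id row keys are hypothetical names introduced for illustration.

```python
from itertools import combinations


def success_report(artifacts, link_rows):
    """Summarise evaluation coverage, missing cross-links and missing provenance."""
    claimed = [a for a in artifacts if a.get("claims_evaluation")]
    linked = [a for a in claimed
              if a.get("evaluation_dataset_id") or a.get("evaluation_benchmark_id")]
    eval_coverage = len(linked) / len(claimed) if claimed else 1.0

    # Artifacts that already appear in at least one shares_entities row.
    has_row = set()
    for row in link_rows:
        if row["edge_type"] == "shares_entities":
            has_row.update((row["source_id"], row["target_id"]))

    # Artifacts that share >= 2 entities with some other artifact.
    should_link = set()
    for a, b in combinations(artifacts, 2):
        if len(set(a.get("entities", [])) & set(b.get("entities", []))) >= 2:
            should_link.update((a["id"], b["id"]))

    return {
        "eval_coverage": eval_coverage,
        "missing_cross_links": sorted(should_link - has_row),
        "missing_provenance": sorted(
            a["id"] for a in artifacts if a.get("provenance_chain") is None
        ),
    }
```

A report with eval_coverage below 1.0 or non-empty missing lists indicates which criterion the next cycle still has to close.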

Quality requirements

  • No stubs: inferred evaluation links must include a real dataset/benchmark
    ID and a non-zero confidence score, not a placeholder. See the meta-quest
    quest_quality_standards_spec.md.
  • When processing ≥10 artifacts per cycle, use 3–5 parallel agents (one per
    phase or one per slice) so the three phases don't serialise.
  • Log total items processed + retries so we can detect busywork: the same
    artifact revisited without state change means either an idempotency bug
    or truly ambiguous inference. Escalate to a Senate task in the latter
    case.
  • INFERRED: artifact_links and provenance_chain column names follow the
    pattern in artifact_enrichment_quest_spec.md; the driver reads schema
    at runtime.
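
The busywork check described above might look like this over a cycle's run log. The log entry fields (artifact_id, phase, state_changed, retries) and the threshold of two idle visits are assumptions made for this sketch, not a documented log format.

```python
from collections import Counter


def summarize_run(run_log, min_idle_visits=2):
    """Total the run log and flag artifacts revisited without any state change."""
    idle = Counter()
    for entry in run_log:
        if not entry["state_changed"]:
            idle[(entry["artifact_id"], entry["phase"])] += 1
    return {
        "processed": len(run_log),
        "retries": sum(entry.get("retries", 0) for entry in run_log),
        # Suspects need triage: idempotency bug, or ambiguous inference to escalate.
        "suspect": sorted(key for key, n in idle.items() if n >= min_idle_visits),
    }
```

Each suspect pair still needs a human or follow-up agent to decide between the two causes; the summary only narrows down where to look.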

File: fbb838fb-e5aa-4515-8f30-00959622ce98_artifact_enrichment_eval_consumer_spec.md
Modified: 2026-05-01 20:13
Size: 3.1 KB