[Forge] Artifact enrichment quest — evaluation consumer
Task
- ID: fbb838fb-e5aa-4515-8f30-00959622ce98
- Type: recurring
- Frequency: every-6h
- Layer: Forge
Goal
Keep model, notebook, and benchmark artifacts connected to the evaluation
context they need in order to be trusted: which dataset or benchmark was used
to evaluate them, what other artifacts share their entities, and where their
provenance chain starts. Naked artifacts (no eval dataset, no provenance) are
unusable for downstream claims.
What it does
Operates in three phases per cycle:
Evaluation context: For each artifact missing an evaluation_dataset_id /
evaluation_benchmark_id, attempts to infer and attach the correct reference
by:
- scanning the artifact's README / notebook cells for dataset: / benchmark:
mentions,
- matching against the dataset_registry and benchmarks tables,
- writing back the inferred link with confidence_score.
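The inference steps above might look roughly like the sketch below. It is an illustration only, not the driver's implementation: the regular expression, the in-memory dicts standing in for the dataset_registry and benchmarks tables, the example IDs, and the confidence values (1.0 for an exact name match, 0.5 for a case-insensitive one) are all assumptions made for the example.

```python
import re
from typing import Optional

# Hypothetical pattern: matches "dataset: <name>" or "benchmark: <name>" mentions.
MENTION_RE = re.compile(r"\b(dataset|benchmark):\s*([\w.\-]+)", re.IGNORECASE)


def infer_evaluation_link(text: str, dataset_registry: dict, benchmarks: dict) -> Optional[dict]:
    """Return the highest-confidence link found in README/notebook text, or None."""
    best = None
    for kind, name in MENTION_RE.findall(text):
        kind = kind.lower()
        table = dataset_registry if kind == "dataset" else benchmarks
        if name in table:
            ref_id, confidence = table[name], 1.0  # exact name match (assumed score)
        else:
            lowered = {key.lower(): value for key, value in table.items()}
            if name.lower() not in lowered:
                continue  # unknown name: write nothing rather than a stub
            ref_id, confidence = lowered[name.lower()], 0.5  # case-insensitive match
        candidate = {
            "field": f"evaluation_{kind}_id",
            "id": ref_id,
            "confidence_score": confidence,
        }
        if best is None or candidate["confidence_score"] > best["confidence_score"]:
            best = candidate
    return best
```

Returning None for unmatched mentions keeps the sketch in line with the no-stubs requirement: a link is only produced when a real registry ID backs it.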
Cross-links: For pairs of artifacts that share ≥2 entities (same gene, drug,
pathway), inserts a row in artifact_links with edge type shares_entities and
the list of shared entity IDs.
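A minimal sketch of the pairing rule, assuming each artifact's entities are already available as a list. The row keys (source_id, target_id, edge_type, shared_entity_ids) are hypothetical; the real artifact_links columns are read from the schema at runtime.

```python
from itertools import combinations


def shared_entity_links(artifact_entities: dict, min_shared: int = 2) -> list:
    """Build one shares_entities row per artifact pair sharing >= min_shared entities."""
    rows = []
    # Sorting the IDs makes the pair order, and so the output, deterministic.
    for a, b in combinations(sorted(artifact_entities), 2):
        shared = sorted(set(artifact_entities[a]) & set(artifact_entities[b]))
        if len(shared) >= min_shared:
            rows.append({
                "source_id": a,
                "target_id": b,
                "edge_type": "shares_entities",
                "shared_entity_ids": shared,
            })
    return rows
```

The pairwise loop is quadratic in the number of artifacts, which is acceptable for a sketch; a production driver would more likely go through an inverted index from entity to artifacts.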
Provenance backfill: For each artifact with an empty provenance_chain, walks
upstream dependencies (input datasets, parent notebooks, generating
hypotheses) and materialises the chain as a JSON column.
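The upstream walk could be sketched as below. The shape of the chain entries (an id plus a consumed_by key) and the upstream mapping are assumptions for illustration; the source only states that the chain is stored as JSON and that '[]' marks an artifact with no upstream.

```python
import json


def build_provenance_chain(artifact_id: str, upstream: dict) -> str:
    """Walk upstream dependencies depth-first and return the chain as a JSON array string."""
    chain, seen, stack = [], {artifact_id}, [artifact_id]
    while stack:
        current = stack.pop()
        for parent in upstream.get(current, []):
            if parent not in seen:  # guards against cycles and shared ancestors
                seen.add(parent)
                chain.append({"id": parent, "consumed_by": current})
                stack.append(parent)
    return json.dumps(chain)
```

An artifact with no recorded dependencies yields the string '[]', which matches the explicit "no upstream" marker rather than leaving the column empty.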
- Idempotent: each artifact in each phase is skipped if its state is already
filled for that phase this cycle.
- Releases the task as a no-op when all artifacts are already fully enriched.
- Emits agent_contributions (type=artifact_enrichment) per write.
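The idempotency and no-op rules can be illustrated with a small filter. This is a sketch under assumptions: artifacts are plain dicts, NULL is represented as None, and the phase names and the PHASE_FIELDS mapping are hypothetical. The cross-link phase is left out because it works on pairs rather than on a single field.

```python
# Assumption: the field(s) each phase fills; None means "not yet filled".
PHASE_FIELDS = {
    "evaluation_context": ("evaluation_dataset_id", "evaluation_benchmark_id"),
    "provenance_backfill": ("provenance_chain",),
}


def pending(artifacts: list, phase: str) -> list:
    """Return artifacts that still need this phase; already-filled ones are skipped."""
    fields = PHASE_FIELDS[phase]
    return [a for a in artifacts if all(a.get(f) is None for f in fields)]


def cycle_is_noop(artifacts: list) -> bool:
    """True when no phase has any pending artifact, so the cycle can release early."""
    return not any(pending(artifacts, phase) for phase in PHASE_FIELDS)
```

Note that a provenance_chain of '[]' counts as filled here, so artifacts explicitly marked as having no upstream are not revisited.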
Success criteria
- Among artifacts that claim to be evaluated, the fraction with a non-null
evaluation_dataset_id or evaluation_benchmark_id trends to 100%.
- Cross-link graph is non-empty: every artifact that shares ≥2 entities with
another artifact has at least one shares_entities row.
- Every artifact has a populated provenance_chain or is explicitly marked as
having no upstream (provenance_chain = '[]').
- Run log: artifacts scanned per phase, links inferred, cross-links created,
provenance filled, retries.
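The first criterion can be tracked with a simple coverage ratio. The sketch below assumes artifacts are dicts and that a hypothetical claims_evaluated flag identifies artifacts that claim to be evaluated; the source does not say how that claim is recorded.

```python
def evaluation_coverage(artifacts: list) -> float:
    """Fraction of artifacts claiming evaluation that carry a non-null evaluation link."""
    claimed = [a for a in artifacts if a.get("claims_evaluated")]
    if not claimed:
        return 1.0  # nothing claims evaluation, so nothing is missing
    linked = sum(
        1
        for a in claimed
        if a.get("evaluation_dataset_id") is not None
        or a.get("evaluation_benchmark_id") is not None
    )
    return linked / len(claimed)
```

Logging this value once per cycle would show whether the fraction is actually trending upward.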
Quality requirements
- No stubs: inferred evaluation links must include a real dataset/benchmark
ID and a non-zero confidence score, not a placeholder (see the meta-quest
quest_quality_standards_spec.md).
- When processing ≥10 artifacts per cycle, use 3–5 parallel agents (one per
phase or one per slice) so the three phases don't serialise.
- Log total items processed and retries so that busywork can be detected. An
artifact revisited without a state change indicates either an idempotency
bug or genuinely ambiguous inference; in the latter case, escalate to a
Senate task.
- INFERRED: artifact_links and provenance_chain column names follow the
pattern in artifact_enrichment_quest_spec.md; the driver reads the schema at
runtime.
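The busywork check from the logging requirement above could be approximated as follows. This is a sketch under assumptions: the run log is a list of dicts with hypothetical artifact_id and state_changed keys, and the threshold of two unchanged visits is arbitrary.

```python
from collections import Counter


def find_busywork(run_log: list, threshold: int = 2) -> list:
    """Return IDs of artifacts visited at least `threshold` times without a state change."""
    unchanged = Counter(
        entry["artifact_id"] for entry in run_log if not entry["state_changed"]
    )
    return sorted(a for a, visits in unchanged.items() if visits >= threshold)
```

The function only flags candidates. Deciding whether a flagged artifact reflects an idempotency bug or ambiguous inference, and therefore whether to escalate, still needs a look at the artifact itself.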