Spec: Add accession-level provenance for UKB AD GWAS dataset

← All Specs

Spec: Add accession-level provenance for UKB AD GWAS dataset

Task ID: c43a0413-2405-47e6-a25c-6b8c7a95d3b4 Quest: 415b277f-03b Layer: Atlas Created: 2026-04-27

Problem

The ukb-ad-gwas dataset (registered by the biomni_parity pipeline) scored 0.36 because it has only the
generic UK Biobank homepage URL with no:

  • GWAS accession number
  • Phenotype definition
  • Summary-statistics download path
  • schema_json column schema
  • Row-level citations / PMID
  • Evidence tier

This makes the dataset hard to reproduce or audit.

Solution

Update the datasets row for ukb-ad-gwas with:

  • GWAS accession: ieu-b-2 (IEU OpenGWAS)
  • Primary citation: Marioni et al. 2018, PMID 30315176, doi 10.1038/s41398-018-0189-y
  • Phenotype definition: AD-by-proxy (parent had self-reported Alzheimer's disease in UK Biobank)
  • Summary statistics URL: https://gwas.mrcieu.ac.uk/datasets/ieu-b-2/
  • Download path: https://gwas.mrcieu.ac.uk/files/ieu-b-2/ieu-b-2.vcf.gz
  • schema_json: Full GWAS summary-statistics column schema (rsid, chromosome, position,
  • effect_allele, other_allele, effect_allele_freq, beta, se, p_value, n, evidence_tier, row_citation)
  • License: CC-BY 4.0 (IEU OpenGWAS redistribution)
  • Quality score: raised to 0.82 (from 0.36)
  • Reference data

    FieldValue
    IEU OpenGWAS accessionieu-b-2
    Primary PMID30315176
    DOI10.1038/s41398-018-0189-y
    AuthorsMarioni RE et al.
    JournalTranslational Psychiatry (2018)
    PhenotypeAlzheimer's disease by proxy
    Genome buildGRCh37 (hg19)
    N total314,278
    N cases (proxy)27,696
    N controls286,582
    GWAS softwareBOLT-LMM v2.3.2
    Variants tested~12 million (HRC r1.1 imputation panel)
    GW-significant loci8 (APOE, BIN1, PICALM, CD33, CLU, CR1, MS4A, FERMT2)

    Acceptance criteria

    • schema_json is not null and contains columns, gwas_accession, primary_citation_pmid,
    phenotype, summary_statistics_url
    • description mentions accession ieu-b-2 and PMID 30315176
    • quality_score ≥ 0.78
    • license is set
    • quality_notes confirms the enrichment

    Work Log

    2026-04-27 — Initial implementation

    • Investigated current state: quality_score=0.36, schema_json=null, generic UKB homepage URL
    • Identified primary reference: Marioni et al. 2018 (PMID 30315176), IEU OpenGWAS accession ieu-b-2
    • Updated datasets row with full GWAS schema, accession, phenotype definition, summary statistics URL,
    PMID, evidence tier, license, and improved quality_score (0.82)
    • Commit: see task branch orchestra/task/c43a0413-add-accession-level-provenance-for-ukb-a

    File: ukb_ad_gwas_provenance_spec.md
    Modified: 2026-05-01 20:13
    Size: 2.6 KB