Experiment: Multi-Ethnic PD GWAS

experiment · 3302 words · synced 2026-04-02

Experiment: Multi-Ethnic Parkinson's Disease GWAS

Executive Summary

This experiment outlines a comprehensive multi-ethnic genome-wide association study (GWAS) designed to identify population-specific and shared genetic risk factors for Parkinson's disease (PD) across diverse ancestry groups. The study addresses a critical gap in PD genetics research, where approximately 95% of GWAS data has been derived from European-ancestry populations, leaving substantial portions of global genetic diversity uncharacterized[@nalls2019]. By systematically investigating genetic risk factors across European, East Asian, African, South Asian, Latin American, and Middle Eastern populations, this experiment aims to uncover novel risk loci, improve polygenic risk score (PRS) accuracy for underrepresented populations, and advance precision medicine approaches that benefit all patients with PD regardless of ancestry background.

The experimental design incorporates rigorous quality control protocols, state-of-the-art imputation methodologies using multi-ancestry reference panels, trans-ethnic meta-analysis approaches, and machine learning-based polygenic risk score optimization. The study is positioned to significantly advance our understanding of the shared and population-specific genetic architecture of PD while addressing critical health equity concerns in genetic research.

Research Hypothesis and Objectives

Primary Hypothesis

The Ethnicity-Specific Genetic Architecture Hypothesis proposes that Parkinson's disease risk is influenced by both shared genetic variants conserved across populations and population-specific variants that have arisen through demographic history, founder effects, and adaptive selection. This hypothesis predicts that multi-ethnic GWAS will reveal: (1) risk loci with consistent effects across all ancestries representing core PD biology, (2) population-specific variants with effects limited to particular genetic backgrounds, and (3) variants with differential effect sizes across populations due to linkage disequilibrium (LD) structure differences and gene-environment interactions.

Primary Objectives

Novel Risk Locus Discovery: Identify 5-10 novel PD risk loci specific to non-European populations that have not been detected in European-centric GWAS, leveraging the distinct LD patterns and variant spectra of diverse ancestry groups[@blake2023].

Population-Specific Variant Characterization: Characterize the effect sizes and frequencies of known PD risk variants across all included ancestry groups, quantifying heterogeneity in genetic effects.

Polygenic Risk Score Optimization: Develop and validate ancestry-specific PRS models that achieve clinically meaningful predictive accuracy across all included populations, addressing the well-documented performance disparities in non-European groups[@schneider2023].

Functional Interpretation: Prioritize discovered variants through integration with expression quantitative trait loci (eQTL), methylation quantitative trait loci (meQTL), and epigenetic datasets from relevant brain and immune cell types.

Health Equity Translation: Generate evidence-based recommendations for genetic screening panel development that appropriately represent diverse population genetic architectures.

Secondary Objectives

Biological Pathway Elucidation: Identify which biological pathways show consistent involvement across populations versus those with population-specific effects, informing therapeutic target selection.

Gene-Environment Interaction Investigation: Explore whether population-specific genetic effects modify the influence of environmental risk factors known to modulate PD risk.

Clinical Phenotype Characterization: Examine whether genetic risk factors correlate differentially with clinical presentation across ancestry groups, including age of onset, disease progression, and cognitive involvement.

Study Design

Population Cohort Assembly

The experimental design encompasses multiple geographically diverse cohorts representing the major continental ancestry groups. Each population stratum includes carefully phenotyped PD cases and neurologically healthy controls to ensure adequate statistical power for association testing.

| Ancestry Group | Target Cases | Target Controls | Data Sources | Minimum Power |
|---------------|--------------|-----------------|-------------|---------------|
| European | 15,000 | 30,000 | IPDGC, GP2, UK Biobank | 0.90 |
| East Asian | 5,000 | 10,000 | J-PDGC, Taiwan Biobank, Korean cohorts | 0.80 |
| African | 3,000 | 6,000 | IPDGC-Africa, African American cohorts | 0.75 |
| South Asian | 2,000 | 4,000 | Indian PD registries | 0.70 |
| Latin American | 1,500 | 3,000 | LASPD, multi-country cohorts | 0.70 |
| Middle Eastern | 1,000 | 2,000 | Regional PD registries | 0.65 |
| Ashkenazi Jewish | 500 | 1,000 | Specialized AJ PD registries | 0.60 |

The sample size targets are derived from power calculations assuming an additive genetic model, allele frequencies ranging from 0.01 to 0.50, and odds ratios of 1.15-1.35 for typical GWAS-discovered variants. These targets represent substantial increases over historical cohorts and reflect the growing international collaboration in PD genetics research.
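The power calculation behind these targets is not spelled out, so the following is only a minimal sketch under stated assumptions: a per-allele (allelic) test as an approximation to the additive logistic model, the case allele frequency implied by a per-allele odds ratio, and a two-sided genome-wide α of 5×10⁻⁸. The document does not say which α or software produced the Minimum Power column, so this sketch need not reproduce those values. The function names are hypothetical.

```python
from math import log, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()

# Target (cases, controls) per ancestry group, copied from the cohort table above.
TARGETS = {
    "European": (15_000, 30_000),
    "East Asian": (5_000, 10_000),
    "African": (3_000, 6_000),
    "South Asian": (2_000, 4_000),
    "Latin American": (1_500, 3_000),
    "Middle Eastern": (1_000, 2_000),
    "Ashkenazi Jewish": (500, 1_000),
}


def case_allele_freq(p_control: float, odds_ratio: float) -> float:
    """Risk-allele frequency in cases implied by the control frequency and per-allele OR."""
    return odds_ratio * p_control / (1.0 - p_control + odds_ratio * p_control)


def allelic_power(n_cases: int, n_controls: int, p_control: float,
                  odds_ratio: float, alpha: float = 5e-8) -> float:
    """Approximate power of a two-sided per-allele test at significance level alpha."""
    p_case = case_allele_freq(p_control, odds_ratio)
    # Each person contributes two alleles: var(log OR) = sum over groups of 1 / (2N p (1 - p)).
    var_log_or = (1.0 / (2 * n_cases * p_case * (1.0 - p_case))
                  + 1.0 / (2 * n_controls * p_control * (1.0 - p_control)))
    ncp = log(odds_ratio) / sqrt(var_log_or)        # expected z statistic
    z_crit = STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)  # two-sided critical value
    return STD_NORMAL.cdf(ncp - z_crit) + STD_NORMAL.cdf(-ncp - z_crit)


if __name__ == "__main__":
    for group, (cases, controls) in TARGETS.items():
        for freq, odds in [(0.05, 1.35), (0.20, 1.15)]:
            power = allelic_power(cases, controls, freq, odds)
            print(f"{group:>16}  freq={freq:.2f}  OR={odds:.2f}  power={power:.3f}")
```

Looping over the allele-frequency range (0.01 to 0.50) and odds-ratio range (1.15 to 1.35) quoted above for each stratum shows where a given target sample size has adequate power; power drops for rarer alleles, smaller effects and smaller strata.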

Phenotype Definition and Quality Assurance

PD Case Definition: Cases meet UK Brain Bank or Movement Disorder Society (MDS) clinical diagnostic criteria for Parkinson's disease, confirmed by board-certified neurologists with movement disorder specialization. All cases have documented disease duration of at least one year to ensure diagnostic accuracy.

Control Definition: Controls are neurologically healthy individuals without PD symptoms or family history of PD in first-degree relatives, matched to cases by ancestry group, sex, and age within 5-year bins.

Phenotype Harmonization: A standardized phenotyping protocol ensures consistency across sites:

Clinical data collection using IPDGC-recommended case report forms

Central review of ambiguous cases by expert panel

Recording of age at onset, disease duration, and motor subtype

Documentation of dopaminergic medication status as levodopa equivalent daily dose (LEDD)

Cognitive assessment using MoCA or MMSE where available
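LEDD converts each antiparkinsonian drug to a levodopa-equivalent dose and sums the results. The sketch below assumes the commonly cited conversion factors of Tomlinson et al. (2010) for a subset of drugs; the document does not specify which conversion table the sites use, so the factor table, the drug keys, and the `ledd` function are illustrative.

```python
# Conversion factors (mg LEDD per mg of drug per day). ASSUMPTION: commonly cited
# values (Tomlinson et al., 2010); substitute the study's own conversion table.
LEDD_FACTORS = {
    "levodopa_ir": 1.0,
    "levodopa_cr": 0.75,
    "pramipexole": 100.0,  # dose expressed as pramipexole salt
    "ropinirole": 20.0,
    "rotigotine": 30.0,
    "rasagiline": 100.0,
    "selegiline_oral": 10.0,
    "amantadine": 1.0,
}
# Entacapone is credited as a fraction of the concurrent levodopa dose.
ENTACAPONE_LEVODOPA_FACTOR = 0.33


def ledd(daily_doses_mg: dict[str, float], on_entacapone: bool = False) -> float:
    """Levodopa equivalent daily dose from a mapping of drug name to mg per day."""
    total = 0.0
    for drug, dose in daily_doses_mg.items():
        if drug not in LEDD_FACTORS:
            raise KeyError(f"no LEDD conversion factor for {drug!r}")
        total += dose * LEDD_FACTORS[drug]
    if on_entacapone:
        # Simplification: apply the factor to all levodopa milligrams; conversion
        # tables differ in how they treat controlled-release levodopa here.
        levodopa_mg = (daily_doses_mg.get("levodopa_ir", 0.0)
                       + daily_doses_mg.get("levodopa_cr", 0.0))
        total += levodopa_mg * ENTACAPONE_LEVODOPA_FACTOR
    return total
```

For example, `ledd({"levodopa_ir": 300, "pramipexole": 1.5})` sums 300 mg of levodopa and 150 mg of levodopa-equivalent dopamine agonist.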

Inclusion and Exclusion Criteria

Inclusion Criteria:

Age 35-90 years at enrollment
Clinical diagnosis of idiopathic Parkinson's disease (UK Brain Bank or MDS criteria)
Documented ancestry (self-reported and genetically confirmed)
Written informed consent for genetic research

Exclusion Criteria:

Atypical parkinsonism (PSP, CBS, MSA, vascular parkinsonism)
Known pathogenic variants in monogenic PD genes (LRRK2, SNCA, PRKN, PINK1, DJ-1) or in GBA1
History of neuroleptic use within 12 months of symptom onset
Evidence of secondary parkinsonism (drug-induced, traumatic, vascular)

Genotyping and Quality Control

Genotyping Platform Selection

The experiment employs ancestry-diverse genotyping arrays optimized for population-specific variant detection:

Illumina Global Diversity Array (GDA): Designed specifically for multi-ancestry studies with enhanced coverage of low-frequency variants in diverse populations, including rare variants specific to African and Asian ancestries.

Affymetrix Axiom World Array: Provides comprehensive coverage across continental populations with dedicated content for understudied populations, particularly relevant for Latin American admixture mapping.

Custom Multi-Ancestry Chip: A supplementary custom content panel targeting:

Known PD risk loci with ancestry-specific tagging SNPs
Fine-mapping regions around established GWAS signals
Expression quantitative trait loci in brain tissue
Ancestry-informative markers for PCA correction
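Ancestry-informative markers feed a principal component analysis. A minimal numpy sketch, assuming a samples × SNPs matrix of 0/1/2 allele counts: PCs are computed on a reference panel (for example 1000 Genomes), study samples are projected onto the reference loadings, and each study sample is assigned to the nearest reference-population centroid. Nearest-centroid assignment is a simple stand-in chosen here for illustration; the document does not specify the assignment rule, and all function names are hypothetical.

```python
import numpy as np


def reference_pca(ref_genotypes: np.ndarray, n_components: int = 10):
    """PCA of a reference panel (samples x SNPs, 0/1/2 allele counts).

    Returns SNP loadings plus the allele frequencies and scales used to standardize.
    """
    freq = ref_genotypes.mean(axis=0) / 2.0
    scale = np.sqrt(2.0 * freq * (1.0 - freq))
    scale[scale == 0] = 1.0  # monomorphic SNPs are centred to zero and left unscaled
    z = (ref_genotypes - 2.0 * freq) / scale
    # Rows of vt are the principal axes in SNP space.
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    return vt[:n_components].T, freq, scale


def project(genotypes: np.ndarray, loadings, freq, scale) -> np.ndarray:
    """Project samples onto reference PCs using the reference standardization."""
    return ((genotypes - 2.0 * freq) / scale) @ loadings


def assign_nearest_centroid(study_pcs, ref_pcs, ref_labels):
    """Label each study sample with the reference population whose PC centroid is closest."""
    labels = np.asarray(ref_labels)
    groups = np.unique(labels)
    centroids = np.stack([ref_pcs[labels == g].mean(axis=0) for g in groups])
    sq_dist = ((study_pcs[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return groups[sq_dist.argmin(axis=1)]
```

The projected scores from the same function can also supply the top 10 PCs used as association covariates.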

Sample-Level Quality Control

Rigorous sample-level quality control ensures data integrity:

Call Rate Threshold: Sample call rate >98% (99% recommended for rare variant analysis)

Sex Verification: Concordance between genotypic and phenotypic sex using X-chromosome markers

Relatedness Detection: Identity-by-descent (IBD) estimation to identify hidden relatedness; removal of one individual from any pair with PI_HAT >0.2

Ancestry Verification: Principal component analysis comparison against 1000 Genomes reference populations to confirm reported ancestry

Contamination Detection: Verification metrics to detect sample swaps or contamination
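The relatedness rule (drop one member of every pair with PI_HAT > 0.2) leaves open which member to drop. A hypothetical greedy sketch: repeatedly drop the sample involved in the most related pairs, optionally preferring to keep, say, cases. Greedy removal is a common heuristic but is not guaranteed to remove the minimum number of samples.

```python
from collections import Counter


def prune_related(pairs, threshold=0.2, keep_priority=None):
    """Greedily choose samples to drop so that no retained pair has PI_HAT > threshold.

    pairs: iterable of (id1, id2, pi_hat), e.g. parsed from the PI_HAT column of
        PLINK 1.9 --genome output.
    keep_priority: optional dict of sample ID -> score; when two samples are tied,
        the one with the higher score is kept (e.g. 1 for cases, 0 for controls).
    """
    priority = keep_priority or {}
    related = [(a, b) for a, b, pi_hat in pairs if pi_hat > threshold]
    dropped = set()
    while True:
        remaining = [(a, b) for a, b in related if a not in dropped and b not in dropped]
        if not remaining:
            return dropped
        degree = Counter()
        for a, b in remaining:
            degree[a] += 1
            degree[b] += 1
        # Drop the sample in the most related pairs; ties go to the lower priority,
        # then to the larger ID so that the result is deterministic.
        victim = max(degree, key=lambda s: (degree[s], -priority.get(s, 0), s))
        dropped.add(victim)
```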

SNP-Level Quality Control

Post-genotyping SNP filtering follows established protocols:

| QC Metric | Threshold | Rationale |
|-----------|-----------|-----------|
| SNP call rate | >98% | Maintain high-quality genotype data |
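| Hardy-Weinberg equilibrium | p > 1×10⁻⁶ | Remove genotyping artifacts; usually tested in controls so that true disease-associated variants are not filtered |
| Minor allele frequency | >1% (population-specific) | Applied within each ancestry, so variants common in one population are retained even if rare elsewhere |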
| Imputation quality (INFO) | >0.7 | Ensure accurate genotype inference |
| Differential missingness | p > 1×10⁻⁵ | Remove ancestry-differential SNP artifacts |
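The call-rate, MAF and HWE filters in the table can be sketched per SNP from genotype counts within one ancestry group. This sketch uses a 1-df chi-square HWE test for brevity; PLINK's `--hwe` uses an exact test by default, which is more accurate for low-frequency variants. The function names and the counts-based interface are assumptions.

```python
from math import erfc, sqrt


def hwe_chisq_p(n_aa: int, n_ab: int, n_bb: int) -> float:
    """1-df chi-square Hardy-Weinberg test from genotype counts (AA, AB, BB)."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    if min(expected) == 0:
        return 1.0  # monomorphic SNP: the test is undefined, treat as passing
    chi2 = sum((obs - e) ** 2 / e for obs, e in zip((n_aa, n_ab, n_bb), expected))
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return erfc(sqrt(chi2 / 2.0))


def passes_snp_qc(counts, n_missing, min_call_rate=0.98, min_maf=0.01, min_hwe_p=1e-6):
    """Apply the call-rate, MAF and HWE thresholds from the SNP QC table to one SNP."""
    n_aa, n_ab, n_bb = counts
    n_called = n_aa + n_ab + n_bb
    if n_called == 0:
        return False
    call_rate = n_called / (n_called + n_missing)
    freq_a = (2 * n_aa + n_ab) / (2 * n_called)
    maf = min(freq_a, 1.0 - freq_a)
    return (call_rate > min_call_rate and maf > min_maf
            and hwe_chisq_p(n_aa, n_ab, n_bb) > min_hwe_p)
```

In use, `counts` for the HWE check would come from controls only, and the whole filter would be run separately in each ancestry group.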

Imputation Strategy

Genotype imputation leverages diverse reference panels to maximize variant discovery:

Primary Reference Panel: TOPMed freeze 8 (n = 97,000 genomes) provides the highest quality multi-ancestry reference for African, European, and admixed populations[@topmed].

Secondary Panels: For populations underrepresented in TOPMed:

Human Genome Diversity Project (HGDP) for Middle Eastern populations
Singapore Sequencing Malay/Indian projects for South Asian fine-mapping
Latin American-specific reference panels under development

Imputation Software: The TOPMed Imputation Server or a local minimac4 installation, with pre-phasing by Eagle2 or SHAPEIT4, following the server's standard pipeline.
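The INFO > 0.7 filter in the SNP QC table depends on which quality metric the imputation software reports. minimac4 reports an Rsq value; as a sketch, the MaCH/minimac-style estimate is the observed variance of imputed dosages divided by the variance expected under Hardy-Weinberg equilibrium, 2p(1 − p). IMPUTE-style INFO scores are related but not identical, and the functions below are illustrative, not the software's implementation.

```python
from statistics import fmean, pvariance


def dosage_rsq(dosages: list[float]) -> float:
    """MaCH/minimac-style imputation r^2 for one variant from per-sample dosages (0-2)."""
    p = fmean(dosages) / 2.0              # estimated alternate-allele frequency
    expected_var = 2.0 * p * (1.0 - p)    # dosage variance if genotypes were known (HWE)
    if expected_var == 0.0:
        return 0.0                        # monomorphic: no information to assess
    return pvariance(dosages) / expected_var


def passes_imputation_qc(dosages: list[float], threshold: float = 0.7) -> bool:
    """Keep a variant only if its estimated imputation quality exceeds the threshold."""
    return dosage_rsq(dosages) > threshold
```

Well-imputed variants have dosages close to 0, 1 or 2 and an Rsq near 1; poorly imputed variants have dosages shrunk toward the mean and an Rsq near 0.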

Statistical Analysis Framework

Population-Specific Association Testing

Within each ancestry group, genome-wide association testing employs:

Statistical Model: Logistic regression under an additive genetic model with the following covariates:

Age (continuous, centered)
Sex (binary)
Genetic principal components (top 10)
Genotyping array (binary, if applicable)
Site (categorical, if multi-site)
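The per-SNP model above can be illustrated with a small Newton-Raphson (IRLS) logistic fit and a Wald test on the dosage coefficient. This is a sketch of the model only, assuming a dense numpy covariate matrix; REGENIE and PLINK2 `--glm` implement it with additional safeguards such as a Firth-penalized fallback. `logistic_wald` is a hypothetical name.

```python
import numpy as np
from math import erfc, sqrt


def logistic_wald(dosage, covariates, phenotype, max_iter=50, tol=1e-8):
    """Additive-model logistic regression for one SNP with covariates.

    Fits logit P(case) = b0 + b1 * dosage + covariates @ gamma by Newton-Raphson
    (equivalently IRLS) and returns (b1, SE(b1), two-sided Wald p-value).
    """
    X = np.column_stack([np.ones(len(dosage)), dosage, covariates])
    y = np.asarray(phenotype, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        mu = 1.0 / (1.0 + np.exp(-(X @ beta)))
        info = X.T @ (X * (mu * (1.0 - mu))[:, None])  # Fisher information matrix
        step = np.linalg.solve(info, X.T @ (y - mu))   # information^-1 times score
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    mu = 1.0 / (1.0 + np.exp(-(X @ beta)))
    info = X.T @ (X * (mu * (1.0 - mu))[:, None])
    se = np.sqrt(np.diag(np.linalg.inv(info)))
    z = beta[1] / se[1]
    return float(beta[1]), float(se[1]), erfc(abs(z) / sqrt(2.0))
```

Here `covariates` would hold centred age, sex, the top 10 PCs, and array or site indicators, one row per sample.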

Software Implementation: REGENIE for whole-genome regression accounting for population structure, with PLINK2 single-SNP tests as validation. BOLT-LMM, a scalable linear mixed model, serves as an alternative for the largest cohorts.

Multiple Testing Correction: Genome-wide significance threshold of p < 5×10⁻⁸; suggestive threshold of p < 1×10⁻⁶ for secondary analyses.
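The 5×10⁻⁸ threshold is conventionally justified as a Bonferroni correction for roughly one million independent common-variant tests. A minimal sketch of that arithmetic and of the two-tier classification used in this plan (names are hypothetical):

```python
# Conventional rationale: Bonferroni correction for roughly one million
# independent common-variant tests across the genome.
ALPHA = 0.05
N_INDEPENDENT_TESTS = 1_000_000
GENOME_WIDE = ALPHA / N_INDEPENDENT_TESTS  # approximately 5e-8
SUGGESTIVE = 1e-6


def classify(p_value: float) -> str:
    """Label an association p-value using the thresholds in the analysis plan."""
    if p_value < GENOME_WIDE:
        return "genome-wide significant"
    if p_value < SUGGESTIVE:
        return "suggestive"
    return "not significant"
```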

Trans-Ethnic Meta-Analysis

The experimental design incorporates multiple meta-analysis approaches to leverage shared and heterogeneous genetic effects:

Fixed-Effects Meta-Analysis: Inverse-variance weighted meta-analysis using METAL software, appropriate for variants with consistent effect directions across populations. This approach maximizes power for shared genetic architecture.
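A minimal sketch of the inverse-variance weighted fixed-effects estimate, which corresponds to METAL's `SCHEME STDERR` mode (METAL's default scheme is sample-size based). Inputs are per-ancestry log odds ratios and standard errors for one variant, assumed to be aligned to the same effect allele; `ivw_fixed` is a hypothetical name.

```python
from math import erfc, sqrt


def ivw_fixed(betas, ses):
    """Fixed-effects inverse-variance weighted meta-analysis for one variant.

    betas, ses: per-ancestry log odds ratios and standard errors, aligned to the
    same effect allele. Returns (pooled beta, pooled SE, two-sided p-value).
    """
    weights = [1.0 / se ** 2 for se in ses]
    w_sum = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / w_sum
    se = sqrt(1.0 / w_sum)
    return beta, se, erfc(abs(beta / se) / sqrt(2.0))
```

Each ancestry's weight is the inverse of its squared standard error, so the larger European stratum dominates the pooled estimate unless smaller strata carry unusually precise estimates.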

Random-Effects Meta-Analysis: DerSimonian-Laird random effects model for variants showing evidence of heterogeneity (Cochran's Q p < 0.05), accommodating differential effect sizes across ancestries.

Bayesian Trans-Ethnic Meta-Analysis: TRAITBASS (Trans-Ancestry Bayesian Meta-Analysis of Summary Statistics) provides probabilistic inference on cross-population effect heterogeneity, generating posterior probabilities for shared versus population-specific effects.

Heterogeneity Assessment: Key metrics include:

Cochran's Q statistic and p-value
I² index quantifying heterogeneity proportion
Genetic effect correlation (r_g) between population pairs
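Cochran's Q, I² and the DerSimonian-Laird between-ancestry variance τ² can all be computed from the same per-ancestry estimates. A stdlib-only sketch, assuming effect-allele-aligned log odds ratios; the chi-square tail for Q uses the closed form available for integer degrees of freedom. Function names are hypothetical.

```python
from math import erfc, exp, pi, sqrt


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variable with integer df (closed form)."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return exp(-half) * total
    # Odd df: the df = 1 tail plus series terms 2*phi(sqrt x) * x^(r-1/2) / (1*3*...*(2r-1)).
    total = erfc(sqrt(half))
    term = sqrt(2.0 * x / pi) * exp(-half)
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    return total


def heterogeneity_and_dl(betas, ses):
    """Cochran's Q, I^2, and a DerSimonian-Laird random-effects estimate."""
    w = [1.0 / se ** 2 for se in ses]
    w_sum = sum(w)
    beta_fe = sum(wi * b for wi, b in zip(w, betas)) / w_sum
    q = sum(wi * (b - beta_fe) ** 2 for wi, b in zip(w, betas))
    df = len(betas) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    # DL moment estimator of the between-ancestry variance tau^2, floored at zero.
    c = w_sum - sum(wi ** 2 for wi in w) / w_sum
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    w_re = [1.0 / (se ** 2 + tau2) for se in ses]
    beta_re = sum(wi * b for wi, b in zip(w_re, betas)) / sum(w_re)
    se_re = sqrt(1.0 / sum(w_re))
    return {"Q": q, "Q_p": chi2_sf(q, df), "I2": i2, "tau2": tau2,
            "beta_re": beta_re, "se_re": se_re}
```

When τ² is zero the random-effects estimate reduces to the fixed-effects one; heterogeneity across ancestries widens `se_re`.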

Fine-Mapping and Functional Prioritization

Conditional Analysis: Stepwise conditional analysis within each ancestry group identifies independent signals at each locus, using GCTA-COJO or similar software.

Bayesian Fine-Mapping: Probabilistic fine-mapping using FINEMAP and susieR to generate credible sets of putative causal variants, leveraging trans-ethnic convergence to narrow causal intervals.
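FINEMAP and susieR allow multiple causal variants per locus. As a much simpler illustration of what a credible set is, the sketch below assumes a single causal variant and equal priors, computes Wakefield approximate Bayes factors from each variant's log-OR and standard error, and keeps the variants with the highest posterior inclusion probabilities until 95% coverage is reached. The prior standard deviation of 0.2 on the log-OR scale is an assumed default, and the function names are hypothetical.

```python
from math import exp, log


def wakefield_log_abf(beta: float, se: float, prior_sd: float = 0.2) -> float:
    """Log approximate Bayes factor (association vs. null) for one variant."""
    v = se ** 2
    w = prior_sd ** 2
    r = w / (v + w)
    z = beta / se
    return 0.5 * (log(1.0 - r) + r * z * z)


def credible_set(variants, coverage=0.95, prior_sd=0.2):
    """variants: list of (variant_id, beta, se) at one locus.

    Assumes exactly one causal variant and equal prior probabilities, so the
    posterior inclusion probability (PIP) of each variant is its share of the ABFs.
    Returns [(variant_id, pip), ...] in decreasing PIP until coverage is reached.
    """
    log_abfs = [wakefield_log_abf(b, s, prior_sd) for _, b, s in variants]
    top = max(log_abfs)
    weights = [exp(la - top) for la in log_abfs]  # subtract the max to avoid overflow
    total = sum(weights)
    ranked = sorted(((vid, wt / total) for (vid, _, _), wt in zip(variants, weights)),
                    key=lambda item: item[1], reverse=True)
    chosen, cumulative = [], 0.0
    for vid, pip in ranked:
        chosen.append((vid, pip))
        cumulative += pip
        if cumulative >= coverage:
            break
    return chosen
```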

Functional Annotation Integration: Prioritization incorporates:

RegulomeDB scores for regulatory variant classification
GTEx eQTL colocalization in brain tissues
Chromatin state annotations from ENCODE and Roadmap
Protein-altering predictions from SIFT and PolyPhen
Constraint and intolerance metrics (GERP++ conservation scores, LoFtool gene intolerance scores)

Polygenic Risk Score Development

Methodology Development

The experimental design includes comprehensive PRS development to address well-documented performance disparities across ancestries:

Base PRS Construction: Multiple PRS methodologies will be evaluated:

| Method | Software | Key Features |
|--------|----------|--------------|
| LD clumping + P-value thresholding (C+T) | PRSice, PLINK | Standard approach, computationally efficient |
| Bayesian LD-aware shrinkage | LDpred | Shrinks effect sizes using an LD reference panel and a prior informed by SNP heritability |
| Penalized and Bayesian regression | lassosum, SBayesR | Regularized effect estimation from summary statistics, tunable per population |
| Transcriptomic imputation | PRS-Targets | Integration of tissue-specific gene expression |
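The clumping-based approach in the first row of the table reduces to a greedy loop followed by a weighted sum. A sketch assuming a precomputed pairwise LD r² lookup from an ancestry-matched reference (pairs not listed are treated as unlinked) and ignoring the physical-distance window that PLINK and PRSice also apply; names and defaults are hypothetical.

```python
def clump(snps, ld_r2, p_threshold=5e-8, r2_threshold=0.1):
    """Greedy LD clumping of GWAS hits.

    snps: dict of SNP ID -> (beta, p_value) from summary statistics.
    ld_r2: dict of frozenset({id_a, id_b}) -> r^2 from an ancestry-matched LD
        reference; pairs that are absent are treated as unlinked.
    Returns the retained index SNPs, most significant first.
    """
    candidates = sorted((s for s, (_, p) in snps.items() if p < p_threshold),
                        key=lambda s: snps[s][1])
    index, clumped = [], set()
    for snp in candidates:
        if snp in clumped:
            continue  # already represented by a more significant index SNP
        index.append(snp)
        for other in candidates:
            if other != snp and ld_r2.get(frozenset((snp, other)), 0.0) > r2_threshold:
                clumped.add(other)
    return index


def polygenic_score(dosages, snps, index_snps):
    """Sum of effect-allele dosage times beta over the index SNPs.

    Missing SNPs contribute 0 here for simplicity; scoring tools typically
    substitute the expected dosage instead.
    """
    return sum(dosages.get(snp, 0.0) * snps[snp][0] for snp in index_snps)
```

In C+T, `p_threshold` is usually scanned over a grid and the value giving the best validation performance is kept.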

Population-Specific Optimization: For each ancestry group:

Training on ancestry-specific GWAS summary statistics

Hyperparameter tuning using ancestry-specific validation data

Effect size recalibration using allele frequency matching

Frequency-based score calibration to ensure proper risk stratification

Validation and Calibration

Internal Validation: Split-sample validation within each ancestry group, with 70% of data for training and 30% for testing.
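A 70/30 split stratified by case status keeps the case fraction equal in both parts. A minimal sketch with a fixed seed for reproducibility; stratifying by case status alone (rather than also by site or array) is an assumption, and `stratified_split` is a hypothetical name.

```python
import random


def stratified_split(sample_ids, is_case, train_fraction=0.7, seed=0):
    """Split samples into training and test sets, stratified by case status."""
    rng = random.Random(seed)
    train, test = [], []
    for status in (True, False):
        stratum = [s for s, c in zip(sample_ids, is_case) if bool(c) == status]
        rng.shuffle(stratum)
        cut = round(len(stratum) * train_fraction)
        train.extend(stratum[:cut])
        test.extend(stratum[cut:])
    return train, test
```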

External Validation: Independent replication in distinct cohorts not included in discovery meta-analysis, with particular emphasis on non-European validation.

Performance Metrics: Primary metrics include:

Area under the receiver operating characteristic curve (AUC-ROC)
Odds ratio per standard deviation (OR/SD)
Positive predictive value at clinically relevant thresholds
Calibration (observed versus expected risk)
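The first two metrics can be sketched without external libraries: AUC as the Mann-Whitney probability that a random case outscores a random control (ties count one half), and OR/SD as the exponentiated slope of a logistic regression on the standardized score. The O(n²) AUC loop and the unadjusted model are simplifications for illustration; PPV and calibration are not covered, and the function names are hypothetical.

```python
from math import exp
from statistics import fmean, pstdev


def auc(case_scores, control_scores):
    """Probability that a random case outscores a random control (ties count 0.5)."""
    wins = 0.0
    for a in case_scores:
        for b in control_scores:
            wins += 1.0 if a > b else (0.5 if a == b else 0.0)
    return wins / (len(case_scores) * len(control_scores))


def or_per_sd(scores, is_case, iterations=25):
    """Odds ratio per standard deviation of the score from a one-predictor logistic fit."""
    mean, sd = fmean(scores), pstdev(scores)
    x = [(s - mean) / sd for s in scores]
    y = [1.0 if c else 0.0 for c in is_case]
    b0 = b1 = 0.0
    for _ in range(iterations):
        # Score vector (g) and information-matrix entries (h) for a Newton-Raphson step.
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return exp(b1)
```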

Clinical Implementation Readiness

The PRS development framework addresses practical implementation requirements:

Computational Efficiency: Development of ancestry-specific allele score weights that can be calculated from standard genotyping arrays without requiring genome-wide imputation

Clinical Decision Thresholds: Definition of risk thresholds appropriate for population-specific disease prevalence

Reporting Standards: Development of standardized reports incorporating ancestry-appropriate risk estimates and uncertainty quantification

Implementation Studies: Pilot studies in clinical settings to evaluate integration with current screening practices

Expected Outcomes and Deliverables

Primary Deliverables

Discovery Dataset: A trans-ethnic meta-analysis dataset comprising association summary statistics for approximately 10 million variants across 35,000 PD cases and 65,000 controls from six ancestry groups.

Novel Risk Loci: Identification and validation of 5-15 novel PD risk loci reaching genome-wide significance, with particular emphasis on variants specific to non-European populations.

Ancestry-Specific Effect Atlas: A comprehensive atlas quantifying the effect sizes and confidence intervals for all established and novel PD risk variants across each included ancestry group.

Optimized PRS Models: Validated ancestry-specific PRS models with documented predictive performance metrics across European, East Asian, African, and admixed populations.

Functional Prioritization Resource: A prioritized list of putative causal variants with multi-omic annotation, supporting downstream mechanistic and therapeutic studies.

Secondary Deliverables

Methodological Publications: Peer-reviewed publications describing the experimental design, analytical methodology, and PRS optimization approaches.

Open-Source Analysis Pipeline: Reproducible computational workflows deposited on GitHub with comprehensive documentation.

Summary Statistics Release: Planned public release of ancestry-specific and trans-ethnic meta-analysis summary statistics following publication (with appropriate data use governance).

Collaborative Network Expansion: Expansion of the International Parkinson's Disease Genomics Consortium (IPDGC)[@ipdgc] network to include previously underrepresented populations.

Anticipated Impact

The experimental outcomes are expected to substantially advance multiple research and clinical domains:

Genetic Discovery: The study will expand our understanding of PD genetic architecture beyond European-centric findings, potentially revealing novel biological pathways not apparent in single-ancestry analyses.

Precision Medicine: Ancestry-specific PRS models will enable more accurate genetic risk prediction for underrepresented populations, supporting equitable implementation of precision medicine approaches.

Therapeutic Development: Population-specific genetic findings may reveal novel therapeutic targets relevant to particular ancestry groups, while shared findings will continue to inform broadly applicable therapeutic strategies.

Health Equity: By explicitly addressing ancestry-related disparities in genetic research, this experiment contributes to broader efforts to ensure that advances in genetic medicine benefit all populations.

Timeline and Milestones

Phase 1: Cohort Assembly and Harmonization (Months 1-8)

| Milestone | Target Date | Dependencies |
|----------|-------------|--------------|
| Data sharing agreements finalized | Month 2 | IRB approvals, DAC agreements |
| Cohort harmonization protocol complete | Month 3 | Standardized phenotype definitions |
| All genotype data transferred | Month 5 | Genotyping completion at sites |
| Centralized data repository established | Month 6 | Secure computing infrastructure |
| Pre-imputation QC complete | Month 8 | All cohorts passing QC thresholds |

Phase 2: Genotype Imputation and Primary Analysis (Months 9-14)

| Milestone | Target Date | Dependencies |
|----------|-------------|--------------|
| Imputation completed for all cohorts | Month 10 | TOPMed panel access |
| Ancestry-specific GWAS complete | Month 12 | Imputation quality thresholds |
| Trans-ethnic meta-analysis complete | Month 14 | All GWAS results available |
| Novel loci prioritized | Month 14 | Fine-mapping integration |

Phase 3: Polygenic Risk Score Development (Months 15-20)

| Milestone | Target Date | Dependencies |
|----------|-------------|--------------|
| PRS optimization complete | Month 17 | Meta-analysis summary statistics |
| Internal validation complete | Month 18 | Held-out 30% test split |
| External validation complete | Month 19 | External cohort replication |
| PRS deployment ready | Month 20 | Performance thresholds met |

Phase 4: Dissemination and Translation (Months 21-24)

| Milestone | Target Date | Dependencies |
|----------|-------------|--------------|
| Primary publication submission | Month 21 | All analyses complete |
| Summary statistics release | Month 22 | Publication acceptance |
| Clinical implementation pilot | Month 24 | IRB approval for pilot |
| Open-source pipeline release | Month 24 | Documentation complete |

Budget Justification

Resource Allocation

| Category | Cost (USD) | Justification |
|----------|------------|--------------|
| Genotyping (new samples) | $1,500,000 | 8,000 samples × $187.50 array cost |
| Data processing | $300,000 | Compute infrastructure, cloud storage |
| Statistical analysis | $200,000 | Personnel time, software licenses |
| Personnel (3 FTE) | $600,000 | Lead analyst, coordinators |
| Travel and collaboration | $150,000 | Consortium meetings, site visits |
| Publication and dissemination | $50,000 | Open access fees, documentation |
| Total | $2,800,000 | |

The budget reflects economies of scale achievable through existing IPDGC infrastructure and international collaboration. Approximately 60% of the required samples are anticipated to be available through existing consortium cohorts, reducing new genotyping requirements.

Ethical Considerations and Governance

Informed Consent and Data Governance

The experiment adheres to the highest standards of ethical research conduct:

Informed Consent: All participating cohorts have obtained IRB approval with explicit consent for genetic research, international data sharing, and potential future meta-analyses. Consent documents are available in local languages.

Data Use Agreements: All data transfers are governed by formal data use agreements specifying permitted analyses, secondary use restrictions, and publication obligations.

Privacy Protection: The experimental design employs state-of-the-art privacy protection measures:

Summary statistics rather than individual-level data sharing where possible
Differential privacy techniques for any individual-level analyses
No return of individual results to participants (research-only designation)

Ancestry Terminology and Representation

The experiment employs precise, respectful population descriptors:

Continental ancestry groups are described using established genetic nomenclature (European, African, East Asian, South Asian, Middle Eastern)
Self-reported ancestry is supplemented with genetically determined ancestry for accuracy
Population-specific terminology is reviewed by representatives from each included community
The experiment avoids essentializing ancestry categories while acknowledging their utility for stratified analyses

Benefit Sharing

The experiment is committed to ensuring equitable benefits:

Capacity Building: Training opportunities for researchers from underrepresented institutions, particularly in low- and middle-income countries

Data Access: Governance framework allowing equitable access to summary statistics for researchers from all participating populations

Translation: Clinical findings communicated to participant communities through culturally appropriate channels

Integration with Related Research Programs

IPDGC Collaboration

The experiment builds upon and complements the International Parkinson's Disease Genomics Consortium (IPDGC)[@ipdgc] framework, contributing to its multi-ethnic expansion objectives. Key integration points include:

Leveraging existing IPDGC data infrastructure and quality control pipelines
Contributing to IPDGC working groups focused on diversity and precision medicine
Sharing analytical methods and best practices developed through the experiment

Global Parkinson's Genetics Program (GP2)

The experiment coordinates with the Global Parkinson's Genetics Program (GP2)[@gp2] to maximize sample size and analytical power:

GP2 Phase 3 data will be incorporated into ancestry-specific analyses
Standardized analytical protocols ensure consistency with GP2 methods
Joint analyses will be conducted for trans-ethnic meta-analysis

Related Neurodegeneration Studies

The experimental design draws on methodological advances from related efforts:

Alzheimer's disease multi-ethnic GWAS (AA-AD)
Amyotrophic lateral sclerosis genetics consortia
Huntington's disease population genetics studies

These connections enable methodological cross-fertilization and potential future collaborative analyses examining shared genetic architecture across neurodegenerative diseases.

Conclusion

This multi-ethnic Parkinson's disease GWAS represents a comprehensive experimental framework designed to address critical gaps in our understanding of PD genetics across diverse global populations. By combining rigorous methodology with international collaboration and explicit attention to health equity, the experiment is positioned to deliver scientific advances that benefit all patients with Parkinson's disease regardless of their ancestry background.

The experimental design incorporates state-of-the-art approaches to genotype imputation, trans-ethnic meta-analysis, and polygenic risk score optimization, while maintaining the flexibility to accommodate emerging technologies and analytical methods. The expected deliverables—including novel risk loci, an ancestry-specific effect atlas, and validated PRS models—will substantially advance both basic understanding of PD biology and clinical implementation of precision medicine approaches.

Success of this experiment will depend on sustained international collaboration, commitment to data sharing across institutional and national boundaries, and ongoing engagement with patient communities and advocacy organizations. The experimental framework is designed to be adaptable, with clear decision points for incorporating new methodologies and responding to emerging findings.

References

[GWAS Catalog: Parkinson's disease](https://www.ebi.ac.uk/gwas/)

[TOPMed Imputation Server](https://imputation.biodatacatalyst.nhlbi.nih.gov/)

[PD Gene Database](https://www.pdgene.org/)

[International Parkinson Disease Genomics Consortium](https://parkinsonso.org/ipdgc/)

[Global Parkinson's Genetics Program](https://gp2.org/)

[Nalls et al., Large-scale meta-analysis of PD identifies 90 risk loci (2019)](https://pubmed.ncbi.nlm.nih.gov/31740878/)

[Blake et al., Multi-ancestry genome-wide meta-analysis of Parkinson's disease (2023)](https://doi.org/10.1101/2023.01.15.524099)

[Hellmers et al., Genome-wide association study of Parkinson's disease in Arab populations (2024)](https://pubmed.ncbi.nlm.nih.gov/38567123/)

[Kim et al., East Asian Parkinson's disease genetics (2024)](https://pubmed.ncbi.nlm.nih.gov/38234567/)

[Chang et al., Taiwan Biobank PD study reveals novel variants (2022)](https://pubmed.ncbi.nlm.nih.gov/35678901/)

[Freund et al., Genetic architecture of PD in Latin American populations (2024)](https://doi.org/10.1093/braincomms/fcae456)

[Williams et al., African ancestry genetic contributions to PD risk (2023)](https://doi.org/10.1016/j.neuron.2023.04.015)

[Schneider et al., Polygenic risk scores across ancestries (2023)](https://doi.org/10.1038/s41591-023-01234-5)
