GWAS meta-analysis of major depressive disorder across three cohorts
active
experiment
Created: 2026-04-06T12:30:40
By: etl-v1-backfill
Quality:
50%
✓ SciDEX
ID: exp-5f1ec586-f80a-4c02-9641-6f9f1d3b363c
🧫 Experiment Protocol
Exploratory · major depressive disorder · human patients · proposed
A comprehensive genome-wide association study meta-analysis combining three large cohorts (23andMe, CONVERGE, and PGC) to identify genetic variants associated with major depressive disorder. The study analyzed genetic data from 90,150 MDD cases and 246,603 controls across different populations. Cases were defined using various diagnostic criteria including self-reported clinical diagnosis/treatment (23andMe), Composite International Diagnostic Interview (CONVERGE), and structured diagnostic interviews or DSM-IV checklists (PGC). The meta-analysis identified genome-wide significant associations and prioritized variants for replication studies.
PRIMARY OUTCOME
genetic association with MDD risk
EXPECTED OUTCOMES
1. Primary: Identify 15-25 genome-wide significant loci (p < 5×10^-8) associated with MDD risk, with odds ratios of 1.05-1.15
2. Secondary: Achieve SNP-based heritability estimate of 8-12% on liability scale with standard error <1%
3. Pathway enrichment: Significant enrichment (FDR < 0.05) in neurotransmitter signaling pathways, particularly serotonergic and dopaminergic systems
4. Replication success: >70% of genome-wide significant variants replicated (p < 0.05) in independent cohorts with consistent effect directions
5. Polygenic prediction: PRS explaining 1.5-3% variance in MDD liability in independent test cohorts (Nagelkerke R^2)
6. Genetic correlation: Significant positive correlation (rG > 0.6) with anxiety disorders and negative correlation with subjective well-being
7. Clinical translation: Identify 3-5 druggable targets among associated genes with existing pharmacological evidence
SUCCESS CRITERIA
- Genome-wide significance: ≥10 independent loci reaching p < 5×10^-8 with genomic inflation λGC < 1.1
- Quality metrics: >95% genotype call rate and Hardy-Weinberg equilibrium p > 10^-6 for analyzed variants
- Heterogeneity control: I^2 < 50% for top associations indicating consistent effects across cohorts
- Replication threshold: >60% of lead variants show nominal significance (p < 0.05) in replication cohorts
- Statistical power: Achieved >80% power to detect variants with MAF >5% and OR >1.1 at genome-wide significance
- Data completeness: Successfully analyze >95% of planned samples with <5% exclusions due to quality control
- Reproducibility: Key findings replicate in sensitivity analyses and show consistent results across analytical approaches
PROTOCOL
**Phase 1: Cohort Preparation and Quality Control** — Month 1-2
Assemble three primary cohorts: 23andMe (n=75,607 cases, 231,747 controls), CONVERGE (n=5,303 cases, 5,337 controls), and PGC (n=9,240 cases, 9,519 controls). Apply standardized quality control: remove samples with call rate <95%, and exclude SNPs with MAF <1%, call rate <95%, or Hardy-Weinberg equilibrium p <10^-6. Perform principal component analysis to identify and exclude population outliers (>4 standard deviations from the population mean). Remove related individuals (IBD >0.1) and samples whose reported sex disagrees with genotype-inferred sex. Check for batch effects and residual inflation using quantile-quantile plots and the genomic inflation factor (λGC). Impute genotypes against the 1000 Genomes Project Phase 3 reference panel with IMPUTE2, retaining variants with info score >0.8.
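The variant-level thresholds above can be expressed as a single filter. This is a minimal sketch, not the pipeline's actual implementation: the `VariantQC` record and `passes_variant_qc` function are hypothetical names, and the sketch assumes the per-variant metrics (MAF, call rate, HWE p-value, imputation info score) have already been computed by upstream tools. In practice the info-score filter applies after imputation, while the other three apply to genotyped SNPs before it.

```python
from dataclasses import dataclass


@dataclass
class VariantQC:
    """Per-variant QC metrics (hypothetical record; fields assumed precomputed)."""
    snp_id: str
    maf: float        # minor allele frequency
    call_rate: float  # fraction of samples with a genotype call
    hwe_p: float      # Hardy-Weinberg equilibrium test p-value
    info: float       # imputation info score


def passes_variant_qc(v, min_maf=0.01, min_call_rate=0.95,
                      min_hwe_p=1e-6, min_info=0.8):
    """Return True if the variant survives the Phase 1 thresholds.

    Excludes MAF < 1%, call rate < 95%, HWE p < 1e-6, and keeps only
    variants with info score strictly greater than 0.8.
    """
    return (v.maf >= min_maf
            and v.call_rate >= min_call_rate
            and v.hwe_p >= min_hwe_p
            and v.info > min_info)


if __name__ == "__main__":
    variants = [
        VariantQC("rs_example_1", maf=0.12, call_rate=0.99, hwe_p=0.30, info=0.95),
        VariantQC("rs_example_2", maf=0.004, call_rate=0.99, hwe_p=0.30, info=0.95),
    ]
    kept = [v.snp_id for v in variants if passes_variant_qc(v)]
    print(kept)
```

Keeping the thresholds as keyword arguments makes it easy to rerun the filter under stricter settings for sensitivity analyses.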
**Phase 2: Individual Cohort GWAS Analysis** — Month 2-3
Perform genome-wide association analysis for each cohort separately using logistic regression with an additive genetic model. Include covariates: age, sex, and the first 10 principal components to control for population stratification. For the 23andMe cohort, use self-reported clinical diagnosis/treatment as the case definition. For the CONVERGE cohort, define cases as recurrent MDD assessed with the Composite International Diagnostic Interview (CIDI). For the PGC cohort, use structured clinical interviews or DSM-IV symptom checklists. Calculate genomic inflation factors and ensure λGC <1.1 after covariate adjustment. Generate Manhattan plots and quantile-quantile plots for each cohort. Estimate SNP-based heritability using LDSC (linkage disequilibrium score regression).
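The genomic inflation factor used as a check in this phase is conventionally the median observed association chi-square statistic divided by the median of a 1-degree-of-freedom chi-square distribution (about 0.4549). A minimal sketch, assuming per-SNP z-scores are available as a plain list; the function name `genomic_inflation` is illustrative only:

```python
import statistics

# Median of the chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_inflation(z_scores):
    """Estimate lambda_GC from per-SNP association z-scores.

    Each z-score is squared to give a 1-df chi-square statistic; the
    median of those statistics is compared with its null expectation.
    """
    chi2 = [z * z for z in z_scores]
    return statistics.median(chi2) / CHI2_1DF_MEDIAN


if __name__ == "__main__":
    # Toy z-scores, for illustration only.
    z = [0.1, -0.5, 0.67, 1.2, -0.9, 0.3, -1.8, 0.7, 0.05]
    lam = genomic_inflation(z)
    print(f"lambda_GC = {lam:.3f}; passes <1.1 check: {lam < 1.1}")
```

A value near 1 suggests little residual stratification; for a highly polygenic trait in large samples, some inflation reflects true signal, which is why the LDSC intercept is often examined alongside λGC.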
**Phase 3: Meta-Analysis Implementation** — Month 3-4
Conduct fixed-effects inverse variance-weighted meta-analysis using METAL software. Test ~9 million SNPs passing quality control across all cohorts. Weight each study's effect estimate by the inverse of its variance (standard error squared) and apply genomic control correction if λGC >1.05. Identify genome-wide significant loci using threshold p <5×10^-8. Test for between-study heterogeneity using Cochran's Q statistic and the I^2 measure. Perform conditional analysis to identify independent signals within associated loci using GCTA-COJO. Calculate effective sample size accounting for case-control imbalance. Generate forest plots for top associations and regional association plots for significant loci using LocusZoom.
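For a single SNP, the fixed-effects inverse variance-weighted estimate, Cochran's Q, I^2, and the effective sample size can be sketched as follows. This is a standalone illustration of the standard formulas, not METAL's code; the function names and the toy inputs are assumptions, and the effective sample size uses the common 4/(1/N_cases + 1/N_controls) definition.

```python
import math


def ivw_meta(betas, ses):
    """Fixed-effects inverse variance-weighted meta-analysis for one SNP.

    betas: per-cohort log odds ratios; ses: their standard errors.
    Returns pooled beta, SE, z, two-sided p, Cochran's Q, and I^2 (%).
    """
    weights = [1.0 / se ** 2 for se in ses]
    w_sum = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / w_sum
    se = math.sqrt(1.0 / w_sum)
    z = beta / se
    # Two-sided p-value under the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return {"beta": beta, "se": se, "z": z, "p": p, "q": q, "i2": i2}


def effective_sample_size(n_cases, n_controls):
    """Effective N for a case-control study, accounting for imbalance."""
    return 4.0 / (1.0 / n_cases + 1.0 / n_controls)


if __name__ == "__main__":
    # Toy per-cohort estimates for one SNP, for illustration only.
    result = ivw_meta(betas=[0.05, 0.08, 0.04], ses=[0.008, 0.03, 0.025])
    print(result)
    # Cohort sizes taken from Phase 1.
    for name, cases, controls in [("23andMe", 75607, 231747),
                                  ("CONVERGE", 5303, 5337),
                                  ("PGC", 9240, 9519)]:
        print(name, round(effective_sample_size(cases, controls)))
```

Because the weights are 1/SE^2, the largest cohort dominates the pooled estimate, which is one reason the I^2 < 50% criterion is checked for top associations.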
**Phase 4: Functional Annotation and Prioritization** — Month 4-5
Annotate genome-wide significant variants using ANNOVAR and VEP (Variant Effect Predictor). Prioritize variants based on functional consequences: missense variants with CADD score >15, eQTLs from GTEx database (brain tissues), chromatin interactions from Hi-C data, and regulatory elements from ENCODE. Perform gene-based association testing using MAGMA with default settings (SNP-wise mean model). Conduct pathway enrichment analysis using GSEA with curated gene sets from MSigDB, focusing on neurotransmitter signaling, synaptic function, and neuronal development pathways. Map lead SNPs to target genes using multiple approaches: nearest gene, eQTL mapping, chromatin conformation capture, and DEPICT gene prioritization.
**Phase 5: Replication and Validation Studies** — Month 5-6
Select top 30 genome-wide significant variants for replication in independent cohorts: UK Biobank depression cases (n=40,000), iPSYCH Danish registry (n=15,000), and FinnGen (n=25,000). Calculate required sample sizes for 80% power to replicate associations with original effect sizes. Perform lookup analyses for established psychiatric GWAS loci from schizophrenia, bipolar disorder, and autism spectrum disorders. Test for genetic correlation between MDD and related psychiatric traits using LDSC. Conduct polygenic risk score (PRS) analysis using PRSice software with p-value thresholds from 0.001 to 0.5. Validate PRS performance in independent test sets measuring AUC and Nagelkerke R^2.
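The p-value thresholding step of the polygenic risk score analysis can be sketched as a weighted allele count. This is an illustrative simplification and not PRSice itself: it omits LD clumping, which PRSice performs before thresholding, and the function name, SNP identifiers, and input layout are all hypothetical.

```python
def polygenic_score(dosages, weights, p_threshold):
    """Compute a thresholded polygenic score for one individual.

    dosages: dict mapping SNP id -> effect-allele dosage (0 to 2).
    weights: dict mapping SNP id -> (beta, p_value) from discovery GWAS.
    Only SNPs with p_value <= p_threshold and a dosage present are used.
    Returns (score, number_of_snps_used).
    """
    score = 0.0
    n_used = 0
    for snp, (beta, p_value) in weights.items():
        if p_value <= p_threshold and snp in dosages:
            score += beta * dosages[snp]
            n_used += 1
    return score, n_used


if __name__ == "__main__":
    # Toy discovery weights and one individual's dosages (illustrative).
    weights = {
        "snp_a": (0.05, 1e-9),
        "snp_b": (-0.03, 0.004),
        "snp_c": (0.02, 0.3),
    }
    dosages = {"snp_a": 2, "snp_b": 1, "snp_c": 0}
    # Thresholds spanning the 0.001 to 0.5 range named in the protocol.
    for threshold in (0.001, 0.01, 0.5):
        print(threshold, polygenic_score(dosages, weights, threshold))
```

Scores computed at each threshold would then be regressed on case status in the test set to obtain Nagelkerke R^2 and AUC.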
**Phase 6: Clinical Translation and Reporting** — Month 6-7
Perform drug target enrichment analysis using OpenTargets platform to identify actionable genes. Conduct Mendelian randomization analyses to test causal relationships between identified loci and MDD-related phenotypes (BMI, smoking, education). Generate comprehensive summary statistics file with standardized format including chromosome, position, alleles, effect sizes, standard errors, and p-values for >9 million variants. Perform power calculations for future studies and estimate sample sizes needed to identify additional loci. Create interactive web portal for results visualization and data sharing. Conduct sensitivity analyses excluding participants with bipolar disorder or other psychiatric comorbidities. Prepare manuscripts following STREGA guidelines for genetic association studies.
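One common estimator for the Mendelian randomization step is the fixed-effect inverse variance-weighted (IVW) estimate across instruments. The sketch below assumes summary-level SNP effects on an exposure and an outcome; the protocol does not name an MR method or software, so the choice of IVW, the function name `ivw_mr`, and the toy inputs are all assumptions made for illustration.

```python
import math


def ivw_mr(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW Mendelian randomization estimate.

    beta_exposure: SNP effects on the exposure.
    beta_outcome:  SNP effects on the outcome (same SNPs, same alleles).
    se_outcome:    standard errors of the outcome effects.
    Returns (causal_estimate, standard_error).
    """
    num = 0.0
    den = 0.0
    for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome):
        w = 1.0 / se ** 2
        num += w * bx * by
        den += w * bx * bx
    estimate = num / den
    return estimate, math.sqrt(1.0 / den)


if __name__ == "__main__":
    # Toy instrument effects, for illustration only.
    est, se = ivw_mr(beta_exposure=[0.10, 0.20, 0.15],
                     beta_outcome=[0.05, 0.11, 0.07],
                     se_outcome=[0.01, 0.02, 0.015])
    print(f"IVW estimate = {est:.3f} (SE {se:.3f})")
```

This estimator assumes the instruments affect the outcome only through the exposure; methods that relax that assumption would be natural sensitivity analyses.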
Source: PMID 29728651
🧫 Experiment Extras
PATHWAY
neurotransmitter signaling, neuronal development
MARKET PRICE
$0.50
STATUS
proposed
Metadata
| Field | Value |
| --- | --- |
| origin_type | v1_polymorphic_backfill |
| source_table | experiments |
| _schema_version | 1 |
📊 Evidence Profile
Evidence Balance
+0%
Certainty
0%
Debates
0
Incoming
0
Outgoing
0
0 supporting
0 contradicting
0 neutral