Epigenetic reprogramming in aging neurons
Analysis ID: SDA-2026-04-02-gap-epigenetic-reprog-b685190e
Date: 2026-04-02
Domain: neurodegeneration
Hypotheses Generated: 7
Knowledge Graph Edges: 91
Key Hypotheses
- Nutrient-Sensing Epigenetic Circuit Reactivation (score: 0.619)
- Selective HDAC3 Inhibition with Cognitive Enhancement (score: 0.565)
- Chromatin Accessibility Restoration via BRD4 Modulation (score: 0.522)
- Astrocyte-Mediated Neuronal Epigenetic Rescue (score: 0.459)
- Mitochondrial-Nuclear Epigenetic Cross-Talk Restoration (score: 0.446)
This notebook walks through differential gene expression, pathway enrichment, and multi-dimensional hypothesis scoring. All expression data are simulated from patterns reported in the Allen Brain Cell Atlas (SEA-AD) and the published literature; no real measurements are used.
1. Setup and Data Generation
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy import stats
# --- Gene Expression Analysis ---
# Simulated expression data based on known AD biology from SEA-AD atlas
# Real data would come from Allen Brain Cell Atlas API
np.random.seed(42)
genes = ['SIRT1', 'HDAC3', 'BRD4', 'HDAC', 'SIRT3', 'TET2', 'OCT4', 'TREM2', 'APOE', 'APP', 'MAPT', 'PSEN1', 'CD33', 'CLU', 'BIN1', 'GFAP', 'AQP4', 'SLC17A7', 'GAD1', 'MBP']
cell_types = ['Microglia', 'Astrocytes', 'Exc_Neurons', 'Inh_Neurons', 'Oligodendrocytes']
conditions = ['Control', 'AD']
# Generate biologically plausible expression values
# AD-associated genes show cell-type specific changes
n_samples = 50
expression_data = {}
for gene in genes:
    expression_data[gene] = {}
    for ct in cell_types:
        ctrl = np.random.lognormal(mean=2.0, sigma=0.5, size=n_samples)
        # Simulate known biology: microglia upregulate immune genes in AD
        # Note: 'C1QA' and 'SYP' are not in `genes`, so those entries never match
        if gene in ['TREM2', 'CD33', 'C1QA'] and ct == 'Microglia':
            ad = ctrl * np.random.lognormal(mean=0.8, sigma=0.3, size=n_samples)
        elif gene in ['GFAP'] and ct == 'Astrocytes':
            ad = ctrl * np.random.lognormal(mean=0.6, sigma=0.3, size=n_samples)
        elif gene in ['SLC17A7', 'SYP'] and 'Neuron' in ct:
            ad = ctrl * np.random.lognormal(mean=-0.3, sigma=0.2, size=n_samples)
        elif gene in ['MBP'] and ct == 'Oligodendrocytes':
            ad = ctrl * np.random.lognormal(mean=-0.2, sigma=0.2, size=n_samples)
        else:
            ad = ctrl * np.random.lognormal(mean=0.1, sigma=0.3, size=n_samples)
        expression_data[gene][ct] = {'Control': ctrl, 'AD': ad}
print(f"Generated expression data for {len(genes)} genes x {len(cell_types)} cell types")
print(f"Samples per condition: {n_samples}")
Generated expression data for 20 genes x 5 cell types
Samples per condition: 50
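Because each simulated AD sample is a control value multiplied by an independently drawn log-normal factor, the expected fold change has a closed form: a log-normal factor with log-mean mu and log-sd sigma has mean exp(mu + sigma^2 / 2). The sketch below tabulates the expected log2 fold change for each branch of the simulation. The helper name `expected_log2fc` and the branch labels are illustrative and not part of the original notebook.

```python
import numpy as np

def expected_log2fc(mu, sigma):
    """Expected log2 fold change when AD = control * lognormal(mu, sigma).

    Uses E[lognormal(mu, sigma)] = exp(mu + sigma**2 / 2) and assumes the
    multiplier is drawn independently of the control values.
    """
    return (mu + sigma**2 / 2) / np.log(2)

# (mu, sigma) pairs copied from the simulation branches above
branches = {
    "Microglia immune genes": (0.8, 0.3),
    "Astrocyte GFAP": (0.6, 0.3),
    "Neuronal SLC17A7": (-0.3, 0.2),
    "Oligodendrocyte MBP": (-0.2, 0.2),
    "All other gene/cell-type pairs": (0.1, 0.3),
}
expected = {name: expected_log2fc(mu, s) for name, (mu, s) in branches.items()}
for name, value in expected.items():
    print(f"{name:32s} expected log2FC = {value:+.2f}")
```

Under these assumptions the default branch carries a small positive expected shift (about 0.21), so the gene/cell-type pairs without an explicit rule are not true nulls in this simulation.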
2. Differential Expression Heatmap
Log2 fold change of gene expression between AD and control samples across cell types. Significance: * p<0.05, ** p<0.01, *** p<0.001 (Mann-Whitney U test).
# --- Expression Heatmap: Log2 Fold Change ---
log2fc = np.zeros((len(genes), len(cell_types)))
pvalues = np.zeros((len(genes), len(cell_types)))
for i, gene in enumerate(genes):
    for j, ct in enumerate(cell_types):
        ctrl = expression_data[gene][ct]['Control']
        ad = expression_data[gene][ct]['AD']
        log2fc[i, j] = np.log2(np.mean(ad) / np.mean(ctrl))
        _, pvalues[i, j] = stats.mannwhitneyu(ctrl, ad, alternative='two-sided')
fig, ax = plt.subplots(figsize=(10, 8))
im = ax.imshow(log2fc, cmap='RdBu_r', vmin=-2, vmax=2, aspect='auto')
ax.set_xticks(range(len(cell_types)))
ax.set_xticklabels([ct.replace('_', ' ') for ct in cell_types], rotation=45, ha='right', fontsize=9)
ax.set_yticks(range(len(genes)))
ax.set_yticklabels(genes, fontsize=9)
# Add significance markers
for i in range(len(genes)):
    for j in range(len(cell_types)):
        sig = ''
        if pvalues[i, j] < 0.001: sig = '***'
        elif pvalues[i, j] < 0.01: sig = '**'
        elif pvalues[i, j] < 0.05: sig = '*'
        color = 'white' if abs(log2fc[i, j]) > 1 else 'black'
        ax.text(j, i, f'{log2fc[i,j]:.2f}\n{sig}', ha='center', va='center',
                fontsize=7, color=color)
cbar = plt.colorbar(im, ax=ax, label='Log2 Fold Change (AD/Control)')
ax.set_title('Differential Gene Expression: AD vs Control by Cell Type', fontsize=12, fontweight='bold')
fig.patch.set_facecolor('#0a0a14')
ax.set_facecolor('#151525')
ax.tick_params(colors='#e0e0e0')
ax.title.set_color('#4fc3f7')
cbar.ax.yaxis.set_tick_params(color='#e0e0e0')
cbar.ax.yaxis.label.set_color('#e0e0e0')
plt.setp(plt.getp(cbar.ax.axes, 'yticklabels'), color='#e0e0e0')
plt.tight_layout()
plt.show()
print(f"\nSignificant changes (p < 0.05): {int(np.sum(pvalues < 0.05))} / {pvalues.size}")
Significant changes (p < 0.05): 7 / 100
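The significance markers come from 100 unadjusted tests, so at a 0.05 threshold about five nominal hits would be expected by chance if every null hypothesis were true. The following is a minimal sketch of a Benjamini-Hochberg adjustment, which is not part of the original analysis. The names `benjamini_hochberg`, `demo_p`, and `demo_q` are illustrative, and the toy p-values are made up for the demonstration.

```python
import numpy as np

def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted p-values with the same shape as the input."""
    p = np.asarray(pvals, dtype=float)
    flat = p.ravel()
    m = flat.size
    order = np.argsort(flat)
    # Scale the i-th smallest p-value by m / i
    ranked = flat[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, working down from the largest p-value
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.clip(ranked, 0.0, 1.0)
    return adjusted.reshape(p.shape)

# Toy p-values for illustration only (not taken from the notebook's results)
demo_p = np.array([[0.001, 0.04], [0.03, 0.8]])
demo_q = benjamini_hochberg(demo_p)
print(demo_q)
```

In the notebook, the same function could be applied to the `pvalues` array before counting significant entries, for example `np.sum(benjamini_hochberg(pvalues) < 0.05)`.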
3. Volcano Plot: Microglia Expression
Differential expression in microglia, the primary immune cells of the brain. Red = upregulated in AD, blue = downregulated (both require p < 0.05 and |log2FC| > 0.5); grey = not significant. Dashed yellow line = p = 0.05 threshold.
# --- Volcano Plot: Microglia Expression Changes ---
fig, ax = plt.subplots(figsize=(10, 7))
fig.patch.set_facecolor('#0a0a14')
ax.set_facecolor('#151525')
fc_vals = []
pv_vals = []
gene_labels = []
for gene in genes:
    ctrl = expression_data[gene]['Microglia']['Control']
    ad = expression_data[gene]['Microglia']['AD']
    fc = np.log2(np.mean(ad) / np.mean(ctrl))
    _, pv = stats.mannwhitneyu(ctrl, ad, alternative='two-sided')
    fc_vals.append(fc)
    pv_vals.append(-np.log10(max(pv, 1e-300)))
    gene_labels.append(gene)
fc_vals = np.array(fc_vals)
pv_vals = np.array(pv_vals)
# Color by significance
colors = []
for fc, pv in zip(fc_vals, pv_vals):
    if pv > -np.log10(0.05) and abs(fc) > 0.5:
        colors.append('#ef5350' if fc > 0 else '#4fc3f7')
    else:
        colors.append('#555555')
ax.scatter(fc_vals, pv_vals, c=colors, s=80, alpha=0.8, edgecolors='white', linewidths=0.5)
# Label significant genes
for i, (fc, pv, label) in enumerate(zip(fc_vals, pv_vals, gene_labels)):
    if pv > -np.log10(0.05) and abs(fc) > 0.3:
        ax.annotate(label, (fc, pv), fontsize=8, color='#e0e0e0',
                    xytext=(5, 5), textcoords='offset points')
ax.axhline(-np.log10(0.05), color='#ffd54f', linestyle='--', alpha=0.5, label='p=0.05')
ax.axvline(0.5, color='#81c784', linestyle='--', alpha=0.3)
ax.axvline(-0.5, color='#81c784', linestyle='--', alpha=0.3)
ax.set_xlabel('Log2 Fold Change (AD/Control)', color='#e0e0e0', fontsize=11)
ax.set_ylabel('-Log10(p-value)', color='#e0e0e0', fontsize=11)
ax.set_title('Volcano Plot: Microglia Gene Expression in AD', fontsize=12, fontweight='bold', color='#4fc3f7')
ax.tick_params(colors='#e0e0e0')
ax.legend(facecolor='#151525', edgecolor='#333', labelcolor='#e0e0e0')
plt.tight_layout()
plt.show()
# Summary statistics
sig_up = sum(1 for fc, pv in zip(fc_vals, pv_vals) if pv > -np.log10(0.05) and fc > 0.5)
sig_down = sum(1 for fc, pv in zip(fc_vals, pv_vals) if pv > -np.log10(0.05) and fc < -0.5)
print(f"\nSignificantly upregulated: {sig_up}")
print(f"Significantly downregulated: {sig_down}")
Significantly upregulated: 2
Significantly downregulated: 0
4. Statistical Analysis
Non-parametric Mann-Whitney U tests and effect sizes (Cohen's d) for the first 10 genes in the panel, followed by one-way ANOVA across cell types for the first 5 genes.
# --- Statistical Tests ---
print("=" * 70)
print("STATISTICAL ANALYSIS SUMMARY")
print("=" * 70)
results = []
for gene in genes[:10]:
    for ct in cell_types:
        ctrl = expression_data[gene][ct]['Control']
        ad = expression_data[gene][ct]['AD']
        # Mann-Whitney U test (non-parametric)
        stat_mw, p_mw = stats.mannwhitneyu(ctrl, ad, alternative='two-sided')
        # Effect size (Cohen's d)
        pooled_std = np.sqrt((np.std(ctrl)**2 + np.std(ad)**2) / 2)
        cohens_d = (np.mean(ad) - np.mean(ctrl)) / pooled_std if pooled_std > 0 else 0
        if p_mw < 0.05:
            results.append({
                'Gene': gene,
                'Cell_Type': ct.replace('_', ' '),
                'Log2FC': np.log2(np.mean(ad) / np.mean(ctrl)),
                'P_value': p_mw,
                'Cohens_d': cohens_d,
                'Effect': 'Large' if abs(cohens_d) > 0.8 else ('Medium' if abs(cohens_d) > 0.5 else 'Small'),
            })
stats_df = pd.DataFrame(results)
if len(stats_df) > 0:
    stats_df = stats_df.sort_values('P_value')
    print(f"\nSignificant results (p < 0.05): {len(stats_df)} gene-cell type pairs\n")
    print(stats_df.head(15).to_string(index=False))
    # ANOVA across cell types for top genes
    print("\n" + "=" * 70)
    print("ONE-WAY ANOVA: Expression Variation Across Cell Types (AD samples)")
    print("=" * 70)
    for gene in genes[:5]:
        groups = [expression_data[gene][ct]['AD'] for ct in cell_types]
        f_stat, p_anova = stats.f_oneway(*groups)
        sig = "***" if p_anova < 0.001 else ("**" if p_anova < 0.01 else ("*" if p_anova < 0.05 else "ns"))
        print(f" {gene:15s} F={f_stat:8.2f} p={p_anova:.2e} {sig}")
else:
    print("No significant results found at p < 0.05")
======================================================================
STATISTICAL ANALYSIS SUMMARY
======================================================================

Significant results (p < 0.05): 3 gene-cell type pairs

 Gene   Cell_Type   Log2FC      P_value Cohens_d Effect
TREM2   Microglia 1.302864 1.467927e-07 1.172228  Large
TREM2  Astrocytes 0.275097 3.960883e-02 0.405000  Small
HDAC3 Inh Neurons 0.326855 4.303850e-02 0.405556  Small

======================================================================
ONE-WAY ANOVA: Expression Variation Across Cell Types (AD samples)
======================================================================
SIRT1 F= 1.21 p=3.07e-01 ns
HDAC3 F= 1.67 p=1.58e-01 ns
BRD4 F= 1.40 p=2.34e-01 ns
HDAC F= 1.49 p=2.05e-01 ns
SIRT3 F= 0.54 p=7.04e-01 ns
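The effect sizes above use `np.std` with its default `ddof=0` (population standard deviation) and a simple average of the two variances. A common alternative pools the unbiased sample variances weighted by degrees of freedom. The sketch below shows that variant under the assumption that both groups are one-dimensional arrays; the function name `cohens_d` and the toy inputs are illustrative only.

```python
import numpy as np

def cohens_d(ctrl, ad):
    """Cohen's d for AD vs control using the pooled sample standard deviation."""
    ctrl = np.asarray(ctrl, dtype=float)
    ad = np.asarray(ad, dtype=float)
    n1, n2 = ctrl.size, ad.size
    # ddof=1 gives the unbiased sample variance for each group
    pooled_var = ((n1 - 1) * ctrl.var(ddof=1) + (n2 - 1) * ad.var(ddof=1)) / (n1 + n2 - 2)
    if pooled_var == 0:
        return 0.0
    return float((ad.mean() - ctrl.mean()) / np.sqrt(pooled_var))

# Toy inputs: both groups have sample variance 1 and the means differ by 1
demo_d = cohens_d([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
print(demo_d)
```

With 50 samples per group the two conventions differ by a factor of sqrt(49/50), roughly 1%, so the Large/Medium/Small labels would rarely change.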
5. Pathway Enrichment Analysis
Hypergeometric test for enrichment of hypothesis target genes in curated biological pathways (Reactome/KEGG-style). Identifies which molecular processes are overrepresented.
# --- Pathway Enrichment Analysis ---
# Using curated pathway-gene associations (Reactome/KEGG-style)
target_genes = ['SIRT1', 'HDAC3', 'BRD4', 'HDAC', 'SIRT3', 'TET2', 'OCT4']
pathways = {
'Amyloid Processing': ['APP', 'PSEN1', 'PSEN2', 'BACE1', 'ADAM10', 'APH1A'],
'Tau Phosphorylation': ['MAPT', 'GSK3B', 'CDK5', 'DYRK1A', 'PP2A', 'MARK4'],
'Microglial Activation': ['TREM2', 'CD33', 'TYROBP', 'SPI1', 'C1QA', 'C3'],
'Neuroinflammation': ['IL1B', 'TNF', 'IL6', 'NFKB1', 'NLRP3', 'CXCL10'],
'Lipid Metabolism': ['APOE', 'CLU', 'ABCA7', 'ABCA1', 'LDLR', 'LRP1'],
'Synaptic Function': ['SYP', 'SLC17A7', 'SNAP25', 'DLG4', 'GRIA1', 'GRIN2B'],
'Mitochondrial Function': ['PINK1', 'PRKN', 'SIRT3', 'TFAM', 'PPARGC1A', 'DNM1L'],
'Autophagy-Lysosome': ['LAMP1', 'SQSTM1', 'BECN1', 'ATG5', 'TFEB', 'GRN'],
'Oxidative Stress': ['SOD1', 'SOD2', 'GPX4', 'NRF2', 'HMOX1', 'CAT'],
'Calcium Signaling': ['CALM1', 'CAMK2A', 'ITPR1', 'RYR2', 'ATP2A2', 'CACNA1C'],
'Ferroptosis': ['ACSL4', 'GPX4', 'SLC7A11', 'LPCAT3', 'TFRC', 'FTH1'],
'Epigenetic Regulation': ['HDAC1', 'DNMT1', 'TET2', 'KDM6A', 'EZH2', 'SIRT1'],
}
# Compute enrichment scores (hypergeometric test)
from scipy.stats import hypergeom
total_genes = 20000 # approximate genome size
query_size = len(target_genes)
enrichment_results = []
for pathway_name, pathway_genes in pathways.items():
    overlap = set(target_genes) & set(pathway_genes)
    if len(overlap) > 0 or np.random.random() < 0.3:  # Include some non-overlapping for context
        k = len(overlap)
        M = total_genes
        n = len(pathway_genes)
        N = query_size
        pval = hypergeom.sf(k - 1, M, n, N) if k > 0 else 1.0
        fold_enrichment = (k / max(N, 1)) / (n / M) if n > 0 else 0
        enrichment_results.append({
            'Pathway': pathway_name,
            'Genes_in_Pathway': n,
            'Overlap': k,
            'Overlap_Genes': ', '.join(overlap) if overlap else '-',
            'Fold_Enrichment': fold_enrichment,
            'P_value': pval,
        })
enrich_df = pd.DataFrame(enrichment_results).sort_values('P_value')
enrich_df['Significant'] = enrich_df['P_value'] < 0.05
print("Pathway Enrichment Results:")
print(enrich_df[['Pathway', 'Overlap', 'Fold_Enrichment', 'P_value', 'Significant']].to_string(index=False))
Pathway Enrichment Results:
Pathway Overlap Fold_Enrichment P_value Significant
Epigenetic Regulation 2 952.380952 0.000002 True
Mitochondrial Function 1 476.190476 0.002098 True
Tau Phosphorylation 0 0.000000 1.000000 False
Microglial Activation 0 0.000000 1.000000 False
Synaptic Function 0 0.000000 1.000000 False
Neuroinflammation 0 0.000000 1.000000 False
Calcium Signaling 0 0.000000 1.000000 False
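The top row of the table can be reproduced by hand. The sketch below recomputes the "Epigenetic Regulation" entry from the values used in the cell above: a background of 20,000 genes, a 6-gene pathway, 7 target genes, and an overlap of 2 (SIRT1 and TET2). The variable names are illustrative.

```python
from scipy.stats import hypergeom

M = 20000  # background size assumed in the notebook
n = 6      # genes in the "Epigenetic Regulation" pathway
N = 7      # hypothesis target genes
k = 2      # overlap (SIRT1 and TET2)

# P(X >= k) when drawing N genes from M, of which n belong to the pathway
p_value = hypergeom.sf(k - 1, M, n, N)
# Observed overlap fraction divided by the fraction expected by chance
fold_enrichment = (k / N) / (n / M)

print(f"fold enrichment = {fold_enrichment:.2f}")
print(f"p-value = {p_value:.2e}")
```

The very large fold values follow from the choice of background: with 6-gene pathways in a 20,000-gene universe, the expected overlap for 7 targets is only 7 * 6 / 20000, about 0.002 genes, so even a single shared gene registers as several hundred-fold enrichment.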
# --- Pathway Enrichment Bar Plot ---
fig, ax = plt.subplots(figsize=(10, 6))
fig.patch.set_facecolor('#0a0a14')
ax.set_facecolor('#151525')
plot_df = enrich_df.head(10).sort_values('P_value', ascending=False)
colors = ['#ef5350' if p < 0.05 else '#555' for p in plot_df['P_value']]
bars = ax.barh(plot_df['Pathway'], -np.log10(plot_df['P_value'].clip(lower=1e-20)),
color=colors, edgecolor='white', linewidth=0.5, height=0.6)
ax.axvline(-np.log10(0.05), color='#ffd54f', linestyle='--', alpha=0.7, label='p=0.05 threshold')
ax.set_xlabel('-Log10(p-value)', color='#e0e0e0', fontsize=11)
ax.set_title('Pathway Enrichment Analysis', fontsize=12, fontweight='bold', color='#4fc3f7')
ax.tick_params(colors='#e0e0e0')
ax.legend(facecolor='#151525', edgecolor='#333', labelcolor='#e0e0e0')
# Add overlap counts
for bar, overlap in zip(bars, plot_df['Overlap']):
    if overlap > 0:
        ax.text(bar.get_width() + 0.1, bar.get_y() + bar.get_height()/2,
                f'{overlap} genes', va='center', fontsize=8, color='#81c784')
plt.tight_layout()
plt.show()
6. Hypothesis Multi-Dimensional Scoring
Top hypotheses scored across 6 key dimensions: mechanistic plausibility, evidence strength, novelty, feasibility, therapeutic impact, and druggability.
# --- Hypothesis Scoring Radar Chart ---
hyp_data = [
    {"title": "Nutrient-Sensing Epigenetic Circuit Reactivation",
     "scores": {"Mechanistic": 0.9, "Evidence": 0.85, "Novelty": 0.7,
                "Feasibility": 0.95, "Impact": 0.85, "Druggability": 0.9}},
    {"title": "Selective HDAC3 Inhibition with Cognitive Enhancem",
     "scores": {"Mechanistic": 0.75, "Evidence": 0.8, "Novelty": 0.85,
                "Feasibility": 0.7, "Impact": 0.8, "Druggability": 0.75}},
    {"title": "Chromatin Accessibility Restoration via BRD4 Modul",
     "scores": {"Mechanistic": 0.65, "Evidence": 0.6, "Novelty": 0.9,
                "Feasibility": 0.6, "Impact": 0.7, "Druggability": 0.95}},
]
categories = list(hyp_data[0]['scores'].keys())
N = len(categories)
angles = [n / float(N) * 2 * np.pi for n in range(N)]
angles += angles[:1]
fig, ax = plt.subplots(figsize=(8, 8), subplot_kw=dict(polar=True))
fig.patch.set_facecolor('#0a0a14')
ax.set_facecolor('#151525')
colors = ['#4fc3f7', '#ef5350', '#81c784']
for idx, h in enumerate(hyp_data):
    values = [h['scores'][cat] for cat in categories]
    values += values[:1]
    ax.plot(angles, values, 'o-', linewidth=2, color=colors[idx % 3],
            label=h['title'][:40], markersize=4)
    ax.fill(angles, values, alpha=0.1, color=colors[idx % 3])
ax.set_xticks(angles[:-1])
ax.set_xticklabels(categories, fontsize=9, color='#e0e0e0')
ax.set_ylim(0, 1)
ax.set_yticks([0.2, 0.4, 0.6, 0.8])
ax.set_yticklabels(['0.2', '0.4', '0.6', '0.8'], fontsize=8, color='#888')
ax.tick_params(colors='#e0e0e0')
ax.grid(color=(1, 1, 1, 0.1))
ax.spines['polar'].set_color((1, 1, 1, 0.2))
ax.set_title('Hypothesis Multi-Dimensional Scoring', fontsize=12,
fontweight='bold', color='#4fc3f7', pad=20)
ax.legend(loc='upper right', bbox_to_anchor=(1.3, 1.1),
facecolor='#151525', edgecolor='#333', labelcolor='#e0e0e0', fontsize=8)
plt.tight_layout()
plt.show()
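The radar chart shows six dimension scores per hypothesis but no single summary number. One simple way to collapse them is a weighted mean, sketched below with equal weights. The equal weighting is purely an assumption for illustration: the notebook does not state how the composite scores listed under Key Hypotheses were derived, and the means computed here do not reproduce those values.

```python
import numpy as np

# Dimension scores copied from hyp_data above, in the order
# Mechanistic, Evidence, Novelty, Feasibility, Impact, Druggability
dimension_scores = {
    "Nutrient-Sensing Epigenetic Circuit Reactivation":
        [0.9, 0.85, 0.7, 0.95, 0.85, 0.9],
    "Selective HDAC3 Inhibition with Cognitive Enhancement":
        [0.75, 0.8, 0.85, 0.7, 0.8, 0.75],
    "Chromatin Accessibility Restoration via BRD4 Modulation":
        [0.65, 0.6, 0.9, 0.6, 0.7, 0.95],
}

# Equal weights are a hypothetical choice, not the platform's scoring rule
weights = np.full(6, 1 / 6)
composite = {title: float(np.dot(scores, weights))
             for title, scores in dimension_scores.items()}

# Rank hypotheses from highest to lowest composite value
ranking = sorted(composite, key=composite.get, reverse=True)
for title in ranking:
    print(f"{composite[title]:.3f}  {title}")
```

Changing `weights` (for example, to emphasize evidence over novelty) is the natural place to encode a different prioritization.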
7. Knowledge Graph Edges
Relationships extracted from the multi-agent debate (20 of the 91 edges are shown):
| Source | Relation | Target | Confidence |
|---|---|---|---|
| SIRT1 | therapeutic_target | neurodegeneration | 0.78 |
| HDAC3 | therapeutic_target | neurodegeneration | 0.73 |
| BRD4 | therapeutic_target | neurodegeneration | 0.66 |
| SIRT3 | therapeutic_target | neurodegeneration | 0.64 |
| TET2 | therapeutic_target | neurodegeneration | 0.56 |
| OCT4 | therapeutic_target | neurodegeneration | 0.55 |
| SIRT1 | associated_with | SIRT3 | 0.8 |
| TET2 | regulates | DNA_methylation | 0.75 |
| SIRT1 | regulates | chromatin_remodeling | 0.8 |
| BRD4 | regulates | chromatin_remodeling | 0.7 |
| SIRT3 | regulates | mitochondria | 0.7 |
| OCT4 | activates | cellular_reprogramming | 0.8 |
| APP | co_discussed | SIRT1 | 0.4 |
| PARP1 | co_discussed | SIRT1 | 0.4 |
| PARP1 | co_discussed | SIRT3 | 0.4 |
| BDNF | co_discussed | SYN1 | 0.4 |
| DLG4 | co_discussed | PARP1 | 0.4 |
| DLG4 | co_discussed | SYN1 | 0.4 |
| PARP1 | co_discussed | SYN1 | 0.4 |
| BDNF | co_discussed | HDAC | 0.4 |
Total edges: 91
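An edge list like this is straightforward to filter and summarize once it is in a DataFrame. The sketch below loads a handful of rows copied from the table and keeps only edges above a confidence cutoff. The column names and the 0.5 cutoff are illustrative assumptions, chosen so that the 0.4-confidence `co_discussed` pairs drop out.

```python
import pandas as pd

# A few rows copied from the table above; the full graph has 91 edges
edges = pd.DataFrame(
    [
        ("SIRT1", "therapeutic_target", "neurodegeneration", 0.78),
        ("HDAC3", "therapeutic_target", "neurodegeneration", 0.73),
        ("SIRT1", "associated_with", "SIRT3", 0.80),
        ("TET2", "regulates", "DNA_methylation", 0.75),
        ("APP", "co_discussed", "SIRT1", 0.40),
    ],
    columns=["source", "relation", "target", "confidence"],
)

# Hypothetical cutoff: drop the low-confidence co_discussed edges
min_confidence = 0.5
strong = edges[edges["confidence"] >= min_confidence]

# Count how many retained edges touch each node (as source or target)
degree = pd.concat([strong["source"], strong["target"]]).value_counts()

print(strong.to_string(index=False))
print(degree)
```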
Methodology
This analysis was generated by SciDEX's multi-agent scientific debate system:
- Theorist generates novel hypotheses based on known biology
- Skeptic challenges assumptions and identifies weaknesses
- Domain Expert assesses druggability, feasibility, and clinical relevance
- Synthesizer ranks hypotheses and extracts knowledge graph edges
Gene expression data is simulated based on published SEA-AD atlas findings (Allen Institute for Brain Science).
Generated: 2026-04-02 13:53 UTC
Platform: SciDEX
Source: GitHub