Agent Performance Deep Dive

Which agents produce the best hypotheses? Debate quality, tool efficiency, and activity trends.



Agent Scorecards

Composite rating: Quality (40%) + Efficiency (20%) + Contribution (20%) + Precision (20%)
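The composite is a plain weighted sum of the four components. Below is a minimal Python sketch. It assumes each component is on the 0-100 scale shown on the cards. `composite_score` is a hypothetical name. Because the card percentages are rounded, the result only approximates the dashboard's own figure.

```python
# Weights from the composite-rating formula above.
WEIGHTS = {"quality": 0.40, "efficiency": 0.20, "contribution": 0.20, "precision": 0.20}


def composite_score(quality: float, efficiency: float,
                    contribution: float, precision: float) -> float:
    """Weighted sum of the four component scores (each on a 0-100 scale)."""
    components = {
        "quality": quality,
        "efficiency": efficiency,
        "contribution": contribution,
        "precision": precision,
    }
    return sum(WEIGHTS[name] * value for name, value in components.items())


# Rounded card values for "quality gate evidence": 84, 100, 18, 20.
print(round(composite_score(84, 100, 18, 20), 1))  # 61.2
```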

🤖

quality gate evidence

63 debates · 96 hypotheses

Composite Score

Quality 84%

Efficiency 100%

Contribution 18%

Precision 20%

Avg hyp score: 0.547

Best: 0.675

Avg tokens: 1

High quality: 19

🤖

quality gate score

63 debates · 96 hypotheses

Composite Score

Quality 84%

Efficiency 100%

Contribution 18%

Precision 20%

Avg hyp score: 0.547

Best: 0.675

Avg tokens: 1

High quality: 19

🤖

quality gate specificity

63 debates · 96 hypotheses

Composite Score

Quality 84%

Efficiency 100%

Contribution 18%

Precision 20%

Avg hyp score: 0.547

Best: 0.675

Avg tokens: 1

High quality: 19

🧬

computational biologist

6 debates · 1 hypothesis

Composite Score

Quality 100%

Efficiency 0%

Contribution 2%

Precision 100%

Avg hyp score: 0.650

Best: 0.650

Avg tokens: 56,972

High quality: 1

🔍

skeptic

343 debates · 305 hypotheses

Composite Score

Quality 78%

Efficiency 0%

Contribution 100%

Precision 17%

Avg hyp score: 0.505

Best: 0.709

Avg tokens: 24,629

High quality: 51

🧬

domain expert

341 debates · 306 hypotheses

Composite Score

Quality 78%

Efficiency 0%

Contribution 99%

Precision 17%

Avg hyp score: 0.506

Best: 0.709

Avg tokens: 45,309

High quality: 51

💡

theorist

338 debates · 305 hypotheses

Composite Score

Quality 78%

Efficiency 0%

Contribution 99%

Precision 17%

Avg hyp score: 0.506

Best: 0.709

Avg tokens: 69,057

High quality: 51

⚖

synthesizer

338 debates · 301 hypotheses

Composite Score

Quality 78%

Efficiency 0%

Contribution 99%

Precision 17%

Avg hyp score: 0.506

Best: 0.709

Avg tokens: 25,647

High quality: 51

🤖

falsifier

213 debates · 83 hypotheses

Composite Score

Quality 85%

Efficiency 0%

Contribution 62%

Precision 22%

Avg hyp score: 0.553

Best: 0.675

Avg tokens: 89,923

High quality: 18

📋

clinical trialist

11 debates · 34 hypotheses

Composite Score

Quality 72%

Efficiency 0%

Contribution 3%

Precision 6%

Avg hyp score: 0.466

Best: 0.648

Avg tokens: 55,352

High quality: 2

🧪

medicinal chemist

5 debates · 8 hypotheses

Composite Score

Quality 72%

Efficiency 0%

Contribution 1%

Precision 0%

Avg hyp score: 0.471

Best: 0.575

Avg tokens: 117,499

High quality: 0

🌍

epidemiologist

4 debates · 1 hypothesis

Composite Score

Quality 0%

Efficiency 0%

Contribution 1%

Precision 0%

Avg hyp score: 0.000

Best: 0.000

Avg tokens: 41,600

High quality: 0

Total Debates: 193

Scored Hypotheses: 467

High Quality (≥0.6): 66

Avg Hyp Score: 0.502

Avg Debate Quality: 0.582

Top Agent: computational biologist (0.650)

Persona Registry

All registered AI personas — core debate participants and domain specialists

Core Debate Personas (4)

🧠 Theorist ✅

hypothesis generation

Generates novel, bold hypotheses by connecting ideas across disciplines

Action: propose

⚠️ Skeptic ✅

critical evaluation

Challenges assumptions, identifies weaknesses, and provides counter-evidence

Action: critique

💊 Domain Expert ✅

feasibility assessment

Assesses druggability, clinical feasibility, and commercial viability

Action: assess

📊 Synthesizer ✅

integration and scoring

Integrates all perspectives, scores hypotheses across 10 dimensions, extracts knowledge edges

Action: synthesize

Specialist Personas (5)

🌍 Epidemiologist ✅

population health and cohort evidence

Evaluates hypotheses through the lens of population-level data, cohort studies, and risk factors

Action: analyze

🧬 Computational Biologist ✅

omics data and network analysis

Analyzes hypotheses using genomics, transcriptomics, proteomics, and network biology

Action: analyze

📋 Clinical Trialist ✅

trial design and regulatory strategy

Designs clinical validation strategies, endpoints, and regulatory pathways

Action: assess

⚖️ Ethicist ✅

ethics, equity, and patient impact

Evaluates patient impact, equity considerations, informed consent, and risk-benefit

Action: analyze

🧪 Medicinal Chemist ✅

drug design and optimization

Evaluates chemical tractability, ADMET properties, and lead optimization strategies

Action: analyze

Agent Synergy Matrix

Average hypothesis quality when agent pairs collaborate in the same debate; higher values indicate better synergy.

	clinical trialist	computational biologist	domain expert	epidemiologist	falsifier	medicinal chemist	quality gate evidence	quality gate score	quality gate specificity	skeptic	synthesizer	theorist
clinical trialist	—	0.000	0.475	0.000	0.000	0.472	0.480	0.480	0.480	0.475	0.475	0.475
computational biologist	0.000	—	0.650	0.000	0.000	0.000	0.650	0.650	0.650	0.650	0.650	0.650
domain expert	0.475	0.650	—	0.000	0.553	0.472	0.546	0.546	0.546	0.514	0.515	0.514
epidemiologist	0.000	0.000	0.000	—	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000
falsifier	0.000	0.000	0.553	0.000	—	0.000	0.553	0.553	0.553	0.553	0.553	0.553
medicinal chemist	0.472	0.000	0.472	0.000	0.000	—	0.480	0.480	0.480	0.472	0.472	0.472
quality gate evidence	0.480	0.650	0.546	0.000	0.553	0.480	—	0.546	0.546	0.547	0.546	0.547
quality gate score	0.480	0.650	0.546	0.000	0.553	0.480	0.546	—	0.546	0.547	0.546	0.547
quality gate specificity	0.480	0.650	0.546	0.000	0.553	0.480	0.546	0.546	—	0.547	0.546	0.547
skeptic	0.475	0.650	0.514	0.000	0.553	0.472	0.547	0.547	0.547	—	0.515	0.514
synthesizer	0.475	0.650	0.515	0.000	0.553	0.472	0.546	0.546	0.546	0.515	—	0.515
theorist	0.475	0.650	0.514	0.000	0.553	0.472	0.547	0.547	0.547	0.514	0.515	—
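The matrix is symmetric, so it can be built by pooling hypothesis scores for each unordered pair of agents that shared a debate. Below is a minimal sketch. It assumes each debate is a tuple of (agent names, hypothesis scores). The record layout and `synergy_matrix` are hypothetical. Pairs that never share a debate are omitted from the result.

```python
from collections import defaultdict
from itertools import combinations
from statistics import mean


def synergy_matrix(debates):
    """Map each unordered agent pair to the mean score of hypotheses
    from debates in which both agents participated.

    `debates` is an iterable of (agents, scores) tuples, where `agents`
    is a collection of agent names and `scores` a list of floats.
    """
    pooled = defaultdict(list)
    for agents, scores in debates:
        # Sorting the names makes (a, b) and (b, a) the same key.
        for pair in combinations(sorted(set(agents)), 2):
            pooled[pair].extend(scores)
    return {pair: mean(vals) for pair, vals in pooled.items() if vals}


debates = [
    ({"theorist", "skeptic", "synthesizer"}, [0.50, 0.60]),
    ({"theorist", "falsifier"}, [0.70]),
]
for pair, avg in sorted(synergy_matrix(debates).items()):
    print(pair, round(avg, 3))
```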

Agent Performance Trajectory

Comparing recent (last 7 days) vs older performance — are agents improving?

🧬 domain expert ↓

Older Quality

0.508

Recent Quality

0.231

Quality: -0.277 Tokens: ↑ +7,912

🤖 quality gate evidence ↑

Older Quality

0.833

Recent Quality

0.958

Quality: +0.125 Tokens: — +0

🤖 quality gate score ↔

Older Quality

1.000

Recent Quality

1.000

Quality: +0.000 Tokens: — +0

🤖 quality gate specificity ↑

Older Quality

0.958

Recent Quality

0.979

Quality: +0.021 Tokens: — +0

🔍 skeptic ↓

Older Quality

0.492

Recent Quality

0.228

Quality: -0.265 Tokens: ↑ +1,879

⚖ synthesizer ↓

Older Quality

0.503

Recent Quality

0.237

Quality: -0.266 Tokens: ↑ +3,702

💡 theorist ↓

Older Quality

0.492

Recent Quality

0.238

Quality: -0.255 Tokens: ↑ +15,586
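Each trajectory card compares an agent's mean quality in the last 7 days with its mean before that window. Below is a minimal sketch. It assumes per-agent records of (timestamp, quality). It also uses a hypothetical ±0.01 dead band for the ↔ arrow, because the dashboard's actual threshold is not stated. `trajectory` is a hypothetical name.

```python
from datetime import datetime, timedelta
from statistics import mean


def trajectory(records, now, window_days=7, dead_band=0.01):
    """Return (older_avg, recent_avg, delta, arrow) for one agent, or None.

    `records` is a list of (datetime, quality) tuples. Records at or after
    `now - window_days` count as recent; earlier ones count as older.
    `dead_band` is an assumed threshold below which the arrow is ↔.
    """
    cutoff = now - timedelta(days=window_days)
    recent = [q for ts, q in records if ts >= cutoff]
    older = [q for ts, q in records if ts < cutoff]
    if not recent or not older:
        return None  # nothing to compare against
    older_avg, recent_avg = mean(older), mean(recent)
    delta = recent_avg - older_avg
    if delta > dead_band:
        arrow = "↑"
    elif delta < -dead_band:
        arrow = "↓"
    else:
        arrow = "↔"
    return older_avg, recent_avg, delta, arrow


now = datetime(2026, 4, 16)
records = [(datetime(2026, 4, 2), 0.50), (datetime(2026, 4, 3), 0.48),
           (datetime(2026, 4, 14), 0.25), (datetime(2026, 4, 15), 0.21)]
print(trajectory(records, now))
```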

Agent → Hypothesis Quality

Average composite score of hypotheses from debates each agent participated in

Agent	Avg Score	Best	High Quality	Hypotheses
🥇 🧬 computational biologist	0.6500	0.650	1	1
🥈 🤖 falsifier	0.5527	0.675	18	83
🥉 🤖 quality gate evidence	0.5467	0.675	19	96
🤖 quality gate score	0.5467	0.675	19	96
🤖 quality gate specificity	0.5467	0.675	19	96
💡 theorist	0.5064	0.709	51	305
🧬 domain expert	0.5056	0.709	51	306
⚖ synthesizer	0.5056	0.709	51	301
🔍 skeptic	0.5050	0.709	51	305
🧪 medicinal chemist	0.4712	0.575	0	8
📋 clinical trialist	0.4660	0.648	2	34
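The leaderboard groups hypothesis scores by participating agent and sorts agents by their mean score. Below is a minimal sketch. It assumes a flat list of (agent, score) pairs. The 0.6 cut-off follows the "High Quality (≥0.6)" KPI. `leaderboard` is a hypothetical name.

```python
from collections import defaultdict
from statistics import mean

HIGH_QUALITY = 0.6  # threshold from the "High Quality (≥0.6)" KPI


def leaderboard(pairs):
    """Rank agents by mean hypothesis score.

    Returns (agent, avg, best, high_quality_count, n) tuples,
    highest average first.
    """
    by_agent = defaultdict(list)
    for agent, score in pairs:
        by_agent[agent].append(score)
    rows = [
        (agent, mean(s), max(s), sum(x >= HIGH_QUALITY for x in s), len(s))
        for agent, s in by_agent.items()
    ]
    return sorted(rows, key=lambda row: row[1], reverse=True)


pairs = [("falsifier", 0.675), ("falsifier", 0.43),
         ("theorist", 0.709), ("theorist", 0.40), ("theorist", 0.41)]
for row in leaderboard(pairs):
    print(row)
```

Python's sort is stable. As a result, agents with equal averages, such as the three quality gate agents at 0.5467, keep their input order.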

Agent Capability Radar

Multi-dimensional comparison across quality, efficiency, throughput, and consistency

Agent ROI — Return on Token Investment

Quality output per token spent. Higher quality-per-10K-tokens = better ROI.

📋 clinical trialist

Quality per 10K tokens

0.078

Est. cost:
$48.32

$/quality:
$1.152

Analyses:
11

High-Q hyps:
2

🧬 computational biologist

Quality per 10K tokens

0.019

Est. cost:
$3.08

$/quality:
$4.733

Analyses:
6

High-Q hyps:
1

🧬 domain expert

Quality per 10K tokens

0.062

Est. cost:
$284.63

$/quality:
$1.447

Analyses:
341

High-Q hyps:
51

🌍 epidemiologist

Quality per 10K tokens

0.000

Est. cost:
$1.50

$/quality:
$0.000

Analyses:
4

High-Q hyps:
0

🤖 falsifier

Quality per 10K tokens

0.020

Est. cost:
$210.42

$/quality:
$4.587

Analyses:
213

High-Q hyps:
18

🧪 medicinal chemist

Quality per 10K tokens

0.038

Est. cost:
$56.05

$/quality:
$2.379

Analyses:
5

High-Q hyps:
0

🤖 quality gate evidence

Quality per 10K tokens

885700.000

Est. cost:
$0.00

$/quality:
$0.000

Analyses:
63

High-Q hyps:
19

🤖 quality gate score

Quality per 10K tokens

885700.000

Est. cost:
$0.00

$/quality:
$0.000

Analyses:
63

High-Q hyps:
19

🤖 quality gate specificity

Quality per 10K tokens

885700.000

Est. cost:
$0.00

$/quality:
$0.000

Analyses:
63

High-Q hyps:
19

🔍 skeptic

Quality per 10K tokens

0.118

Est. cost:
$167.80

$/quality:
$0.762

Analyses:
343

High-Q hyps:
51

⚖ synthesizer

Quality per 10K tokens

0.110

Est. cost:
$157.88

$/quality:
$0.822

Analyses:
338

High-Q hyps:
51

💡 theorist

Quality per 10K tokens

0.043

Est. cost:
$490.37

$/quality:
$2.100

Analyses:
338

High-Q hyps:
51

Hypothesis Score Movers

Biggest score changes tracked over time via the prediction market

Top Improvers

TREM2-Dependent Microglial Senescence Transition

0.303 → 0.802 +0.499

APOE-Dependent Autophagy Restoration

0.306 → 0.656 +0.350

ACSL4-Driven Ferroptotic Priming in Disease-Associ

0.306 → 0.621 +0.315

APOE4-Specific Lipidation Enhancement Therapy

0.534 → 0.836 +0.301

Selective TLR4 Modulation to Prevent Gut-Derived N

0.305 → 0.585 +0.280

Biggest Declines

No significant declines yet
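Score movers come from comparing each hypothesis's first and latest market price. Below is a minimal sketch. It assumes a chronological price list per hypothesis and a hypothetical 0.05 threshold for a "significant" move. The intermediate TREM2 price and the third entry in the example are made up for illustration.

```python
def score_movers(history, top_n=5, threshold=0.05):
    """Return (top_improvers, biggest_declines) as lists of (title, delta).

    `history` maps each hypothesis title to its chronological price list.
    Moves smaller than `threshold` (an assumed cut-off) are ignored.
    """
    deltas = {title: prices[-1] - prices[0]
              for title, prices in history.items() if len(prices) >= 2}
    improvers = sorted(((t, d) for t, d in deltas.items() if d >= threshold),
                       key=lambda item: item[1], reverse=True)
    declines = sorted(((t, d) for t, d in deltas.items() if d <= -threshold),
                      key=lambda item: item[1])
    return improvers[:top_n], declines[:top_n]


history = {
    "TREM2-Dependent Microglial Senescence Transition": [0.303, 0.61, 0.802],
    "APOE-Dependent Autophagy Restoration": [0.306, 0.656],
    "Hypothetical flat hypothesis": [0.500, 0.520],
}
improvers, declines = score_movers(history)
for title, delta in improvers:
    print(f"{title}: {delta:+.3f}")
print(declines or "No significant declines yet")
```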

Hypothesis Improvement Attribution

Which agents' debates drive the biggest hypothesis score improvements? Tracks price changes for hypotheses debated by each agent.

💡 theorist ↔

-0.0100

Avg score improvement

Best lift:
+0.4989

Hyps tracked:
429

Debates:
89

🔍 skeptic ↔

-0.0100

Avg score improvement

Best lift:
+0.4989

Hyps tracked:
429

Debates:
89

🧬 domain expert ↔

-0.0100

Avg score improvement

Best lift:
+0.4989

Hyps tracked:
429

Debates:
89

⚖ synthesizer ↓

-0.0115

Avg score improvement

Best lift:
+0.4989

Hyps tracked:
424

Debates:
88

🌍 epidemiologist ↓

-0.0363

Avg score improvement

Best lift:
+0.0959

Hyps tracked:
7

Debates:
1

🧬 computational biologist ↓

-0.0363

Avg score improvement

Best lift:
+0.0959

Hyps tracked:
7

Debates:
1

📋 clinical trialist ↓

-0.0529

Avg score improvement

Best lift:
+0.1097

Hyps tracked:
48

Debates:
6

🧪 medicinal chemist ↓

-0.0965

Avg score improvement

Best lift:
+0.0094

Hyps tracked:
15

Debates:
3
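Attribution credits each agent with the price change of every hypothesis from the debates it took part in. Below is a minimal sketch. It assumes each debate record lists its agents and the price deltas of its hypotheses. `attribute_lift` is a hypothetical name. Under this scheme, agents that always share debates receive identical figures.

```python
from collections import defaultdict
from statistics import mean


def attribute_lift(debate_records):
    """Per-agent average and best price change.

    `debate_records` is an iterable of (agents, deltas) tuples, where
    `deltas` lists the price change of each hypothesis from that debate.
    Returns {agent: (avg_delta, best_delta, hyps_tracked, debates)}.
    """
    deltas = defaultdict(list)
    debate_counts = defaultdict(int)
    for agents, hyp_deltas in debate_records:
        for agent in set(agents):  # count each agent once per debate
            deltas[agent].extend(hyp_deltas)
            debate_counts[agent] += 1
    return {agent: (mean(d), max(d), len(d), debate_counts[agent])
            for agent, d in deltas.items() if d}


debate_records = [
    (["theorist", "skeptic"], [0.10, -0.05]),
    (["theorist", "epidemiologist"], [-0.04]),
]
for agent, stats in attribute_lift(debate_records).items():
    print(agent, stats)
```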

Debate Contribution Impact

Which agent actions (propose, critique, synthesize, etc.) correlate with the best hypothesis outcomes?

Agent	Action	Rounds	Avg Hyp Score	Debate Q	Avg Tokens
synthesizer	synthesize	661	0.4903	0.597	2,380
domain expert	support	668	0.4902	0.597	1,570
skeptic	critique	666	0.4902	0.597	1,820
theorist	propose	666	0.4902	0.597	1,272
medicinal chemist	analyze	30	0.4618	0.672	709
clinical trialist	assess	79	0.4577	0.551	787
computational biologist	analyze	12	0.4574	0.446	13
epidemiologist	analyze	10	0.4299	0.430	166
clinical trialist	evaluate	1	0.0000	0.500	1,054
domain expert	debate	23	0.0000	0.520	216
falsifier	debate	5	0.0000	0.500	3,390
skeptic	debate	26	0.0000	0.519	194
synthesizer	debate	5	0.0000	0.500	3,674
theorist	debate	33	0.0000	0.515	651
tool execution	tool_execution	1	0.0000	0.710	998
tool execution	unknown	1	0.0000	0.710	998

Efficiency Metrics

Token cost vs quality output — lower tokens-per-hypothesis = more efficient

clinical trialist

Total tokens:
5,369,191

Hypotheses:
34

Tokens/hyp:
157,917

Quality/10K tok:
0.001

computational biologist

Total tokens:
341,830

Hypotheses:
1

Tokens/hyp:
341,830

Quality/10K tok:
0.019

domain expert

Total tokens:
31,625,573

Hypotheses:
306

Tokens/hyp:
103,352

Quality/10K tok:
0.000

epidemiologist

Total tokens:
166,398

Hypotheses:
0

Tokens/hyp:
0

Quality/10K tok:
0.000

falsifier

Total tokens:
23,379,860

Hypotheses:
83

Tokens/hyp:
281,685

Quality/10K tok:
0.000

medicinal chemist

Total tokens:
6,227,434

Hypotheses:
8

Tokens/hyp:
778,429

Quality/10K tok:
0.001

quality gate evidence

Total tokens:
0

Hypotheses:
96

Tokens/hyp:
0

Quality/10K tok:
0.000

quality gate score

Total tokens:
0

Hypotheses:
96

Tokens/hyp:
0

Quality/10K tok:
0.000

quality gate specificity

Total tokens:
0

Hypotheses:
96

Tokens/hyp:
0

Quality/10K tok:
0.000

skeptic

Total tokens:
18,643,998

Hypotheses:
305

Tokens/hyp:
61,128

Quality/10K tok:
0.000

synthesizer

Total tokens:
17,542,323

Hypotheses:
301

Tokens/hyp:
58,280

Quality/10K tok:
0.000

theorist

Total tokens:
54,485,831

Hypotheses:
305

Tokens/hyp:
178,642

Quality/10K tok:
0.000
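Both ratios on these cards can be reproduced from the totals. Tokens per hypothesis is total tokens divided by hypotheses, rounded to an integer. For the rows checked, quality per 10K tokens matches the agent's average hypothesis score × 10,000 ÷ total tokens. For example, for computational biologist: 0.650 × 10,000 ÷ 341,830 ≈ 0.019. At three decimals, agents with tens of millions of tokens therefore round to 0.000. Below is a minimal sketch. The function names are hypothetical. The fallback of 0 for zero hypotheses or zero tokens mirrors the epidemiologist and quality gate cards.

```python
def tokens_per_hypothesis(total_tokens: int, hypotheses: int) -> int:
    """Total tokens divided by hypotheses, rounded; 0 when there are none."""
    if hypotheses == 0:
        return 0
    return round(total_tokens / hypotheses)


def quality_per_10k_tokens(avg_score: float, total_tokens: int) -> float:
    """Average hypothesis score per 10,000 tokens; 0.0 when no tokens were used."""
    if total_tokens == 0:
        return 0.0
    return avg_score * 10_000 / total_tokens


# Figures from the clinical trialist and computational biologist cards.
print(tokens_per_hypothesis(5_369_191, 34))             # 157917
print(f"{quality_per_10k_tokens(0.650, 341_830):.3f}")  # 0.019
```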

Debate Quality by Agent Role

Average quality score of debates each persona participates in, with hypothesis survival rates

🤖 tool execution

Avg quality:
0.710

Best:
0.710

Debates:
1

Survival rate:
57%

🧪 medicinal chemist

Avg quality:
0.630

Best:
0.710

Debates:
4

Survival rate:
54%

⚖ synthesizer

Avg quality:
0.568

Best:
1.000

Debates:
180

Survival rate:
64%

🧬 domain expert

Avg quality:
0.567

Best:
1.000

Debates:
181

Survival rate:
65%

🔍 skeptic

Avg quality:
0.565

Best:
1.000

Debates:
181

Survival rate:
65%

💡 theorist

Avg quality:
0.563

Best:
1.000

Debates:
181

Survival rate:
65%

📋 clinical trialist

Avg quality:
0.555

Best:
0.710

Debates:
10

Survival rate:
57%

🧬 computational biologist

Avg quality:
0.502

Best:
0.560

Debates:
6

Survival rate:
36%

🤖 falsifier

Avg quality:
0.500

Best:
0.500

Debates:
5

Survival rate:
0%

🌍 epidemiologist

Avg quality:
0.490

Best:
0.560

Debates:
4

Survival rate:
40%

Quality Score Trends

Average debate quality score per agent over time

Debate Quality & Cumulative Output

Daily debate quality scores (line) alongside cumulative debate and hypothesis counts (bars)

Hypothesis Score Distribution

Quality distribution across all scored hypotheses

Excellent (≥0.7) 4 (1%)

Good (0.6-0.7) 62 (13%)

Moderate (0.5-0.6) 146 (31%)

Low (0.3-0.5) 255 (55%)

Poor (<0.3) 0 (0%)
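The bands split scores at 0.3, 0.5, 0.6 and 0.7. Below is a minimal sketch. It assumes each band includes its lower bound and excludes its upper bound, since the page does not say which side 0.6 or 0.7 falls on. The labels are copied from the list above. `band` and `distribution` are hypothetical names.

```python
from bisect import bisect_right

# Lower bounds of the upper four bands, ascending; LABELS has one extra entry for <0.3.
BOUNDS = [0.3, 0.5, 0.6, 0.7]
LABELS = ["Poor (<0.3)", "Low (0.3-0.5)", "Moderate (0.5-0.6)",
          "Good (0.6-0.7)", "Excellent (≥0.7)"]


def band(score: float) -> str:
    """Return the band label; a score equal to a bound goes to the higher band."""
    return LABELS[bisect_right(BOUNDS, score)]


def distribution(scores):
    """Count scores per band and give each count as a whole-number percentage."""
    counts = {label: 0 for label in LABELS}
    for s in scores:
        counts[band(s)] += 1
    total = len(scores) or 1  # avoid dividing by zero on empty input
    return {label: (n, round(100 * n / total)) for label, n in counts.items()}


scores = [0.75] * 4 + [0.65] * 62 + [0.55] * 146 + [0.40] * 255
for label, (n, pct) in distribution(scores).items():
    print(f"{label} {n} ({pct}%)")
```

The example uses synthetic scores with the same band counts as above: 4, 62, 146 and 255, out of 467. With them, the sketch reproduces the listed percentages.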

Evidence & Tool Usage by Persona

How often each persona cites evidence, and its average response length in characters

tool execution 0/2 rounds cite evidence (0%) · avg 3,994 chars/response

theorist 0/252 rounds cite evidence (0%) · avg 4,656 chars/response

synthesizer 0/223 rounds cite evidence (0%) · avg 8,433 chars/response

skeptic 0/245 rounds cite evidence (0%) · avg 6,824 chars/response

medicinal chemist 0/6 rounds cite evidence (0%) · avg 3,788 chars/response

falsifier 0/5 rounds cite evidence (0%) · avg 4,355 chars/response

epidemiologist 0/4 rounds cite evidence (0%) · avg 1,660 chars/response

domain expert 0/244 rounds cite evidence (0%) · avg 5,920 chars/response

computational biologist 0/6 rounds cite evidence (0%) · avg 35 chars/response

clinical trialist 0/13 rounds cite evidence (0%) · avg 3,899 chars/response

Tool Call Efficiency

Success rates, latency, and usage patterns across scientific tools (22,527 total calls)

Overall success: 99.7%

Tool	Calls	Success	Avg latency (ms)
Pubmed Search	11113	100% (4 err)	612
Clinical Trials Search	3340	100% (10 err)	1,171
Pubmed Abstract	1800	100% (5 err)	1,396
Gene Info	1557	100% (6 err)	1,091
Semantic Scholar Search	1408	100% (2 err)	1,146
Research Topic	1217	100% (4 err)	4,094
Paper Figures	386	97% (10 err)	29,750
String Protein Interactions	355	98% (6 err)	1,411
Reactome Pathways	270	98% (5 err)	704
Clinvar Variants	247	99% (2 err)	880
Uniprot Protein Info	243	98% (4 err)	706
Allen Brain Expression	203	98% (4 err)	185
Open Targets Associations	171	97% (5 err)	1,492
Disgenet Disease-Gene Associations	110	95% (5 err)	2,064
Human Protein Atlas	107	98% (2 err)	1,437
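Per-tool success is (calls - errors) ÷ calls. The overall figure pools all calls, and the 15 tools listed account for all 22,527 of them. Rounding to whole percentages is why tools with up to ten errors still show 100%. Below is a minimal sketch using the counts from the table above. The function names are hypothetical.

```python
# (calls, errors) per tool, copied from the table above.
TOOLS = {
    "Pubmed Search": (11113, 4),
    "Clinical Trials Search": (3340, 10),
    "Pubmed Abstract": (1800, 5),
    "Gene Info": (1557, 6),
    "Semantic Scholar Search": (1408, 2),
    "Research Topic": (1217, 4),
    "Paper Figures": (386, 10),
    "String Protein Interactions": (355, 6),
    "Reactome Pathways": (270, 5),
    "Clinvar Variants": (247, 2),
    "Uniprot Protein Info": (243, 4),
    "Allen Brain Expression": (203, 4),
    "Open Targets Associations": (171, 5),
    "Disgenet Disease-Gene Associations": (110, 5),
    "Human Protein Atlas": (107, 2),
}


def success_rate(calls: int, errors: int) -> float:
    """Percentage of calls that did not error."""
    return 100 * (calls - errors) / calls


def overall_success(tools) -> float:
    """Pooled success rate across all tools, weighted by call volume."""
    calls = sum(c for c, _ in tools.values())
    errors = sum(e for _, e in tools.values())
    return success_rate(calls, errors)


print(f"Clinical Trials Search: {success_rate(*TOOLS['Clinical Trials Search']):.1f}%")
print(f"Overall: {overall_success(TOOLS):.1f}% of {sum(c for c, _ in TOOLS.values()):,} calls")
```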

Debate Depth vs Hypothesis Quality

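Do more debate rounds produce better hypotheses? For each round-count bucket, Hyp is the average hypothesis score and Dbt the average debate quality.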

rounds

Hyp: 0.477

Dbt: 0.650

1 debate

rounds

Hyp: 0.492

Dbt: 0.607

169 debates

rounds

Hyp: 0.482

Dbt: 0.521

13 debates

rounds

Hyp: 0.453

Dbt: 0.635

6 debates

rounds

Hyp: 0.430

Dbt: 0.430

4 debates

10-Dimension Scoring Heatmap

Average score per agent across all 10 hypothesis scoring dimensions; higher values indicate stronger performance in that dimension.

Agent	Mech Plaus	Novelty	Feasibility	Impact	Druggability	Safety	Comp Land	Data Avail	Reproducib	Convergence	Hyps
clinical trialist	0.600	0.717	0.596	0.641	0.652	0.531	0.750	0.609	0.560	0.312	34
computational biologist	0.850	0.750	0.700	0.800	0.750	0.700	0.850	0.750	0.700	0.000	1
domain expert	0.670	0.729	0.571	0.678	0.605	0.553	0.682	0.631	0.592	0.168	306
falsifier	0.660	0.678	0.567	0.669	0.611	0.523	0.623	0.623	0.620	0.000	83
medicinal chemist	0.591	0.691	0.679	0.632	0.737	0.503	0.778	0.651	0.589	0.481	8
quality gate evidence	0.667	0.689	0.581	0.674	0.630	0.546	0.634	0.619	0.619	0.000	96
quality gate score	0.667	0.689	0.581	0.674	0.630	0.546	0.634	0.619	0.619	0.000	96
quality gate specificity	0.667	0.689	0.581	0.674	0.630	0.546	0.634	0.619	0.619	0.000	96
skeptic	0.672	0.730	0.566	0.677	0.594	0.547	0.679	0.629	0.591	0.169	305
synthesizer	0.670	0.730	0.572	0.679	0.605	0.552	0.683	0.631	0.592	0.165	301
theorist	0.668	0.731	0.562	0.676	0.591	0.545	0.677	0.627	0.590	0.163	305
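Best Agent	computational biologist	computational biologist	computational biologist	computational biologist	computational biologist	computational biologist	computational biologist	computational biologist	computational biologist	medicinal chemist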
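The "Best Agent" row is a per-column argmax over the agent rows. Below is a minimal sketch. It assumes the heatmap is held as a dict mapping each agent to its ten dimension scores. Only three rows are copied here for brevity. `best_agent_per_dimension` is a hypothetical name.

```python
DIMENSIONS = ["Mech Plaus", "Novelty", "Feasibility", "Impact", "Druggability",
              "Safety", "Comp Land", "Data Avail", "Reproducib", "Convergence"]

# Three rows copied from the table above.
HEATMAP = {
    "computational biologist": [0.850, 0.750, 0.700, 0.800, 0.750,
                                0.700, 0.850, 0.750, 0.700, 0.000],
    "medicinal chemist": [0.591, 0.691, 0.679, 0.632, 0.737,
                          0.503, 0.778, 0.651, 0.589, 0.481],
    "theorist": [0.668, 0.731, 0.562, 0.676, 0.591,
                 0.545, 0.677, 0.627, 0.590, 0.163],
}


def best_agent_per_dimension(heatmap):
    """Return {dimension: agent with the highest average score in it}."""
    return {
        dim: max(heatmap, key=lambda agent: heatmap[agent][i])
        for i, dim in enumerate(DIMENSIONS)
    }


for dim, agent in best_agent_per_dimension(HEATMAP).items():
    print(f"{dim}: {agent}")
```

In the full table, nine of the ten winners rest on a single hypothesis from the computational biologist. A variant that skips agents below a minimum hypothesis count may therefore give a more robust ranking.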

Top Hypotheses by Agent

Best 3 hypotheses from debates each agent participated in. MP = mechanistic plausibility, N = novelty, F = feasibility.

📋 clinical trialist

🥇

Selective Acid Sphingomyelinase Modulation Therapy

0.648 MP:0.85 N:0.70 F:0.90

🥈

CYP46A1 Overexpression Gene Therapy

0.631 MP:0.90 N:0.95 F:0.60

🥉

Selective Neutral Sphingomyelinase-2 Inhibition Therapy

0.591 MP:0.85 N:0.00 F:0.00

🧬 computational biologist

🥇

Temporal SPP1 Inhibition During Critical Windows

0.650 MP:0.85 N:0.75 F:0.70

🧬 domain expert

🥇

Closed-loop transcranial focused ultrasound to restore hippocampa

0.709 MP:0.85 N:0.80 F:0.88

🥈

Closed-loop tACS targeting EC-II SST interneurons to block tau pr

0.697 MP:0.85 N:0.78 F:0.86

🥉

Closed-loop focused ultrasound targeting EC-II SST interneurons t

0.697 MP:0.85 N:0.79 F:0.87

🤖 falsifier

🥇

H2: Indole-3-Propionate (IPA) as the Actual Neuroprotective Effec

0.675 MP:0.68 N:0.88 F:0.80

🥈

CSF sTREM2 as Pharmacodynamic Biomarker for Therapeutic Window Id

0.674 MP:0.82 N:0.60 F:0.85

🥉

Stathmin-2 Splice Switching to Prevent Axonal Degeneration Across

0.664 MP:0.85 N:0.70 F:0.80

🧪 medicinal chemist

🥇

PARP1 Inhibition Therapy

0.575 MP:0.40 N:0.70 F:1.00

🥈

Heat Shock Protein 70 Disaggregase Amplification

0.511 MP:0.80 N:0.60 F:0.90

🥉

FcRn Transport Bypass Strategy

0.480 MP:0.85 N:0.60 F:0.70

🤖 quality gate evidence

🥇

H2: Indole-3-Propionate (IPA) as the Actual Neuroprotective Effec

0.675 MP:0.68 N:0.88 F:0.80

🥈

CSF sTREM2 as Pharmacodynamic Biomarker for Therapeutic Window Id

0.674 MP:0.82 N:0.60 F:0.85

🥉

Stathmin-2 Splice Switching to Prevent Axonal Degeneration Across

0.664 MP:0.85 N:0.70 F:0.80

🤖 quality gate score

🥇

H2: Indole-3-Propionate (IPA) as the Actual Neuroprotective Effec

0.675 MP:0.68 N:0.88 F:0.80

🥈

CSF sTREM2 as Pharmacodynamic Biomarker for Therapeutic Window Id

0.674 MP:0.82 N:0.60 F:0.85

🥉

Stathmin-2 Splice Switching to Prevent Axonal Degeneration Across

0.664 MP:0.85 N:0.70 F:0.80

🤖 quality gate specificity

🥇

H2: Indole-3-Propionate (IPA) as the Actual Neuroprotective Effec

0.675 MP:0.68 N:0.88 F:0.80

🥈

CSF sTREM2 as Pharmacodynamic Biomarker for Therapeutic Window Id

0.674 MP:0.82 N:0.60 F:0.85

🥉

Stathmin-2 Splice Switching to Prevent Axonal Degeneration Across

0.664 MP:0.85 N:0.70 F:0.80

🔍 skeptic

🥇

Closed-loop transcranial focused ultrasound to restore hippocampa

0.709 MP:0.85 N:0.80 F:0.88

🥈

Closed-loop tACS targeting EC-II SST interneurons to block tau pr

0.697 MP:0.85 N:0.78 F:0.86

🥉

Closed-loop focused ultrasound targeting EC-II SST interneurons t

0.697 MP:0.85 N:0.79 F:0.87

⚖ synthesizer

🥇

Closed-loop transcranial focused ultrasound to restore hippocampa

0.709 MP:0.85 N:0.80 F:0.88

🥈

Closed-loop tACS targeting EC-II SST interneurons to block tau pr

0.697 MP:0.85 N:0.78 F:0.86

🥉

Closed-loop focused ultrasound targeting EC-II SST interneurons t

0.697 MP:0.85 N:0.79 F:0.87

💡 theorist

🥇

Closed-loop transcranial focused ultrasound to restore hippocampa

0.709 MP:0.85 N:0.80 F:0.88

🥈

Closed-loop tACS targeting EC-II SST interneurons to block tau pr

0.697 MP:0.85 N:0.78 F:0.86

🥉

Closed-loop focused ultrasound targeting EC-II SST interneurons t

0.697 MP:0.85 N:0.79 F:0.87

Per-Analysis Performance

Which debates produced the best hypotheses? Ranked by average hypothesis score.

Analysis	Avg Score	Best	Debate Q	Tokens	Hyps
What molecular mechanisms mediate SPP1-induce	0.6500	0.650	0.55	99,316	1
How does engineered C. butyricum cross the bl	0.6350	0.675	—	1,083,052	2
What are the precise temporal dynamics of ast	0.6300	0.645	1.00	1,013,784	2
What molecular mechanisms enable functional r	0.6215	0.658	—	970,750	2
RNA binding protein dysregulation across ALS	0.6205	0.664	0.50	2,203,348	2
What is the temporal sequence of TREM2 signal	0.6185	0.674	—	763,168	2
What are the specific molecular determinants	0.6095	0.636	—	1,060,402	2
How do B cells mechanistically orchestrate to	0.6065	0.612	0.44	2,118,332	2
What is the therapeutic window between insuff	0.6065	0.615	1.00	1,018,794	2
How does FUS loss-of-function in TAZ regulati	0.6060	0.606	—	434,375	1
Why do p300/CBP inhibitors reduce both AD inc	0.6040	0.604	—	479,649	1
How do oligodendrocytes initiate neuroinflamm	0.6040	0.632	0.50	892,320	2
What molecular mechanisms drive the transitio	0.5960	0.605	—	908,406	2
What molecular mechanisms determine whether r	0.5945	0.596	—	860,026	2
Should microtubule-stabilizing drugs be recon	0.5940	0.609	—	1,169,768	2

Token Allocation

Share of total compute budget by agent

theorist 34.9% (35,117,981)

falsifier 19.9% (19,975,369)

domain expert 18.3% (18,407,911)

synthesizer 12.4% (12,441,598)

skeptic 12.0% (12,112,057)

medicinal chemist 1.1% (1,057,426)

clinical trialist 1.0% (966,221)

computational biologist 0.3% (341,830)

epidemiologist 0.2% (166,398)

quality gate evidence 0.0% (0)

quality gate score 0.0% (0)

quality gate specificity 0.0% (0)
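Each share is the agent's token count divided by the sum across all agents, which is 100,586,791 tokens for the counts listed above. Below is a minimal sketch using those counts. It reproduces the percentages shown. `token_shares` is a hypothetical name.

```python
ALLOCATION = {
    "theorist": 35_117_981,
    "falsifier": 19_975_369,
    "domain expert": 18_407_911,
    "synthesizer": 12_441_598,
    "skeptic": 12_112_057,
    "medicinal chemist": 1_057_426,
    "clinical trialist": 966_221,
    "computational biologist": 341_830,
    "epidemiologist": 166_398,
    "quality gate evidence": 0,
    "quality gate score": 0,
    "quality gate specificity": 0,
}


def token_shares(allocation):
    """Return {agent: percentage of total tokens}, largest share first."""
    total = sum(allocation.values())
    if total == 0:
        return {agent: 0.0 for agent in allocation}
    shares = {agent: 100 * n / total for agent, n in allocation.items()}
    return dict(sorted(shares.items(), key=lambda kv: kv[1], reverse=True))


for agent, pct in token_shares(ALLOCATION).items():
    print(f"{agent} {pct:.1f}%")
```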

Activity Timeline

Agent participation and token usage per day (t = tasks; s = seconds)

2026-04-16 9 tasks · 0 tokens

quality gate evidence: 3t · 0.0s

quality gate score: 3t · 0.0s

quality gate specificity: 3t · 0.0s

2026-04-15 205 tasks · 11,946,835 tokens

computational biologist: 1t · 18.9s

domain expert: 24t · 112.4s

falsifier: 23t · 100.8s

quality gate evidence: 26t · 0.0s

quality gate score: 26t · 0.0s

quality gate specificity: 26t · 0.0s

skeptic: 26t · 123.6s

synthesizer: 24t · 117.9s

theorist: 29t · 123.8s

2026-04-14 172 tasks · 9,368,172 tokens

domain expert: 20t · 133.8s

falsifier: 18t · 120.7s

quality gate evidence: 22t · 0.0s

quality gate score: 22t · 0.0s

quality gate specificity: 22t · 0.0s

skeptic: 22t · 128.0s

synthesizer: 19t · 141.1s

theorist: 27t · 173.7s

2026-04-13 214 tasks · 11,485,189 tokens

domain expert: 23t · 135.8s

falsifier: 23t · 121.9s

quality gate evidence: 31t · 0.0s

quality gate score: 31t · 0.0s

quality gate specificity: 31t · 0.0s

skeptic: 24t · 133.7s

synthesizer: 24t · 158.9s

theorist: 27t · 163.9s

2026-04-12 294 tasks · 19,505,620 tokens

clinical trialist: 10t · 28.8s

domain expert: 47t · 106.1s

falsifier: 31t · 106.1s

medicinal chemist: 8t · 29.9s

quality gate evidence: 14t · 0.0s

quality gate score: 14t · 0.0s

quality gate specificity: 14t · 0.0s

skeptic: 53t · 105.9s

synthesizer: 43t · 94.1s

theorist: 60t · 111.6s

2026-04-11 203 tasks · 11,455,783 tokens

clinical trialist: 1t · 24.0s

domain expert: 41t · 58.3s

falsifier: 38t · 52.9s

skeptic: 42t · 46.0s

synthesizer: 40t · 49.2s

theorist: 41t · 57.8s

2026-04-10 296 tasks · 9,605,134 tokens

clinical trialist: 7t · 37.1s

computational biologist: 5t · 18.2s

domain expert: 61t · 37.3s

epidemiologist: 4t · 21.0s

falsifier: 25t · 49.7s

medicinal chemist: 3t · 41.6s

skeptic: 63t · 35.7s

synthesizer: 61t · 33.9s

theorist: 67t · 35.6s

2026-04-09 71 tasks · 1,897,461 tokens

domain expert: 17t · 36.3s

falsifier: 4t · 52.1s

skeptic: 17t · 36.4s

synthesizer: 16t · 38.1s

theorist: 17t · 30.0s

2026-04-08 140 tasks · 8,191,217 tokens

domain expert: 28t · 56.4s

falsifier: 28t · 51.3s

skeptic: 28t · 45.4s

synthesizer: 28t · 52.2s

theorist: 28t · 63.0s

2026-04-07 118 tasks · 7,079,391 tokens

domain expert: 24t · 58.4s

falsifier: 24t · 50.8s

skeptic: 24t · 43.2s

synthesizer: 24t · 52.3s

theorist: 22t · 60.6s

2026-04-06 150 tasks · 4,755,092 tokens

domain expert: 34t · 37.3s

falsifier: 14t · 49.5s

skeptic: 35t · 32.6s

synthesizer: 33t · 35.0s

theorist: 34t · 31.3s

2026-04-04 39 tasks · 1,204,681 tokens

domain expert: 9t · 46.0s

falsifier: 1t · 40.3s

skeptic: 10t · 38.5s

synthesizer: 9t · 37.4s

theorist: 10t · 34.8s

2026-04-03 58 tasks · 2,530,778 tokens

domain expert: 14t · 48.2s

skeptic: 15t · 40.6s

synthesizer: 14t · 44.9s

theorist: 15t · 54.6s

2026-04-02 143 tasks · 1,365,793 tokens

domain expert: 35t · 15.7s

skeptic: 36t · 11.3s

synthesizer: 34t · 17.0s

theorist: 38t · 16.9s

2026-04-01 96 tasks · 195,645 tokens

domain expert: 24t · 29.2s

skeptic: 24t · 33.0s

synthesizer: 24t · 38.8s

theorist: 24t · 20.5s