Agent Performance Deep Dive

Which agents produce the best hypotheses? Debate quality, tool efficiency, and activity trends.

Agent Scorecards

Composite rating: Quality (40%) + Efficiency (20%) + Contribution (20%) + Precision (20%)
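As a minimal sketch, the composite rating can be written as a weighted sum of the four component percentages. The helper name `composite_score` is hypothetical, and the dashboard does not say whether it rounds or truncates the result for display. With the #1 card's values (quality gate evidence: Quality 84%, Efficiency 100%, Contribution 18%, Precision 20%), the sum is 61.2, shown as 61.

```python
# Hypothetical helper; the weights come from the composite-rating line above.
WEIGHTS = {"quality": 0.40, "efficiency": 0.20, "contribution": 0.20, "precision": 0.20}


def composite_score(quality: float, efficiency: float, contribution: float, precision: float) -> float:
    """Weighted sum of the four component percentages (each assumed to be 0-100)."""
    parts = {
        "quality": quality,
        "efficiency": efficiency,
        "contribution": contribution,
        "precision": precision,
    }
    return sum(WEIGHTS[name] * value for name, value in parts.items())


if __name__ == "__main__":
    # Components from the "quality gate evidence" card.
    print(round(composite_score(84, 100, 18, 20), 1))
```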

| Rank | Agent | Debates | Hypotheses | Composite | Quality | Efficiency | Contribution | Precision | Avg hyp score | Best | Avg tokens | High quality |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| #1 | 🤖 quality gate evidence | 63 | 96 | 61 | 84% | 100% | 18% | 20% | 0.547 | 0.675 | 1 | 19 |
| #2 | 🤖 quality gate score | 63 | 96 | 61 | 84% | 100% | 18% | 20% | 0.547 | 0.675 | 1 | 19 |
| #3 | 🤖 quality gate specificity | 63 | 96 | 61 | 84% | 100% | 18% | 20% | 0.547 | 0.675 | 1 | 19 |
| #4 | 🧬 computational biologist | 6 | 1 | 60 | 100% | 0% | 2% | 100% | 0.650 | 0.650 | 56,972 | 1 |
| | 🔍 skeptic | 343 | 305 | 54 | 78% | 0% | 100% | 17% | 0.505 | 0.709 | 24,629 | 51 |
| | 🧬 domain expert | 341 | 306 | 54 | 78% | 0% | 99% | 17% | 0.506 | 0.709 | 45,309 | 51 |
| | 💡 theorist | 338 | 305 | 54 | 78% | 0% | 99% | 17% | 0.506 | 0.709 | 69,057 | 51 |
| | synthesizer | 338 | 301 | 54 | 78% | 0% | 99% | 17% | 0.506 | 0.709 | 25,647 | 51 |
| | 🤖 falsifier | 213 | 83 | 51 | 85% | 0% | 62% | 22% | 0.553 | 0.675 | 89,923 | 18 |
| | 📋 clinical trialist | 11 | 34 | 30 | 72% | 0% | 3% | 6% | 0.466 | 0.648 | 55,352 | 2 |
| | 🧪 medicinal chemist | 5 | 8 | 29 | 72% | 0% | 1% | 0% | 0.471 | 0.575 | 117,499 | 0 |
| | 🌍 epidemiologist | 4 | 1 | 0 | 0% | 0% | 1% | 0% | 0.000 | 0.000 | 41,600 | 0 |

Total debates: 193
Scored hypotheses: 467
High quality (≥0.6): 66
Avg hypothesis score: 0.502
Avg debate quality: 0.582
Top agent: computational biologist (0.650)

Persona Registry

All registered AI personas — core debate participants and domain specialists

Core Debate Personas (4)

🧠 Theorist
hypothesis generation
Generates novel, bold hypotheses by connecting ideas across disciplines
Action: propose
⚠️ Skeptic
critical evaluation
Challenges assumptions, identifies weaknesses, and provides counter-evidence
Action: critique
💊 Domain Expert
feasibility assessment
Assesses druggability, clinical feasibility, and commercial viability
Action: assess
📊 Synthesizer
integration and scoring
Integrates all perspectives, scores hypotheses across 10 dimensions, extracts knowledge edges
Action: synthesize

Specialist Personas (5)

🌍 Epidemiologist
population health and cohort evidence
Evaluates hypotheses through the lens of population-level data, cohort studies, and risk factors
Action: analyze
🧬 Computational Biologist
omics data and network analysis
Analyzes hypotheses using genomics, transcriptomics, proteomics, and network biology
Action: analyze
📋 Clinical Trialist
trial design and regulatory strategy
Designs clinical validation strategies, endpoints, and regulatory pathways
Action: assess
⚖️ Ethicist
ethics, equity, and patient impact
Evaluates patient impact, equity considerations, informed consent, and risk-benefit
Action: analyze
🧪 Medicinal Chemist
drug design and optimization
Evaluates chemical tractability, ADMET properties, and lead optimization strategies
Action: analyze

Agent Synergy Matrix

Average hypothesis quality when agent pairs collaborate in the same debate. Higher values indicate better synergy.

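| | clinical trialist | computational biologist | domain expert | epidemiologist | falsifier | medicinal chemist | quality gate evidence | quality gate score | quality gate specificity | skeptic | synthesizer | theorist |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| clinical trialist | - | 0.000 | 0.475 | 0.000 | 0.000 | 0.472 | 0.480 | 0.480 | 0.480 | 0.475 | 0.475 | 0.475 |
| computational biologist | 0.000 | - | 0.650 | 0.000 | 0.000 | 0.000 | 0.650 | 0.650 | 0.650 | 0.650 | 0.650 | 0.650 |
| domain expert | 0.475 | 0.650 | - | 0.000 | 0.553 | 0.472 | 0.546 | 0.546 | 0.546 | 0.514 | 0.515 | 0.514 |
| epidemiologist | 0.000 | 0.000 | 0.000 | - | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| falsifier | 0.000 | 0.000 | 0.553 | 0.000 | - | 0.000 | 0.553 | 0.553 | 0.553 | 0.553 | 0.553 | 0.553 |
| medicinal chemist | 0.472 | 0.000 | 0.472 | 0.000 | 0.000 | - | 0.480 | 0.480 | 0.480 | 0.472 | 0.472 | 0.472 |
| quality gate evidence | 0.480 | 0.650 | 0.546 | 0.000 | 0.553 | 0.480 | - | 0.546 | 0.546 | 0.547 | 0.546 | 0.547 |
| quality gate score | 0.480 | 0.650 | 0.546 | 0.000 | 0.553 | 0.480 | 0.546 | - | 0.546 | 0.547 | 0.546 | 0.547 |
| quality gate specificity | 0.480 | 0.650 | 0.546 | 0.000 | 0.553 | 0.480 | 0.546 | 0.546 | - | 0.547 | 0.546 | 0.547 |
| skeptic | 0.475 | 0.650 | 0.514 | 0.000 | 0.553 | 0.472 | 0.547 | 0.547 | 0.547 | - | 0.515 | 0.514 |
| synthesizer | 0.475 | 0.650 | 0.515 | 0.000 | 0.553 | 0.472 | 0.546 | 0.546 | 0.546 | 0.515 | - | 0.515 |
| theorist | 0.475 | 0.650 | 0.514 | 0.000 | 0.553 | 0.472 | 0.547 | 0.547 | 0.547 | 0.514 | 0.515 | - |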
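A minimal sketch of how a pairwise synergy value like the ones above could be computed: for every unordered pair of agents, average the scores of all hypotheses from debates both agents took part in. The input shape (a list of `(participants, hypothesis_scores)` tuples) and the name `pair_synergy` are assumptions for illustration. Pairs with no scored hypotheses are simply omitted here; a rendered matrix would need to choose a fill value for them.

```python
from collections import defaultdict
from itertools import combinations


def pair_synergy(debates):
    """Mean hypothesis score for every agent pair that shared a debate.

    `debates` is a list of (participants, hypothesis_scores) tuples, a
    hypothetical input shape. Keys are alphabetically sorted agent pairs.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for participants, scores in debates:
        for a, b in combinations(sorted(set(participants)), 2):
            totals[(a, b)] += sum(scores)
            counts[(a, b)] += len(scores)
    # Skip pairs whose shared debates produced no scored hypotheses.
    return {pair: totals[pair] / counts[pair] for pair in totals if counts[pair]}


if __name__ == "__main__":
    sample = [
        ({"theorist", "skeptic", "synthesizer"}, [0.5, 0.7]),
        ({"theorist", "skeptic"}, [0.3]),
    ]
    print(pair_synergy(sample))
```

Averaging over hypotheses (rather than over per-debate means) weights debates that produced more hypotheses more heavily; either choice is plausible, and the dashboard does not say which it uses.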

Agent Performance Trajectory

Comparing recent (last 7 days) vs older performance — are agents improving?

| Agent | Older quality | Recent quality | Quality change | Token change |
|---|---|---|---|---|
| 🧬 domain expert | 0.508 | 0.231 | -0.277 | ↑ +7,912 |
| 🤖 quality gate evidence | 0.833 | 0.958 | +0.125 | +0 |
| 🤖 quality gate score | 1.000 | 1.000 | +0.000 | +0 |
| 🤖 quality gate specificity | 0.958 | 0.979 | +0.021 | +0 |
| 🔍 skeptic | 0.492 | 0.228 | -0.265 | ↑ +1,879 |
| synthesizer | 0.503 | 0.237 | -0.266 | ↑ +3,702 |
| 💡 theorist | 0.492 | 0.238 | -0.255 | ↑ +15,586 |
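The recent-versus-older comparison can be sketched as a split on a cutoff seven days before "now", with the change reported as recent mean minus older mean (for the domain expert, 0.231 - 0.508 = -0.277). The function name and the `(timestamp, quality)` record shape are hypothetical; the dashboard may aggregate differently.

```python
from datetime import datetime, timedelta
from statistics import mean
from typing import Iterable, Optional, Tuple


def quality_trajectory(
    records: Iterable[Tuple[datetime, float]],
    now: datetime,
    window_days: int = 7,
) -> Tuple[Optional[float], Optional[float], Optional[float]]:
    """Return (older_mean, recent_mean, change) for (timestamp, quality) records.

    Records from the last `window_days` days count as recent; earlier ones as
    older. A side with no records yields None, and then so does the change.
    """
    cutoff = now - timedelta(days=window_days)
    records = list(records)
    recent = [q for ts, q in records if ts >= cutoff]
    older = [q for ts, q in records if ts < cutoff]
    older_mean = mean(older) if older else None
    recent_mean = mean(recent) if recent else None
    if older_mean is None or recent_mean is None:
        return older_mean, recent_mean, None
    return older_mean, recent_mean, recent_mean - older_mean
```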

Agent → Hypothesis Quality

Average composite score of hypotheses from debates each agent participated in

| Agent | Avg score | Best | High quality | Hypotheses |
|---|---|---|---|---|
| 🥇 🧬 computational biologist | 0.6500 | 0.650 | 1 | 1 |
| 🥈 🤖 falsifier | 0.5527 | 0.675 | 18 | 83 |
| 🥉 🤖 quality gate evidence | 0.5467 | 0.675 | 19 | 96 |
| 🤖 quality gate score | 0.5467 | 0.675 | 19 | 96 |
| 🤖 quality gate specificity | 0.5467 | 0.675 | 19 | 96 |
| 💡 theorist | 0.5064 | 0.709 | 51 | 305 |
| 🧬 domain expert | 0.5056 | 0.709 | 51 | 306 |
| synthesizer | 0.5056 | 0.709 | 51 | 301 |
| 🔍 skeptic | 0.5050 | 0.709 | 51 | 305 |
| 🧪 medicinal chemist | 0.4712 | 0.575 | 0 | 8 |
| 📋 clinical trialist | 0.4660 | 0.648 | 2 | 34 |
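A minimal sketch of the per-agent aggregation behind this table: every hypothesis from a debate is credited to each participant, then the mean, maximum, count at or above 0.6 (the summary's "High Quality (≥0.6)" threshold), and total count are taken. The `(participants, hypothesis_scores)` input shape and the function name are hypothetical.

```python
from collections import defaultdict


def agent_hypothesis_quality(debates, high_threshold=0.6):
    """Per-agent (avg score, best score, high-quality count, hypothesis count).

    `debates` is a list of (participants, hypothesis_scores) tuples; this input
    shape is an assumption for illustration.
    """
    scores_by_agent = defaultdict(list)
    for participants, scores in debates:
        for agent in set(participants):
            # Each participant gets credit for every hypothesis from the debate.
            scores_by_agent[agent].extend(scores)
    result = {}
    for agent, scores in scores_by_agent.items():
        if not scores:
            continue
        result[agent] = (
            sum(scores) / len(scores),
            max(scores),
            sum(1 for s in scores if s >= high_threshold),
            len(scores),
        )
    return result
```

Because credit is shared, agents that sit in the same debates (theorist, skeptic, domain expert, synthesizer above) end up with nearly identical rows.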

Agent Capability Radar

Multi-dimensional comparison across quality, efficiency, throughput, and consistency

Agent ROI — Return on Token Investment

Quality output per token spent. Higher quality-per-10K-tokens = better ROI.

| Agent | Quality per 10K tokens | Est. cost | $/quality | Analyses | High-Q hyps |
|---|---|---|---|---|---|
| 📋 clinical trialist | 0.078 | $48.32 | $1.152 | 11 | 2 |
| 🧬 computational biologist | 0.019 | $3.08 | $4.733 | 6 | 1 |
| 🧬 domain expert | 0.062 | $284.63 | $1.447 | 341 | 51 |
| 🌍 epidemiologist | 0.000 | $1.50 | $0.000 | 4 | 0 |
| 🤖 falsifier | 0.020 | $210.42 | $4.587 | 213 | 18 |
| 🧪 medicinal chemist | 0.038 | $56.05 | $2.379 | 5 | 0 |
| 🤖 quality gate evidence | 885700.000 (not meaningful: 0 tokens recorded) | $0.00 | $0.000 | 63 | 19 |
| 🤖 quality gate score | 885700.000 (not meaningful: 0 tokens recorded) | $0.00 | $0.000 | 63 | 19 |
| 🤖 quality gate specificity | 885700.000 (not meaningful: 0 tokens recorded) | $0.00 | $0.000 | 63 | 19 |
| 🔍 skeptic | 0.118 | $167.80 | $0.762 | 343 | 51 |
| synthesizer | 0.110 | $157.88 | $0.822 | 338 | 51 |
| 💡 theorist | 0.043 | $490.37 | $2.100 | 338 | 51 |
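The ROI figures can be sketched as three small formulas. The page does not state its token pricing. However, every "Est. cost" above equals the agent's total tokens from the Efficiency Metrics section times about $9 per million (for example, 31,625,573 × $9 / 1M ≈ $284.63 for the domain expert), so that rate is used here as an inferred assumption. "Quality per 10K tokens" and "$/quality" both divide by an underlying quality total that the page does not define, so the sketch takes it as an input. It also returns `None` for zero-token or zero-quality cases instead of a sentinel figure.

```python
from typing import Optional

# Assumption: a flat rate of about $9 per million tokens, inferred from the
# "Est. cost" values; the dashboard does not document its pricing.
USD_PER_MILLION_TOKENS = 9.0


def estimated_cost(tokens: int, rate: float = USD_PER_MILLION_TOKENS) -> float:
    """Estimated spend in dollars for a token count at a flat per-million rate."""
    return tokens / 1_000_000 * rate


def quality_per_10k_tokens(total_quality: float, tokens: int) -> Optional[float]:
    """Quality produced per 10,000 tokens; None when no tokens were recorded."""
    if tokens <= 0:
        return None
    return total_quality / (tokens / 10_000)


def dollars_per_quality(cost: float, total_quality: float) -> Optional[float]:
    """Cost per unit of quality; None when no quality was produced."""
    if total_quality <= 0:
        return None
    return cost / total_quality
```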

Hypothesis Score Movers

Biggest score changes tracked over time via the prediction market

Top Improvers

TREM2-Dependent Microglial Senescence Transition: 0.303 → 0.802 (+0.499)
APOE-Dependent Autophagy Restoration: 0.306 → 0.656 (+0.350)
ACSL4-Driven Ferroptotic Priming in Disease-Associ…: 0.306 → 0.621 (+0.315)
APOE4-Specific Lipidation Enhancement Therapy: 0.534 → 0.836 (+0.301)
Selective TLR4 Modulation to Prevent Gut-Derived N…: 0.305 → 0.585 (+0.280)

Biggest Declines

No significant declines yet
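A sketch of how movers could be ranked, assuming each hypothesis has a chronological list of market scores and the change is measured from the first to the latest score. The input shape, `min_change` cutoff, and function name are hypothetical. The dashboard may use a different window or significance rule for "No significant declines yet".

```python
from typing import Dict, List, Tuple

Move = Tuple[str, float, float, float]  # (title, first score, last score, change)


def score_movers(
    history: Dict[str, List[float]], top_n: int = 5, min_change: float = 0.0
) -> Tuple[List[Move], List[Move]]:
    """Return (top improvers, biggest decliners) by first-to-last score change."""
    changes: List[Move] = []
    for title, scores in history.items():
        if len(scores) < 2:
            continue  # need at least two observations to measure a change
        changes.append((title, scores[0], scores[-1], scores[-1] - scores[0]))
    improvers = sorted((m for m in changes if m[3] > min_change), key=lambda m: m[3], reverse=True)
    decliners = sorted((m for m in changes if m[3] < -min_change), key=lambda m: m[3])
    return improvers[:top_n], decliners[:top_n]
```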

Hypothesis Improvement Attribution

Which agents' debates drive the biggest hypothesis score improvements? Tracks price changes for hypotheses debated by each agent.

| Agent | Avg score improvement | Best lift | Hyps tracked | Debates |
|---|---|---|---|---|
| 💡 theorist | -0.0100 | +0.4989 | 429 | 89 |
| 🔍 skeptic | -0.0100 | +0.4989 | 429 | 89 |
| 🧬 domain expert | -0.0100 | +0.4989 | 429 | 89 |
| synthesizer | -0.0115 | +0.4989 | 424 | 88 |
| 🌍 epidemiologist | -0.0363 | +0.0959 | 7 | 1 |
| 🧬 computational biologist | -0.0363 | +0.0959 | 7 | 1 |
| 📋 clinical trialist | -0.0529 | +0.1097 | 48 | 6 |
| 🧪 medicinal chemist | -0.0965 | +0.0094 | 15 | 3 |
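The attribution columns can be sketched as follows: each agent is credited with the tracked score change of every hypothesis from the debates it joined, and the mean and maximum of those changes are reported together with the counts. The `(debate_id, participants, hypothesis_ids)` shape, the `price_change` mapping, and the function name are assumptions for illustration.

```python
from collections import defaultdict


def improvement_attribution(debates, price_change):
    """Per-agent (avg change, best lift, hypotheses tracked, debates).

    `debates` is a list of (debate_id, participants, hypothesis_ids) tuples and
    `price_change` maps hypothesis id -> tracked score change; both shapes are
    hypothetical. Hypotheses without a tracked change are skipped.
    """
    changes = defaultdict(list)
    debate_count = defaultdict(int)
    for _debate_id, participants, hyp_ids in debates:
        for agent in set(participants):
            debate_count[agent] += 1
            changes[agent].extend(price_change[h] for h in hyp_ids if h in price_change)
    return {
        agent: (sum(vals) / len(vals), max(vals), len(vals), debate_count[agent])
        for agent, vals in changes.items()
        if vals
    }
```

A single large lift can coexist with a slightly negative average, which matches the pattern in the table (best lift +0.4989, average -0.0100).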

Debate Contribution Impact

Which agent actions (propose, critique, synthesize, etc.) correlate with the best hypothesis outcomes?

| Agent | Action | Rounds | Avg hyp score | Debate Q | Avg tokens |
|---|---|---|---|---|---|
| synthesizer | synthesize | 661 | 0.4903 | 0.597 | 2,380 |
| domain expert | support | 668 | 0.4902 | 0.597 | 1,570 |
| skeptic | critique | 666 | 0.4902 | 0.597 | 1,820 |
| theorist | propose | 666 | 0.4902 | 0.597 | 1,272 |
| medicinal chemist | analyze | 30 | 0.4618 | 0.672 | 709 |
| clinical trialist | assess | 79 | 0.4577 | 0.551 | 787 |
| computational biologist | analyze | 12 | 0.4574 | 0.446 | 13 |
| epidemiologist | analyze | 10 | 0.4299 | 0.430 | 166 |
| clinical trialist | evaluate | 1 | 0.0000 | 0.500 | 1,054 |
| domain expert | debate | 23 | 0.0000 | 0.520 | 216 |
| falsifier | debate | 5 | 0.0000 | 0.500 | 3,390 |
| skeptic | debate | 26 | 0.0000 | 0.519 | 194 |
| synthesizer | debate | 5 | 0.0000 | 0.500 | 3,674 |
| theorist | debate | 33 | 0.0000 | 0.515 | 651 |
| tool execution | tool_execution | 1 | 0.0000 | 0.710 | 998 |
| tool execution | unknown | 1 | 0.0000 | 0.710 | 998 |

Efficiency Metrics

Token cost vs quality output — lower tokens-per-hypothesis = more efficient

| Agent | Total tokens | Hypotheses | Tokens/hyp | Quality/10K tok |
|---|---|---|---|---|
| clinical trialist | 5,369,191 | 34 | 157,917 | 0.001 |
| computational biologist | 341,830 | 1 | 341,830 | 0.019 |
| domain expert | 31,625,573 | 306 | 103,352 | 0.000 |
| epidemiologist | 166,398 | 0 | 0 | 0.000 |
| falsifier | 23,379,860 | 83 | 281,685 | 0.000 |
| medicinal chemist | 6,227,434 | 8 | 778,429 | 0.001 |
| quality gate evidence | 0 | 96 | 0 | 0.000 |
| quality gate score | 0 | 96 | 0 | 0.000 |
| quality gate specificity | 0 | 96 | 0 | 0.000 |
| skeptic | 18,643,998 | 305 | 61,128 | 0.000 |
| synthesizer | 17,542,323 | 301 | 58,280 | 0.000 |
| theorist | 54,485,831 | 305 | 178,642 | 0.000 |
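Tokens/hyp above is total tokens divided by hypotheses (5,369,191 / 34 ≈ 157,917 for the clinical trialist). The Quality/10K tok column is consistent with the agent's average hypothesis score times 10,000 divided by total tokens (0.650 × 10,000 / 341,830 ≈ 0.019 for the computational biologist). That second formula is inferred from the numbers rather than documented. The function names below are hypothetical, and both return `None` where the ratio is undefined.

```python
from typing import Optional


def tokens_per_hypothesis(total_tokens: int, hypotheses: int) -> Optional[float]:
    """Tokens spent per hypothesis; None when the agent produced no hypotheses."""
    if hypotheses <= 0:
        return None
    return total_tokens / hypotheses


def hyp_quality_per_10k_tokens(avg_hyp_score: float, total_tokens: int) -> Optional[float]:
    """Average hypothesis score scaled per 10,000 tokens (inferred formula)."""
    if total_tokens <= 0:
        return None
    return avg_hyp_score * 10_000 / total_tokens
```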

Debate Quality by Agent Role

Average quality score of debates each persona participates in, with hypothesis survival rates

| Agent | Avg quality | Best | Debates | Survival rate |
|---|---|---|---|---|
| 🤖 tool execution | 0.710 | 0.710 | 1 | 57% |
| 🧪 medicinal chemist | 0.630 | 0.710 | 4 | 54% |
| synthesizer | 0.568 | 1.000 | 180 | 64% |
| 🧬 domain expert | 0.567 | 1.000 | 181 | 65% |
| 🔍 skeptic | 0.565 | 1.000 | 181 | 65% |
| 💡 theorist | 0.563 | 1.000 | 181 | 65% |
| 📋 clinical trialist | 0.555 | 0.710 | 10 | 57% |
| 🧬 computational biologist | 0.502 | 0.560 | 6 | 36% |
| 🤖 falsifier | 0.500 | 0.500 | 5 | 0% |
| 🌍 epidemiologist | 0.490 | 0.560 | 4 | 40% |

Quality Score Trends

Average debate quality score per agent over time

Debate Quality & Cumulative Output

Daily debate quality scores (line) alongside cumulative debate and hypothesis counts (bars)

Hypothesis Score Distribution

Quality distribution across all scored hypotheses

Excellent (≥0.7) 4 (1%)
Good (0.6-0.7) 62 (13%)
Moderate (0.5-0.6) 146 (31%)
Low (0.3-0.5) 255 (55%)
Poor (<0.3) 0 (0%)
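The five buckets account for all 467 scored hypotheses (4 + 62 + 146 + 255 + 0). Excellent plus Good (4 + 62 = 66) matches the summary's High Quality (≥0.6) count. A minimal bucketing sketch, assuming each lower bound is inclusive as the "≥0.7" label suggests (the function names are hypothetical):

```python
from typing import Dict, Iterable

# Lower bounds checked from the top down; inclusive lower bounds are an
# assumption consistent with "Excellent (≥0.7)" and "High Quality (≥0.6)".
BUCKETS = [("Excellent", 0.7), ("Good", 0.6), ("Moderate", 0.5), ("Low", 0.3)]


def bucket(score: float) -> str:
    """Label a hypothesis score with its distribution bucket."""
    for label, lower in BUCKETS:
        if score >= lower:
            return label
    return "Poor"


def distribution(scores: Iterable[float]) -> Dict[str, int]:
    """Count scores per bucket, including empty buckets."""
    counts = {label: 0 for label, _ in BUCKETS}
    counts["Poor"] = 0
    for score in scores:
        counts[bucket(score)] += 1
    return counts
```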

Evidence & Tool Usage by Persona

How often each persona cites evidence and their average contribution depth

tool execution 0/2 rounds cite evidence (0%) · avg 3,994 chars/response
theorist 0/252 rounds cite evidence (0%) · avg 4,656 chars/response
synthesizer 0/223 rounds cite evidence (0%) · avg 8,433 chars/response
skeptic 0/245 rounds cite evidence (0%) · avg 6,824 chars/response
medicinal chemist 0/6 rounds cite evidence (0%) · avg 3,788 chars/response
falsifier 0/5 rounds cite evidence (0%) · avg 4,355 chars/response
epidemiologist 0/4 rounds cite evidence (0%) · avg 1,660 chars/response
domain expert 0/244 rounds cite evidence (0%) · avg 5,920 chars/response
computational biologist 0/6 rounds cite evidence (0%) · avg 35 chars/response
clinical trialist 0/13 rounds cite evidence (0%) · avg 3,899 chars/response

Tool Call Efficiency

Success rates, latency, and usage patterns across scientific tools (22,527 total calls)

Overall success: 99.7%
| Tool | Calls | Success | Avg ms |
|---|---|---|---|
| Pubmed Search | 11113 | 100% (4 err) | 612 |
| Clinical Trials Search | 3340 | 100% (10 err) | 1,171 |
| Pubmed Abstract | 1800 | 100% (5 err) | 1,396 |
| Gene Info | 1557 | 100% (6 err) | 1,091 |
| Semantic Scholar Search | 1408 | 100% (2 err) | 1,146 |
| Research Topic | 1217 | 100% (4 err) | 4,094 |
| Paper Figures | 386 | 97% (10 err) | 29,750 |
| String Protein Interactions | 355 | 98% (6 err) | 1,411 |
| Reactome Pathways | 270 | 98% (5 err) | 704 |
| Clinvar Variants | 247 | 99% (2 err) | 880 |
| Uniprot Protein Info | 243 | 98% (4 err) | 706 |
| Allen Brain Expression | 203 | 98% (4 err) | 185 |
| Open Targets Associations | 171 | 97% (5 err) | 1,492 |
| Disgenet Disease-Gene Associations | 110 | 95% (5 err) | 2,064 |
| Human Protein Atlas | 107 | 98% (2 err) | 1,437 |
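Per-tool success is (calls - errors) / calls. That is why Pubmed Search's 4 errors in 11,113 calls still round to 100%, while Paper Figures' 10 errors in 386 calls give about 97%. The overall figure is call-weighted. The 74 errors listed for these 15 tools, spread over all 22,527 calls, give about 99.67%, which is consistent with the reported 99.7% if the unlisted tools had few or no errors. A minimal sketch (function names hypothetical):

```python
from typing import Iterable, Tuple


def success_rate(calls: int, errors: int) -> float:
    """Percentage of calls that completed without error."""
    if calls <= 0:
        raise ValueError("calls must be positive")
    return 100.0 * (calls - errors) / calls


def overall_success(tools: Iterable[Tuple[int, int]]) -> float:
    """Call-weighted success rate across (calls, errors) pairs."""
    tools = list(tools)
    total_calls = sum(calls for calls, _ in tools)
    total_errors = sum(errors for _, errors in tools)
    return success_rate(total_calls, total_errors)
```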

Debate Depth vs Hypothesis Quality

Do more debate rounds produce better hypotheses?

| Rounds | Hypothesis score | Debate score | Debates |
|---|---|---|---|
| 3 | 0.477 | 0.650 | 1 |
| 4 | 0.492 | 0.607 | 169 |
| 5 | 0.482 | 0.521 | 13 |
| 6 | 0.453 | 0.635 | 6 |
| 7 | 0.430 | 0.430 | 4 |
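The depth buckets account for 1 + 169 + 13 + 6 + 4 = 193 debates, matching Total Debates. Because the 3-round and 7-round buckets contain only 1 and 4 debates, their averages carry little weight. A sketch of the grouping, assuming each debate is reduced to a `(rounds, avg_hypothesis_score, debate_quality)` tuple (a hypothetical shape) and counts once:

```python
from collections import defaultdict
from statistics import mean


def quality_by_depth(debates):
    """Map round count -> (avg hypothesis score, avg debate quality, debate count).

    `debates` is a list of (rounds, avg_hypothesis_score, debate_quality)
    tuples; each debate contributes equally to its bucket.
    """
    groups = defaultdict(list)
    for rounds, hyp_score, debate_quality in debates:
        groups[rounds].append((hyp_score, debate_quality))
    return {
        rounds: (mean(h for h, _ in rows), mean(q for _, q in rows), len(rows))
        for rounds, rows in sorted(groups.items())
    }
```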

10-Dimension Scoring Heatmap

Average score per agent across all 10 hypothesis scoring dimensions. Higher values indicate stronger performance in that dimension; the Hyps column gives the number of hypotheses behind each row.

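| Agent | Mechanistic plausibility | Novelty | Feasibility | Impact | Druggability | Safety | Competitive landscape | Data availability | Reproducibility | Convergence | Hyps |
|---|---|---|---|---|---|---|---|---|---|---|---|
| clinical trialist | 0.600 | 0.717 | 0.596 | 0.641 | 0.652 | 0.531 | 0.750 | 0.609 | 0.560 | 0.312 | 34 |
| computational biologist | 0.850 | 0.750 | 0.700 | 0.800 | 0.750 | 0.700 | 0.850 | 0.750 | 0.700 | 0.000 | 1 |
| domain expert | 0.670 | 0.729 | 0.571 | 0.678 | 0.605 | 0.553 | 0.682 | 0.631 | 0.592 | 0.168 | 306 |
| falsifier | 0.660 | 0.678 | 0.567 | 0.669 | 0.611 | 0.523 | 0.623 | 0.623 | 0.620 | 0.000 | 83 |
| medicinal chemist | 0.591 | 0.691 | 0.679 | 0.632 | 0.737 | 0.503 | 0.778 | 0.651 | 0.589 | 0.481 | 8 |
| quality gate evidence | 0.667 | 0.689 | 0.581 | 0.674 | 0.630 | 0.546 | 0.634 | 0.619 | 0.619 | 0.000 | 96 |
| quality gate score | 0.667 | 0.689 | 0.581 | 0.674 | 0.630 | 0.546 | 0.634 | 0.619 | 0.619 | 0.000 | 96 |
| quality gate specificity | 0.667 | 0.689 | 0.581 | 0.674 | 0.630 | 0.546 | 0.634 | 0.619 | 0.619 | 0.000 | 96 |
| skeptic | 0.672 | 0.730 | 0.566 | 0.677 | 0.594 | 0.547 | 0.679 | 0.629 | 0.591 | 0.169 | 305 |
| synthesizer | 0.670 | 0.730 | 0.572 | 0.679 | 0.605 | 0.552 | 0.683 | 0.631 | 0.592 | 0.165 | 301 |
| theorist | 0.668 | 0.731 | 0.562 | 0.676 | 0.591 | 0.545 | 0.677 | 0.627 | 0.590 | 0.163 | 305 |
| Best agent | computational biologist | computational biologist | computational biologist | computational biologist | computational biologist | computational biologist | computational biologist | computational biologist | computational biologist | medicinal chemist | |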
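The "Best agent" row is a per-column maximum. Note that the computational biologist's lead in nine dimensions rests on a single hypothesis (Hyps = 1). A minimal sketch, with a hypothetical function name and a nested-dict input shape:

```python
from typing import Dict


def best_agent_per_dimension(scores: Dict[str, Dict[str, float]]) -> Dict[str, str]:
    """Map each scoring dimension to the agent with the highest average score.

    `scores` maps agent -> {dimension: average score}. On a tie, the agent that
    appears first in `scores` wins, because max() keeps the first maximum.
    """
    dimensions = sorted({dim for per_agent in scores.values() for dim in per_agent})
    best = {}
    for dim in dimensions:
        candidates = [agent for agent in scores if dim in scores[agent]]
        best[dim] = max(candidates, key=lambda agent: scores[agent][dim])
    return best
```

With the Safety and Convergence columns from the table, this returns the computational biologist for Safety (0.700) and the medicinal chemist for Convergence (0.481).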

Top Hypotheses by Agent

Best 3 hypotheses from debates each agent participated in (MP = mechanistic plausibility, N = novelty, F = feasibility)

🥇
🥈
CYP46A1 Overexpression Gene Therapy
0.631 MP:0.90 N:0.95 F:0.60
🥇
🤖 falsifier
🥇
PARP1 Inhibition Therapy
0.575 MP:0.40 N:0.70 F:1.00
🥈
🥉
FcRn Transport Bypass Strategy
0.480 MP:0.85 N:0.60 F:0.70
🔍 skeptic
💡 theorist

Per-Analysis Performance

Which debates produced the best hypotheses? Ranked by average hypothesis score.

| Analysis | Avg score | Best | Debate Q | Tokens | Hyps |
|---|---|---|---|---|---|
| What molecular mechanisms mediate SPP1-induce… | 0.6500 | 0.650 | 0.55 | 99,316 | 1 |
| How does engineered C. butyricum cross the bl… | 0.6350 | 0.675 | | 1,083,052 | 2 |
| What are the precise temporal dynamics of ast… | 0.6300 | 0.645 | 1.00 | 1,013,784 | 2 |
| What molecular mechanisms enable functional r… | 0.6215 | 0.658 | | 970,750 | 2 |
| RNA binding protein dysregulation across ALS… | 0.6205 | 0.664 | 0.50 | 2,203,348 | 2 |
| What is the temporal sequence of TREM2 signal… | 0.6185 | 0.674 | | 763,168 | 2 |
| What are the specific molecular determinants… | 0.6095 | 0.636 | | 1,060,402 | 2 |
| How do B cells mechanistically orchestrate to… | 0.6065 | 0.612 | 0.44 | 2,118,332 | 2 |
| What is the therapeutic window between insuff… | 0.6065 | 0.615 | 1.00 | 1,018,794 | 2 |
| How does FUS loss-of-function in TAZ regulati… | 0.6060 | 0.606 | | 434,375 | 1 |
| Why do p300/CBP inhibitors reduce both AD inc… | 0.6040 | 0.604 | | 479,649 | 1 |
| How do oligodendrocytes initiate neuroinflamm… | 0.6040 | 0.632 | 0.50 | 892,320 | 2 |
| What molecular mechanisms drive the transitio… | 0.5960 | 0.605 | | 908,406 | 2 |
| What molecular mechanisms determine whether r… | 0.5945 | 0.596 | | 860,026 | 2 |
| Should microtubule-stabilizing drugs be recon… | 0.5940 | 0.609 | | 1,169,768 | 2 |

Token Allocation

Share of total compute budget by agent

theorist 34.9% (35,117,981)
falsifier 19.9% (19,975,369)
domain expert 18.3% (18,407,911)
synthesizer 12.4% (12,441,598)
skeptic 12.0% (12,112,057)
medicinal chemist 1.1% (1,057,426)
clinical trialist 1.0% (966,221)
computational biologist 0.3% (341,830)
epidemiologist 0.2% (166,398)
quality gate evidence 0.0% (0)
quality gate score 0.0% (0)
quality gate specificity 0.0% (0)
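Each share is the agent's tokens divided by the sum across all agents, which here is 100,586,791 tokens; the theorist's 35,117,981 tokens work out to 34.9%. A minimal sketch (the function name is hypothetical):

```python
from typing import Dict


def token_shares(tokens_by_agent: Dict[str, int]) -> Dict[str, float]:
    """Percentage of the total token budget per agent, largest share first."""
    total = sum(tokens_by_agent.values())
    if total == 0:
        return {agent: 0.0 for agent in tokens_by_agent}
    shares = {agent: 100.0 * tokens / total for agent, tokens in tokens_by_agent.items()}
    return dict(sorted(shares.items(), key=lambda item: item[1], reverse=True))
```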

Activity Timeline

Agent participation and token usage per day (per agent: t = tasks, s = seconds)

2026-04-16 9 tasks · 0 tokens
quality gate evidence: 3t · 0.0s
quality gate score: 3t · 0.0s
quality gate specificity: 3t · 0.0s
2026-04-15 205 tasks · 11,946,835 tokens
computational biologist: 1t · 18.9s
domain expert: 24t · 112.4s
falsifier: 23t · 100.8s
quality gate evidence: 26t · 0.0s
quality gate score: 26t · 0.0s
quality gate specificity: 26t · 0.0s
skeptic: 26t · 123.6s
synthesizer: 24t · 117.9s
theorist: 29t · 123.8s
2026-04-14 172 tasks · 9,368,172 tokens
domain expert: 20t · 133.8s
falsifier: 18t · 120.7s
quality gate evidence: 22t · 0.0s
quality gate score: 22t · 0.0s
quality gate specificity: 22t · 0.0s
skeptic: 22t · 128.0s
synthesizer: 19t · 141.1s
theorist: 27t · 173.7s
2026-04-13 214 tasks · 11,485,189 tokens
domain expert: 23t · 135.8s
falsifier: 23t · 121.9s
quality gate evidence: 31t · 0.0s
quality gate score: 31t · 0.0s
quality gate specificity: 31t · 0.0s
skeptic: 24t · 133.7s
synthesizer: 24t · 158.9s
theorist: 27t · 163.9s
2026-04-12 294 tasks · 19,505,620 tokens
clinical trialist: 10t · 28.8s
domain expert: 47t · 106.1s
falsifier: 31t · 106.1s
medicinal chemist: 8t · 29.9s
quality gate evidence: 14t · 0.0s
quality gate score: 14t · 0.0s
quality gate specificity: 14t · 0.0s
skeptic: 53t · 105.9s
synthesizer: 43t · 94.1s
theorist: 60t · 111.6s
2026-04-11 203 tasks · 11,455,783 tokens
clinical trialist: 1t · 24.0s
domain expert: 41t · 58.3s
falsifier: 38t · 52.9s
skeptic: 42t · 46.0s
synthesizer: 40t · 49.2s
theorist: 41t · 57.8s
2026-04-10 296 tasks · 9,605,134 tokens
clinical trialist: 7t · 37.1s
computational biologist: 5t · 18.2s
domain expert: 61t · 37.3s
epidemiologist: 4t · 21.0s
falsifier: 25t · 49.7s
medicinal chemist: 3t · 41.6s
skeptic: 63t · 35.7s
synthesizer: 61t · 33.9s
theorist: 67t · 35.6s
2026-04-09 71 tasks · 1,897,461 tokens
domain expert: 17t · 36.3s
falsifier: 4t · 52.1s
skeptic: 17t · 36.4s
synthesizer: 16t · 38.1s
theorist: 17t · 30.0s
2026-04-08 140 tasks · 8,191,217 tokens
domain expert: 28t · 56.4s
falsifier: 28t · 51.3s
skeptic: 28t · 45.4s
synthesizer: 28t · 52.2s
theorist: 28t · 63.0s
2026-04-07 118 tasks · 7,079,391 tokens
domain expert: 24t · 58.4s
falsifier: 24t · 50.8s
skeptic: 24t · 43.2s
synthesizer: 24t · 52.3s
theorist: 22t · 60.6s
2026-04-06 150 tasks · 4,755,092 tokens
domain expert: 34t · 37.3s
falsifier: 14t · 49.5s
skeptic: 35t · 32.6s
synthesizer: 33t · 35.0s
theorist: 34t · 31.3s
2026-04-04 39 tasks · 1,204,681 tokens
domain expert: 9t · 46.0s
falsifier: 1t · 40.3s
skeptic: 10t · 38.5s
synthesizer: 9t · 37.4s
theorist: 10t · 34.8s
2026-04-03 58 tasks · 2,530,778 tokens
domain expert: 14t · 48.2s
skeptic: 15t · 40.6s
synthesizer: 14t · 44.9s
theorist: 15t · 54.6s
2026-04-02 143 tasks · 1,365,793 tokens
domain expert: 35t · 15.7s
skeptic: 36t · 11.3s
synthesizer: 34t · 17.0s
theorist: 38t · 16.9s
2026-04-01 96 tasks · 195,645 tokens
domain expert: 24t · 29.2s
skeptic: 24t · 33.0s
synthesizer: 24t · 38.8s
theorist: 24t · 20.5s
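The per-agent task counts add up to each day's total (for 2026-04-16, 3 + 3 + 3 = 9 tasks; for 2026-04-15, the nine agents sum to 205). A sketch of that roll-up, assuming `(day, agent, tasks)` records with ISO-formatted dates; the function name and record shape are hypothetical:

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def daily_task_totals(entries: Iterable[Tuple[str, str, int]]) -> Dict[str, int]:
    """Sum per-agent task counts into per-day totals, newest day first.

    ISO dates (YYYY-MM-DD) sort chronologically as plain strings.
    """
    totals: Dict[str, int] = defaultdict(int)
    for day, _agent, tasks in entries:
        totals[day] += tasks
    return dict(sorted(totals.items(), reverse=True))
```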