Leaderboards

Head-to-head comparison of strategy + model combinations on the validated biotech catalyst benchmark (304 cases)

Total Runs: strategy + model combinations

Strategies: evaluation approaches

Models: AI systems tested

Cases: 304 validated catalysts evaluated

Rank	Strategy	Model	Type	Exact Match	Close Match	Direction	MAE	Corr.	Cases

Exact Match

Prediction matches the actual impact category exactly (7 categories from very_negative to very_positive).

Close Match

Prediction is within 1 category of the actual (e.g., predicting “positive” when actual is “very_positive”).
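As a rough illustration of the two category-based metrics above, the following Python sketch computes both rates from lists of labels. The source names only `very_negative`, `positive`, and `very_positive`; the other four labels and their ordering are assumptions made here for the example, as is the choice to count an exact hit as a close match too.

```python
# Assumed ordering of the 7 impact categories. Only "very_negative",
# "positive" and "very_positive" appear in the source; the remaining
# labels are hypothetical placeholders.
CATEGORIES = [
    "very_negative", "negative", "slightly_negative", "neutral",
    "slightly_positive", "positive", "very_positive",
]
INDEX = {name: i for i, name in enumerate(CATEGORIES)}


def exact_match_rate(predicted, actual):
    """Fraction of cases where the predicted category equals the actual one."""
    hits = sum(p == a for p, a in zip(predicted, actual))
    return hits / len(actual)


def close_match_rate(predicted, actual):
    """Fraction of cases where the prediction is at most 1 category away.

    Assumption: an exact match (distance 0) also counts as a close match.
    """
    hits = sum(abs(INDEX[p] - INDEX[a]) <= 1 for p, a in zip(predicted, actual))
    return hits / len(actual)


predicted = ["positive", "neutral", "very_negative", "negative"]
actual = ["very_positive", "neutral", "positive", "negative"]
exact = exact_match_rate(predicted, actual)
close = close_match_rate(predicted, actual)
```

With these four toy cases, two predictions are exact and a third ("positive" vs. "very_positive") is one category off, so the close-match rate is higher than the exact-match rate.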

Direction Accuracy

Prediction has the correct directional sign (positive, negative, or neutral).
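One way to read this definition is sketched below, assuming each category has been mapped to a signed integer score (for example -3 for `very_negative` through +3 for `very_positive`, with 0 for neutral). That mapping is an assumption for illustration and is not stated in the source.

```python
def sign(score):
    """Return -1, 0 or +1 for a negative, neutral or positive score."""
    return (score > 0) - (score < 0)


def direction_accuracy(predicted, actual):
    """Fraction of cases where predicted and actual scores share a sign.

    Assumption: scores are signed numbers where 0 means neutral.
    """
    hits = sum(sign(p) == sign(a) for p, a in zip(predicted, actual))
    return hits / len(actual)


# Hypothetical scores on an assumed -3..+3 scale.
predicted_scores = [2, 0, -1, 3]
actual_scores = [3, 0, 1, 1]
accuracy = direction_accuracy(predicted_scores, actual_scores)
```

Under this reading, a prediction of +3 against an actual of +1 counts as directionally correct even though it would miss both category-based metrics.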

MAE (Mean Absolute Error)

Average absolute difference between predicted and actual scores. Lower is better.

Pearson Correlation

Linear correlation between predictions and actual outcomes. Range: -1 to +1. Higher is better.
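The two score-based metrics, MAE and Pearson correlation, can be sketched in plain Python as follows. The numeric scale of the scores is not given in the source, so the sample values are hypothetical, and returning NaN for constant inputs is a choice made for this example.

```python
import math


def mean_absolute_error(predicted, actual):
    """Average absolute difference between predicted and actual scores."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def pearson_correlation(predicted, actual):
    """Pearson linear correlation, in the range -1 to +1.

    Returns NaN when either series is constant, since the correlation
    is undefined in that case (a choice made for this sketch).
    """
    n = len(actual)
    mean_p = sum(predicted) / n
    mean_a = sum(actual) / n
    cov = sum((p - mean_p) * (a - mean_a) for p, a in zip(predicted, actual))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_a = sum((a - mean_a) ** 2 for a in actual)
    if var_p == 0 or var_a == 0:
        return float("nan")
    return cov / math.sqrt(var_p * var_a)


# Hypothetical scores: predictions are exactly half the actual values.
predicted_scores = [1, 2, 3]
actual_scores = [2, 4, 6]
mae = mean_absolute_error(predicted_scores, actual_scores)
corr = pearson_correlation(predicted_scores, actual_scores)
```

The toy data shows why both metrics are reported: the predictions are perfectly correlated with the outcomes yet still carry a nonzero MAE, because correlation ignores scale and offset.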

Comparison of strategy complexity, estimated latency, and cost per prediction across all approaches.

Steps: 1 step; Latency: ~2s; Cost: $0.01/case

Steps: 1 step; Latency: ~4s; Cost: $0.02/case

Steps: 5 steps; Latency: ~15s; Cost: $0.08/case

Steps: 2-3 steps; Latency: ~3-8s; Cost: $0.02-0.05/case
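To put the per-case figures above in context, the sketch below scales them to a full pass over the 304-case benchmark. It assumes cost and latency grow linearly with the number of cases and that cases run one after another; both are simplifications, and the function names are hypothetical.

```python
CASES = 304  # benchmark size stated on this page


def total_cost(cost_per_case, cases=CASES):
    """Estimated dollar cost of one full benchmark run (linear scaling assumed)."""
    return round(cost_per_case * cases, 2)


def sequential_minutes(latency_seconds, cases=CASES):
    """Estimated wall-clock minutes if cases run strictly one at a time."""
    return round(latency_seconds * cases / 60, 1)


cheapest_run = total_cost(0.01)         # 1-step strategy at $0.01/case
priciest_run = total_cost(0.08)         # 5-step strategy at $0.08/case
slowest_run = sequential_minutes(15)    # 5-step strategy at ~15s/case
```

Under these assumptions a full run costs roughly $3 for the cheapest listed approach and roughly $24 for the 5-step approach, with the latter taking a little over an hour if executed sequentially.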