Leaderboards

Head-to-head comparison of strategy + model combinations on the validated biotech catalyst benchmark (304 cases)

Total Runs: 0 (strategy + model combinations)
Strategies: 0 (evaluation approaches)
Models: 0 (AI systems tested)
Cases: 304 (validated catalysts evaluated)

Rank | Strategy | Model | Type | Exact Match | Close Match | Direction | MAE | Corr. | Cases

Categorical Metrics

Exact Match

Prediction matches the actual impact category exactly (7 categories from very_negative to very_positive).

Close Match

Prediction is within 1 category of the actual (e.g., predicting “positive” when actual is “very_positive”).

Direction Accuracy

Prediction has the correct directional sign (positive, negative, or neutral).
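
The three categorical metrics above can be sketched in a few lines of Python. This is a minimal illustration, not the benchmark's own scoring code. The text names only some of the 7 categories, so the "slightly_negative" and "slightly_positive" labels below are hypothetical placeholders, and the function names are assumptions. The sketch also assumes that an exact match counts as a close match, since it is within 1 category.

```python
# Assumed ordering of the 7 impact categories. Only very_negative, neutral,
# positive, and very_positive appear in the text; the "slightly_*" labels
# are hypothetical stand-ins for the unnamed categories.
CATEGORIES = [
    "very_negative", "negative", "slightly_negative", "neutral",
    "slightly_positive", "positive", "very_positive",
]
INDEX = {name: i for i, name in enumerate(CATEGORIES)}
NEUTRAL = INDEX["neutral"]


def direction(category):
    """Return -1, 0, or +1 for a negative, neutral, or positive category."""
    offset = INDEX[category] - NEUTRAL
    return (offset > 0) - (offset < 0)


def categorical_metrics(predicted, actual):
    """Compute exact match, close match, and direction accuracy as fractions."""
    if not predicted or len(predicted) != len(actual):
        raise ValueError("predicted and actual must be non-empty and equal length")
    pairs = list(zip(predicted, actual))
    n = len(pairs)
    exact = sum(p == a for p, a in pairs)
    # Assumption: "within 1 category" includes a distance of 0.
    close = sum(abs(INDEX[p] - INDEX[a]) <= 1 for p, a in pairs)
    same_direction = sum(direction(p) == direction(a) for p, a in pairs)
    return {
        "exact_match": exact / n,
        "close_match": close / n,
        "direction": same_direction / n,
    }
```

Under these assumptions, predicting "positive" when the actual is "very_positive" scores as a close match and a correct direction, but not an exact match.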

Numeric Metrics

MAE (Mean Absolute Error)

Average absolute difference between predicted and actual scores. Lower is better.

Pearson Correlation

Linear correlation between predictions and actual outcomes. Range: -1 to +1. Higher is better.
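
Both numeric metrics can be computed with the standard library alone. The following is a rough sketch under stated assumptions: the score scale is not given in the text, so the inputs are treated as plain floats, and the function names are illustrative rather than taken from the benchmark.

```python
import math


def mean_absolute_error(predicted, actual):
    """Average of |predicted - actual| over all cases. Lower is better."""
    if not predicted or len(predicted) != len(actual):
        raise ValueError("predicted and actual must be non-empty and equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


def pearson_correlation(predicted, actual):
    """Pearson correlation coefficient in [-1, +1]. Higher is better."""
    if len(predicted) < 2 or len(predicted) != len(actual):
        raise ValueError("need at least two paired values")
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_a = sum(actual) / n
    cov = sum((p - mean_p) * (a - mean_a) for p, a in zip(predicted, actual))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_a = sum((a - mean_a) ** 2 for a in actual)
    # The coefficient is undefined when either series is constant.
    if var_p == 0 or var_a == 0:
        return float("nan")
    return cov / math.sqrt(var_p * var_a)
```

The two metrics measure different things: predictions that are consistently double the actual scores have a correlation of +1 but a nonzero MAE, which is why the leaderboard reports both.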

Cost-Efficiency Overview

Comparison of strategy complexity, estimated latency, and cost per prediction across all approaches.

Tier 1: Direct

Steps: 1 step
Latency: ~2s
Cost: $0.01/case

Tier 2: CoT (chain-of-thought)

Steps: 1 step
Latency: ~4s
Cost: $0.02/case

Tier 3: Agent

Steps: 5 steps
Latency: ~15s
Cost: $0.08/case

Tier 4: ML Hybrid

Steps: 2-3 steps
Latency: ~3-8s
Cost: $0.02-0.05/case
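
To put the per-case figures in context, the sketch below scales each tier's estimate to a full pass over the 304-case benchmark. It assumes cost and latency scale linearly with the number of cases and that cases run one after another with no batching or parallelism; neither assumption is stated in the source. The TIERS table and the full_run_estimate name are illustrative.

```python
# Per-case estimates copied from the tier cards. Costs are stored in cents
# so the multiplication stays in integers; each entry is a (low, high) range.
TIERS = {
    "Tier 1: Direct": {"cost_cents": (1, 1), "latency_s": (2, 2)},
    "Tier 2: CoT": {"cost_cents": (2, 2), "latency_s": (4, 4)},
    "Tier 3: Agent": {"cost_cents": (8, 8), "latency_s": (15, 15)},
    "Tier 4: ML Hybrid": {"cost_cents": (2, 5), "latency_s": (3, 8)},
}
N_CASES = 304  # size of the validated benchmark


def full_run_estimate(tier, n_cases=N_CASES):
    """Estimate (low, high) total cost in USD and sequential runtime in minutes."""
    spec = TIERS[tier]
    low_c, high_c = spec["cost_cents"]
    low_s, high_s = spec["latency_s"]
    return {
        "cost_usd": (low_c * n_cases / 100, high_c * n_cases / 100),
        # Assumption: cases are processed sequentially, one at a time.
        "sequential_minutes": (low_s * n_cases / 60, high_s * n_cases / 60),
    }


if __name__ == "__main__":
    for name in TIERS:
        print(name, full_run_estimate(name))
```

Under these assumptions, a full Tier 3 run costs about $24.32 and takes roughly 76 minutes sequentially, compared with about $3.04 and roughly 10 minutes for Tier 1.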