Rankings
Leaderboards
Head-to-head comparison of strategy + model combinations on the validated biotech catalyst benchmark (304 cases)
Total Runs: 0 (strategy + model combinations)
Strategies: 0 (evaluation approaches)
Models: 0 (AI systems tested)
Cases: 304 (validated catalysts evaluated)
| Rank | Strategy | Model | Type | Exact Match | Close Match | Direction | MAE | Corr. | Cases |
|---|---|---|---|---|---|---|---|---|---|
Categorical Metrics
Exact Match
Prediction matches the actual impact category exactly (7 categories from very_negative to very_positive).
Close Match
Prediction is within 1 category of the actual (e.g., predicting “positive” when actual is “very_positive”).
Direction Accuracy
Prediction has the correct directional sign (positive, negative, or neutral).
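The three categorical metrics can be sketched in a few lines of Python. This is a minimal illustration, not the benchmark's own scoring code. The page names only the endpoints of the 7-category scale plus "positive" and "neutral", so the intermediate labels (`negative`, `slightly_negative`, `slightly_positive`) and the helper names are assumptions. The sketch also assumes that an exact match counts as a close match, since it is within 1 category.

```python
from typing import Dict, Sequence

# Assumed 7-point ordering. Only "very_negative", "very_positive",
# "positive", and "neutral" appear in the text; the rest are hypothetical.
CATEGORIES = [
    "very_negative", "negative", "slightly_negative", "neutral",
    "slightly_positive", "positive", "very_positive",
]
INDEX = {name: i for i, name in enumerate(CATEGORIES)}
NEUTRAL = INDEX["neutral"]


def direction(category: str) -> int:
    """Return -1, 0, or +1 for negative, neutral, or positive categories."""
    offset = INDEX[category] - NEUTRAL
    return (offset > 0) - (offset < 0)


def categorical_metrics(
    predicted: Sequence[str], actual: Sequence[str]
) -> Dict[str, float]:
    """Compute exact match, close match, and direction accuracy as fractions."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    pairs = list(zip(predicted, actual))
    n = len(pairs)
    exact = sum(p == a for p, a in pairs)
    # Close match: at most one step apart on the ordered scale.
    close = sum(abs(INDEX[p] - INDEX[a]) <= 1 for p, a in pairs)
    same_direction = sum(direction(p) == direction(a) for p, a in pairs)
    return {
        "exact_match": exact / n,
        "close_match": close / n,
        "direction": same_direction / n,
    }
```

Treating the categories as positions on an ordered scale lets all three metrics share one index lookup: exact match compares labels, close match compares positions, and direction compares the sign relative to neutral.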
Numeric Metrics
MAE (Mean Absolute Error)
Average absolute difference between predicted and actual scores. Lower is better.
Pearson Correlation
Linear correlation between predictions and actual outcomes. Range: -1 to +1. Higher is better.
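As a rough sketch of the two numeric metrics, the functions below compute MAE and Pearson correlation from paired lists of scores. The page does not state the numeric scale of the scores, so the inputs are treated as arbitrary floats. The function names, and the choice to return NaN when either series is constant, are assumptions for illustration.

```python
import math
from typing import Sequence


def mean_absolute_error(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Average absolute difference between predicted and actual scores."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


def pearson_correlation(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Linear correlation in [-1, 1]; NaN if either series has zero variance."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_a = sum(actual) / n
    cov = sum((p - mean_p) * (a - mean_a) for p, a in zip(predicted, actual))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_a = sum((a - mean_a) ** 2 for a in actual)
    if var_p == 0 or var_a == 0:
        # Correlation is undefined when one series never varies.
        return float("nan")
    return cov / math.sqrt(var_p * var_a)
```

The two metrics answer different questions. A model that always predicts double the actual score has a perfect correlation but a large MAE, which is why the leaderboard reports both.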
Cost-Efficiency Overview
Comparison of strategy complexity, estimated latency, and cost per prediction across all approaches.
| Tier | Strategy | Steps | Latency | Cost per case |
|---|---|---|---|---|
| 1 | Direct | 1 | ~2s | $0.01 |
| 2 | CoT | 1 | ~4s | $0.02 |
| 3 | Agent | 5 | ~15s | $0.08 |
| 4 | ML Hybrid | 2-3 | ~3-8s | $0.02-0.05 |
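To put the per-case figures in context, the sketch below multiplies each tier's estimated cost by the 304 benchmark cases. It only restates the page's own estimates as arithmetic. The dictionary layout and the `full_run_cost` helper are hypothetical, and `Decimal` is used so that cent amounts stay exact.

```python
from decimal import Decimal
from typing import Tuple

CASES = 304  # benchmark size stated on the page

# Estimated cost per case in USD as (low, high), copied from the tiers above.
TIER_COST = {
    "Tier 1: Direct": (Decimal("0.01"), Decimal("0.01")),
    "Tier 2: CoT": (Decimal("0.02"), Decimal("0.02")),
    "Tier 3: Agent": (Decimal("0.08"), Decimal("0.08")),
    "Tier 4: ML Hybrid": (Decimal("0.02"), Decimal("0.05")),
}


def full_run_cost(tier: str, cases: int = CASES) -> Tuple[Decimal, Decimal]:
    """Return the (low, high) estimated cost of one full benchmark run."""
    low, high = TIER_COST[tier]
    return low * cases, high * cases


if __name__ == "__main__":
    for tier in TIER_COST:
        low, high = full_run_cost(tier)
        label = f"${low}" if low == high else f"${low} to ${high}"
        print(f"{tier}: {label} per full run")
```

Under these estimates, one full run costs about $3.04 for Tier 1, $6.08 for Tier 2, $24.32 for Tier 3, and between $6.08 and $15.20 for Tier 4.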