Submitting Results
How to submit your benchmark results to the leaderboard.
Submission Flow
- Run your strategy on all benchmark cases
- Use /api/benchmark/verify to check your scores (a verify sketch follows this list)
- When satisfied, submit via /api/benchmark/submit
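For the verify step, a minimal dry run might look like the sketch below. It assumes /api/benchmark/verify accepts the same JSON body as /api/benchmark/submit and returns your scores without creating a leaderboard entry; check the verify endpoint's own documentation if its payload differs.

import requests

BASE_URL = "https://biotradingarena.com"
headers = {
    "Authorization": "Bearer YOUR_API_KEY",
    "Content-Type": "application/json",
}

# Assumption: verify takes the same payload shape as submit and scores it
# without recording a leaderboard entry.
resp = requests.post(
    f"{BASE_URL}/api/benchmark/verify",
    headers=headers,
    json={
        "strategy_name": "My Strategy v1 (dry run)",
        "predictions": [
            {"case_id": "onc_0001", "predicted_impact": "positive"},
        ],
    },
)
resp.raise_for_status()
print(resp.json())
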
Submission Format
import requests
BASE_URL = "https://biotradingarena.com"
headers = {
    "Authorization": "Bearer YOUR_API_KEY",
    "Content-Type": "application/json",
}

# Submit your predictions
resp = requests.post(
    f"{BASE_URL}/api/benchmark/submit",
    headers=headers,
    json={
        "strategy_name": "My Strategy v1",
        "description": "LLM-based catalyst impact classifier using press releases and trial data",
        "model": "gpt-4o",
        "predictions": [
            {
                "case_id": "onc_0001",
                "predicted_impact": "positive",
                "predicted_score": 12.0,  # optional numeric prediction
                "confidence": 0.85,  # optional confidence
            },
            # ... more predictions
        ],
    },
)

result = resp.json()
print(f"Submission ID: {result['submission_id']}")
print(f"Exact Match: {result['metrics']['exact_match_accuracy']}%")
print(f"Directional: {result['metrics']['directional_accuracy']}%")

Prediction Types
You can submit two types of predictions per case:
Categorical (predicted_impact)
One of 7 impact categories:
very_negative, negative, slightly_negative, neutral, slightly_positive, positive, very_positive
Numeric (predicted_score)
A numeric percentage change prediction (e.g., 12.5 for +12.5%). This is scored separately using Mean Absolute Error (MAE).
You can submit both for the same case.
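For example, a single prediction entry carrying both types for the same case might look like this (the case ID and values are illustrative):

prediction = {
    "case_id": "onc_0001",           # benchmark case identifier
    "predicted_impact": "positive",  # categorical: one of the 7 impact categories
    "predicted_score": 12.5,         # numeric: predicted % change, scored by MAE
    "confidence": 0.85,              # optional confidence for your own tracking
}
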
Leaderboard Requirements
- Submit predictions for at least 10 cases to appear on the leaderboard
- Submissions are ranked by exact match accuracy
- Each submission is recorded separately — you can submit multiple times with different strategies
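Assuming you have built a predictions list as in the submission example above, a simple guard can catch submissions too small to be ranked:

MIN_CASES_FOR_LEADERBOARD = 10  # leaderboard requires at least 10 cases

if len(predictions) < MIN_CASES_FOR_LEADERBOARD:
    raise ValueError(
        f"Only {len(predictions)} predictions; need at least "
        f"{MIN_CASES_FOR_LEADERBOARD} to appear on the leaderboard."
    )
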
Tips
- Start by verifying with a small subset to debug your pipeline
- Use the
confidencefield to track which predictions your model is most/least certain about - The
reasoningfield (in verify) helps debug individual predictions - Submit the full oncology benchmark (168 cases) for the most meaningful comparison
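To act on the confidence tip, you can review your least-certain predictions locally before submitting; a small sketch, assuming your predictions list includes the optional confidence field:

# Sort from least to most confident; entries without a confidence value sort first.
by_confidence = sorted(predictions, key=lambda p: p.get("confidence", 0.0))

# Inspect the five least-certain cases before deciding whether to submit.
for p in by_confidence[:5]:
    print(p["case_id"], p.get("predicted_impact"), p.get("confidence"))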