Coding
- Arena.ai Code Arena
- Vals AI SWE-bench
- Vals AI Vibe Code Bench
- Vellum SWE Bench rankings
- Artificial Analysis Intelligence Index
Methodology
Purpose
Turn several public signals into one task-specific shortlist.
Missing data
Missing rows lower confidence instead of becoming automatic zeroes.
Update marker
Each core page shows the date its ranking or local model data was last refreshed.
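The missing-data rule above can be illustrated with a minimal sketch. The page does not publish an aggregation formula, so the plain average, the function name `combine_signals`, the source keys, and the score values below are all assumptions made for illustration. The point is only that a missing source reduces coverage rather than being counted as a zero.

```python
from typing import Dict, Optional, Tuple


def combine_signals(
    scores: Dict[str, Optional[float]]
) -> Tuple[Optional[float], float]:
    """Average the available source scores and report coverage separately.

    Assumption: a plain mean over normalized scores. The real weighting
    used by the site is not stated in the source text.
    """
    # Keep only the sources that actually reported a score.
    present = [value for value in scores.values() if value is not None]

    # Coverage is the share of sources with data; it acts as the
    # confidence signal instead of penalizing the score itself.
    coverage = len(present) / len(scores) if scores else 0.0

    if not present:
        return None, coverage  # no data at all: no score, zero coverage

    return sum(present) / len(present), coverage


# Hypothetical, made-up scores for one model; source_c has no row.
example = {"source_a": 80.0, "source_b": 70.0, "source_c": None}
score, coverage = combine_signals(example)
```

Treating the missing row as a zero would have pulled the average down to 50 here. Keeping it out of the average and reporting two of three sources covered preserves the score while flagging lower confidence.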
This method is designed for useful comparison, not scientific certainty. Real-world results can differ because prompts, safety settings, reasoning effort, latency, price, context length, and tool access all affect model performance.
AI Jupyter adds an editorial layer on top of public data. Each ranking shows source relevance, category fit, coverage confidence, and decision notes, which makes it more useful than a copied leaderboard table.
Ranking pages should be read as curated comparison pages. They include source notes, confidence signals, update dates, and links to related pages so readers can audit the reasoning. For correction standards, read the Editorial Policy. For monetization disclosure, read the Advertising Disclosure.