Open-LLM-Leaderboard: Open-Style Question Evaluation. Paper at https://arxiv.org/abs/2406.07545
Updated Jun 27, 2024 · Python
A technical guide and live-tracking repository for the world's top AI models, specialized by coding, reasoning, and multimodal performance.
A curated collection of AI model benchmarks and leaderboards — covering general rankings, coding, agents, reasoning, embeddings, and more
WordleBench — Deterministic AI Wordle benchmark. Compare 34+ LLMs (GPT-5, Claude 4.5, Gemini, Grok, Llama) head-to-head on accuracy, speed, and cost across 50 standardized words.
119 AI models × 55 benchmarks with per-score freshness dates, auto-updated pricing, task routing. Every score has a date and source URL. Daily CI.
Daily-synced Top 10 LLM leaderboards (SWE-bench Verified, Terminal-Bench, OSWorld, ARC-AGI-2, HLE) from benchlm.ai, plus a curated AI coding tools landscape.
Open LLM leaderboard featuring Xiaomi MiMo v2.5 & MiMo 100T head-to-head with GPT-5, Claude, Gemini, DeepSeek, Llama 4. ARC-AGI · SWE-Bench · MMLU-Pro · GPQA · HumanEval · BFCL.
E/RP benchmark leaderboard for LLMs