24 Oct 25

by cos 3 months ago saved 3 times

15 Sep 25

This leaderboard ranks LLMs based on their performance in Valyrian Games competitions. Models are ranked using the TrueSkill rating system, which accounts for win/loss records and the relative skill of opponents.

by cos 4 months ago


04 Sep 25

Measures how effectively various LLMs can infer a narrow or specific “theme” (category/rule) from a small set of examples and anti-examples, then detect which item truly fits that theme among a collection of misleading candidates.

by cos 5 months ago

A multi-player “step-race” that challenges LLMs to engage in public conversation before secretly picking a move (1, 3, or 5 steps). Whenever two or more players choose the same number, all colliding players fail to advance.

by cos 5 months ago
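
The collision rule in the step-race description can be sketched in a few lines. This is a minimal illustration, not the benchmark's own code: the function name `resolve_round`, the dictionary layout, and the constant `ALLOWED_MOVES` are hypothetical, and the conversation phase and any winning condition are left out because the description does not specify them.

```python
from collections import Counter

# Moves allowed by the description: 1, 3, or 5 steps.
ALLOWED_MOVES = (1, 3, 5)


def resolve_round(positions, moves):
    """Apply one round of secret moves and return the new positions.

    positions: dict mapping player name to current position (hypothetical layout)
    moves: dict mapping player name to the move that player picked this round
    """
    for player, move in moves.items():
        if move not in ALLOWED_MOVES:
            raise ValueError(f"{player} picked an illegal move: {move}")

    # Count how many players picked each number.
    counts = Counter(moves.values())

    new_positions = dict(positions)
    for player, move in moves.items():
        # A player advances only if nobody else chose the same number.
        if counts[move] == 1:
            new_positions[player] += move
    return new_positions


if __name__ == "__main__":
    start = {"A": 0, "B": 0, "C": 0}
    picks = {"A": 5, "B": 5, "C": 3}
    print(resolve_round(start, picks))
```

In this example A and B both pick 5, so they collide and stay where they are, while C is the only player to pick 3 and advances three steps.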