chlorell
  • Ocean Planet Studios
  • Ancient Greece


A benchmark for testing whether LLM judges keep the same preference when two lightly edited versions of the same story are shown in opposite orders.

14 Updated Jun 11, 2026

A multi-agent benchmark where eight LLMs play a money-driven elimination game with private transfers and a buyout endgame, and are ranked by final wealth

15 1 Updated May 27, 2026

LLM Persuasion Benchmark tests whether one language model can change another model’s stated position over the course of a multi-turn conversation. It runs round-robin persuasion dialogues on contes…

30 1 Updated Mar 27, 2026

Adversarial multi-turn benchmark for LLM debate quality, using side-swapped matchups and multi-model judging to rank models by judged debate performance.

20 1 Updated Jun 10, 2026

This benchmark tests how well LLMs incorporate a set of 10 mandatory story elements (characters, objects, core concepts, attributes, motivations, etc.) into a short creative story.

388 10 Updated Jun 10, 2026

Documents the style side of the short-story Creative Writing LLM benchmark: we generated many short stories with a range of LLMs, then analyzed those stories for stylistic fingerprints and within-m…

24 2 Updated Dec 18, 2025

A benchmark for conversational bargaining by language models. In each 20‑round match one LLM plays buyer, one plays seller, and both hold a hidden private value. Every round they swap a short publi…

44 1 Updated Jun 10, 2026

The BAZAAR challenges LLMs to navigate the double-auction marketplace, where buyers and sellers must make strategic decisions with incomplete information. Each agent receives a private value and mu…

37 4 Updated Jul 30, 2025

Systemic, uninstructed collusion among frontier LLMs in a simulated bidding environment

18 1 Updated Jul 15, 2025

Public Goods Game (PGG) Benchmark: Contribute & Punish is a multi-agent benchmark that tests cooperative and self-interested strategies among Large Language Models (LLMs) in a resource-sharing econ…

41 2 Updated Apr 10, 2025

Multi-Agent Step Race Benchmark: Assessing LLM Collaboration and Deception Under Pressure. A multi-player “step-race” that challenges LLMs to engage in public conversation before secretly picking a…

87 2 Updated Dec 9, 2025

Thematic Generalization Benchmark: measures how effectively various LLMs can infer a narrow or specific "theme" (category/rule) from a small set of examples and anti-examples, then detect which ite…

71 2 Updated Apr 16, 2026

LLM Divergent Thinking Creativity Benchmark. LLMs generate 25 unique words that start with a given letter and have no connection to each other or to 50 initial random words.

35 1 Updated Mar 20, 2025

Benchmark evaluating LLMs on their ability to create and resist disinformation. Includes comprehensive testing across major models (Claude, GPT-4, Gemini, Llama, etc.) with standardized evaluation …

33 2 Updated Mar 20, 2025

Benchmark that evaluates LLMs using 759 NYT Connections puzzles extended with extra trick words

Python 228 8 Updated May 28, 2026

Hallucinations (Confabulations) Document-Based Benchmark for RAG. Includes human-verified questions and answers.

HTML 247 9 Updated Aug 7, 2025

glTF – Runtime 3D Asset Delivery

HTML 7,749 1,193 Updated Jun 1, 2026

OpenGL Image (GLI)

C++ 590 131 Updated Apr 5, 2026

ASL libraries will be migrated here in the stlab namespace, new libraries will be created here.

C++ 679 75 Updated May 27, 2026

Angle Project (https://code.google.com/p/angleproject/) with support for Windows Store Apps (WinRT)

C++ 44 14 Updated May 2, 2014

The JavaScript library for modern SVG graphics.

JavaScript 14,009 1,120 Updated Jun 12, 2026