linkhut

04 Mar 26

BullshitBench v2 Explorer

https://petergpt.github.io/bullshit-benchmark/viewer/index.v2.html

An LLM benchmark testing whether models push back (or fail to) against BS.

by sebastien 2 weeks ago

29 Oct 25

SimpleBench

https://simple-bench.com/

by auguste 4 months ago saved 3 times

SimpleBench

https://simple-bench.com/

by personalcontext 4 months ago saved 3 times

24 Oct 25

SimpleBench

https://simple-bench.com/

by cos 4 months ago saved 3 times

15 Sep 25

LLM Leaderboard - Valyrian Games

https://valyriantech.github.io/ValyrianGamesLeaderboard/leaderboard.html

This leaderboard ranks LLMs based on their performance in Valyrian Games competitions. Models are ranked using the TrueSkill rating system, which accounts for win/loss records and the relative skill of opponents.

by cos 6 months ago

AndroidWorld Leaderboard [last update: 28/08/2025] - Google Sheets

https://docs.google.com/spreadsheets/d/1cchzP9dlTZ3WXQTfYNhh3avxoLipqHN75v1Tb86uhHo/edit?gid=0#gid=0

Benchmark of smartphone interaction for LLMs and multi-agent systems

by cos 6 months ago

04 Sep 25

lechmazur/generalization: Thematic Generalization Benchmark

https://github.com/lechmazur/generalization

Measures how effectively various LLMs can infer a narrow or specific “theme” (category/rule) from a small set of examples and anti-examples, then detect which item truly fits that theme among a collection of misleading candidates.

by cos 6 months ago

lechmazur/step_game: Multi-Agent Step Race Benchmark: Assessing LLM Collaboration and Deception Under Pressure

https://github.com/lechmazur/step_game

A multi-player “step-race” that challenges LLMs to engage in public conversation before secretly picking a move (1, 3, or 5 steps). Whenever two or more players choose the same number, all colliding players fail to advance.

by cos 6 months ago
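
The collision rule in the description above can be sketched in a few lines of Python. This is only an illustration of the rule as stated (moves of 1, 3, or 5; players who pick the same number do not advance). The function name resolve_round, the dictionary layout, and the player names are assumptions made for this example and are not taken from the lechmazur/step_game repository, which may handle scoring and win conditions differently.

```python
from collections import Counter

ALLOWED_MOVES = {1, 3, 5}  # the three step sizes named in the description


def resolve_round(positions, moves):
    """Apply one round of secret moves (hypothetical helper).

    positions: dict mapping player name -> current position
    moves:     dict mapping player name -> chosen step (1, 3, or 5)
    Returns a new dict of positions after the round.
    """
    for player, move in moves.items():
        if move not in ALLOWED_MOVES:
            raise ValueError(f"{player} picked an invalid move: {move}")

    # Count how many players chose each step size.
    picks = Counter(moves.values())

    # A player advances only if nobody else chose the same number.
    return {
        player: positions[player] + (move if picks[move] == 1 else 0)
        for player, move in moves.items()
    }


if __name__ == "__main__":
    start = {"A": 0, "B": 0, "C": 0}
    # A and B both pick 5 and collide; C is alone on 3 and advances.
    print(resolve_round(start, {"A": 5, "B": 5, "C": 3}))
```

The public conversation phase is left out here; only the move resolution is modelled.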

05 May 25

Independent analysis of AI models and API providers

https://artificialanalysis.ai/

by ciwchris 10 months ago saved 4 times

03 Mar 21

Pure RayTracing benchmark by MARvizer

https://marvizer.itch.io/pure-raytracing-benchmark

by pyrho Mar 2021

17 Nov 20

PostgreSQL: Oracle vs. PostgreSQL - a comment

https://www.postgresql.org/message-id/flat/0959C81A-5B07-4D04-A6B1-57AA78A9A553%40gmail.com

Compares the developer experience, in time and resource usage, of performing clean installs of Oracle and PostgreSQL.

by mlb Nov 2020

09 Jun 20

Is KG-8 The RO-59 Killah? - Page 4 - deskthority

https://deskthority.net/viewtopic.php?p=63784#p63784

by pyrho Jun 2020

30 Aug 19

How fast are web workers? - Mozilla Hacks - the Web developer blog

https://hacks.mozilla.org/2015/07/how-fast-are-web-workers/

by pyrho Aug 2019

26 Aug 19

Should small Rust structs be passed by-copy or by-borrow? | Hacker News

https://news.ycombinator.com/item?id=20798033

by chrisSt Aug 2019

03 Jun 16

https://ecraven.github.io/r7rs-benchmarks/benchmark.html

https://ecraven.github.io/r7rs-benchmarks/benchmark.html

by ckoshikumo Jun 2016
