11 Sep 25
08 Sep 25
05 Sep 25
via: https://lobste.rs/s/jdqoem/how_big_are_our_embeddings_now_why
by silas
5 months ago
04 Sep 25
Measures how well various LLMs can infer a narrow “theme” (a category or rule) from a small set of examples and anti-examples, then identify which item truly fits that theme among a collection of misleading candidates.
by cos
5 months ago
A multi-player “step-race” that challenges LLMs to engage in public conversation before secretly picking a move (1, 3, or 5 steps). Whenever two or more players choose the same number, all colliding players fail to advance.
by cos
5 months ago
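
The collision rule in the step-race description above can be sketched in a few lines. This is only an illustration of the rule as stated (moves of 1, 3, or 5 steps; any players who pick the same number do not advance), not the benchmark's actual code. The function name `resolve_round` and the dictionary layout are hypothetical, and the conversation phase and the winning condition are not modeled.

```python
from collections import Counter

ALLOWED_MOVES = (1, 3, 5)  # move sizes named in the description


def resolve_round(positions, moves):
    """Apply one round of secret moves and return the new positions.

    positions: dict mapping player name -> current position (int)
    moves:     dict mapping player name -> chosen move (1, 3, or 5)
    Hypothetical helper: assumes every player in `positions` has a move.
    """
    for player, move in moves.items():
        if move not in ALLOWED_MOVES:
            raise ValueError(f"{player} chose an illegal move: {move}")

    # Count how many players picked each number.
    picks = Counter(moves.values())

    new_positions = {}
    for player, position in positions.items():
        move = moves[player]
        # A move only counts if nobody else chose the same number.
        advance = move if picks[move] == 1 else 0
        new_positions[player] = position + advance
    return new_positions


if __name__ == "__main__":
    start = {"a": 0, "b": 0, "c": 0}
    chosen = {"a": 5, "b": 5, "c": 3}
    # "a" and "b" collide on 5 and stay put; "c" advances by 3.
    print(resolve_round(start, chosen))
```

Under this rule the largest move is also the most contested one, which is presumably why the public conversation before each secret pick matters.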