08 Sep 25
05 Sep 25
via: https://lobste.rs/s/jdqoem/how_big_are_our_embeddings_now_why
by silas
5 months ago
04 Sep 25
Measures how effectively various LLMs can infer a narrow or specific “theme” (category/rule) from a small set of examples and anti-examples, then detect which item truly fits that theme among a collection of misleading candidates.
by cos
5 months ago
A multi-player “step-race” that challenges LLMs to engage in public conversation before secretly picking a move (1, 3, or 5 steps). Whenever two or more players choose the same number, all colliding players fail to advance.
by cos
5 months ago
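The collision rule in the step-race fits in a few lines. Below is a minimal Python sketch of a single round. It assumes players start at known positions, and that a collision leaves every colliding player where they were. The function name `resolve_round` and the dict-based state are hypothetical. The benchmark's public conversation phase, turn limit and win condition are not modeled.

```python
from collections import Counter
from typing import Dict

# Hypothetical: the three legal step sizes described in the blurb.
ALLOWED_MOVES = (1, 3, 5)


def resolve_round(positions: Dict[str, int], moves: Dict[str, int]) -> Dict[str, int]:
    """Apply one secret-move round (hypothetical helper, not the benchmark's code).

    A player advances by their chosen step only if no other player picked the
    same number; every player involved in a collision stays in place.
    """
    for player, move in moves.items():
        if move not in ALLOWED_MOVES:
            raise ValueError(f"{player} chose illegal move {move}")

    # Count how many players picked each step size this round.
    counts = Counter(moves.values())

    new_positions = dict(positions)
    for player, move in moves.items():
        if counts[move] == 1:  # unique choice: the player advances
            new_positions[player] += move
    return new_positions
```

For example, if A and B both pick 5 while C picks 3, only C moves. A and B collide and stay put.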
02 Sep 25
31 Aug 25
30 Aug 25
28 Aug 25
27 Aug 25
26 Aug 25
20 Aug 25
by silas
5 months ago
Finally, some convergence. Weird that OpenCode is not mentioned, though.