05 Sep 25
via: https://lobste.rs/s/jdqoem/how_big_are_our_embeddings_now_why
04 Sep 25
Measures how effectively various LLMs can infer a narrow or specific “theme” (category/rule) from a small set of examples and anti-examples, then detect which item truly fits that theme among a collection of misleading candidates.
A multi-player “step-race” that challenges LLMs to engage in public conversation before secretly picking a move (1, 3, or 5 steps). Whenever two or more players choose the same number, all colliding players fail to advance.
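The collision rule is what makes the public conversation matter: a move only counts if nobody else picked the same number. Below is a minimal sketch of resolving one round under that rule. It assumes only what is stated above (legal moves are 1, 3, or 5, and every player who shares a number with someone else stays put). The function name `resolve_round` and the player names are hypothetical. The conversation phase and the win condition are not modeled, and this is not the benchmark's own code.

```python
from collections import Counter

# Legal step sizes, as described in the summary above.
ALLOWED_MOVES = {1, 3, 5}


def resolve_round(positions: dict[str, int], moves: dict[str, int]) -> dict[str, int]:
    """Hypothetical helper: apply one round of secret moves.

    A player advances by their chosen step only if no other player chose
    the same number; all colliding players stay where they are.
    """
    for player, move in moves.items():
        if move not in ALLOWED_MOVES:
            raise ValueError(f"{player} chose illegal move {move}")

    # How many players picked each number this round.
    counts = Counter(moves.values())

    new_positions = dict(positions)
    for player, move in moves.items():
        step = move if counts[move] == 1 else 0  # collision -> no progress
        new_positions[player] = positions.get(player, 0) + step
    return new_positions


if __name__ == "__main__":
    start = {"A": 0, "B": 0, "C": 0}
    # A and B collide on 5, so only C advances.
    print(resolve_round(start, {"A": 5, "B": 5, "C": 3}))
    # {'A': 0, 'B': 0, 'C': 3}
```

Because all colliders are blocked (not just all but one), a player who announces "I'm taking 5" is effectively bargaining, and the secret pick tests whether the model follows through or defects.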
02 Sep 25
31 Aug 25
30 Aug 25
28 Aug 25
27 Aug 25
26 Aug 25
20 Aug 25
Finally, some convergence. Weird that OpenCode is not mentioned, though.
19 Aug 25
Discover why OpenAI’s gpt-oss model family is ideal for building reliable and safe AI agents, not just for developers.