Last updated: 2025-12-25

Today's Stories

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

🔔 OPEN SOURCE

🎄 We release 67,074 Qwen3-Coder OpenHands trajectories on SWE-rebench + 2 model checkpoints!

via r/LocalLLaMA 👤 u/Fabulous_Pollution10 📅 2025-12-24

⬆️ 6 ups ⚡ Score: 8.3

"Happy holidays! 🎄 I’m Ibragim from Nebius. We’re releasing a big dataset for agentic coding research: 67,074 OpenHands trajectories (plus 2 RFT checkpoints), built from 3,800 resolved issues across 1,800+ Python repos. The trajectories are long: 64 turns on average, up to 100 turns, and up to 131..."

🛠️ SHOW HN

Show HN: Vibium – Browser automation for AI and humans, by Selenium's creator

via HackerNews 👤 hugs 📅 2025-12-24

🔺 307 pts ⚡ Score: 7.8

💬 HackerNews Buzz: 92 comments 🐝 BUZZING

📰 NEWS

Asterisk AI Voice Agent

via HackerNews 👤 akrulino 📅 2025-12-24

🔺 105 pts ⚡ Score: 7.8

💬 HackerNews Buzz: 45 comments 😤 NEGATIVE ENERGY

🔬 RESEARCH

Can LLMs Predict Their Own Failures? Self-Awareness via Internal Circuits

via Arxiv 👤 Amirhosein Ghasemabadi, Di Niu 📅 2025-12-23

⚡ Score: 6.8

"Large language models (LLMs) generate fluent and complex outputs but often fail to recognize their own mistakes and hallucinations. Existing approaches typically rely on external judges, multi-sample consistency, or text-based self-critique, which incur additional compute or correlate weakly with tr..."

🔬 RESEARCH

Bohrium + SciMaster: Building the Infrastructure and Ecosystem for Agentic Science at Scale

via Arxiv 👤 Linfeng Zhang, Siheng Chen, Yuzhu Cai et al. 📅 2025-12-23

⚡ Score: 6.8

"AI agents are emerging as a practical way to run multi-step scientific workflows that interleave reasoning with tool use and verification, pointing to a shift from isolated AI-assisted steps toward agentic science at scale. This shift is increasingly feasible, as scientific tools and models c..."

🔬 RESEARCH

Step-DeepResearch Technical Report

via Arxiv 👤 Chen Hu, Haikuo Du, Heng Wang et al. 📅 2025-12-23

⚡ Score: 6.7

"As LLMs shift toward autonomous agents, Deep Research has emerged as a pivotal metric. However, existing academic benchmarks like BrowseComp often fail to meet real-world demands for open-ended research, which requires robust skills in intent recognition, long-horizon decision-making, and cross-sour..."

🔬 RESEARCH

Emergent temporal abstractions in autoregressive models enable hierarchical reinforcement learning

via Arxiv 👤 Seijin Kobayashi, Yanick Schimpf, Maximilian Schlegel et al. 📅 2025-12-23

⚡ Score: 6.7

"Large-scale autoregressive models pretrained on next-token prediction and finetuned with reinforcement learning (RL) have achieved unprecedented success on many problem domains. During RL, these models explore by generating new outputs, one token at a time. However, sampling actions token-by-token c..."

🔬 RESEARCH

LongVideoAgent: Multi-Agent Reasoning with Long Videos

via Arxiv 👤 Runtao Liu, Ziyi Liu, Jiaqi Tang et al. 📅 2025-12-23

⚡ Score: 6.6

"Recent advances in multimodal LLMs and systems that use tools for long-video QA point to the promise of reasoning over hour-long episodes. However, many methods still compress content into lossy summaries or rely on limited toolsets, weakening temporal grounding and missing fine-grained cues. We pro..."

📰 NEWS

ChatGTP can now almost make a correct alphabet chart

via r/ChatGPT 👤 u/SunInevitable1890 📅 2025-12-24

⬆️ 526 ups ⚡ Score: 6.5

"It's much better at it than the previous model."

💬 Reddit Discussion: 63 comments 👍 LOWKEY SLAPS

📰 NEWS

DogGPT lawyer

via r/ChatGPT 👤 u/Historical_County357 📅 2025-12-24

⬆️ 616 ups ⚡ Score: 6.5

"Imagine you pay all your life savings to go to court and this is the lawyer you paid for."

💬 Reddit Discussion: 33 comments 😤 NEGATIVE ENERGY

📰 NEWS

Microsoft denies rewriting Windows 11 in Rust using AI

via HackerNews 👤 zdw 📅 2025-12-25

🔺 58 pts ⚡ Score: 6.5

💬 HackerNews Buzz: 75 comments 🐝 BUZZING

🔬 RESEARCH

Automated stereotactic radiosurgery planning using a human-in-the-loop reasoning large language model agent

via Arxiv 👤 Humza Nusrat, Luke Francisco, Bing Luo et al. 📅 2025-12-23

⚡ Score: 6.5

"Stereotactic radiosurgery (SRS) demands precise dose shaping around critical structures, yet black-box AI systems have limited clinical adoption due to opacity concerns. We tested whether chain-of-thought reasoning improves agentic planning in a retrospective cohort of 41 patients with brain metasta..."

🛠️ TOOLS

Built a gateway to use Claude alongside other LLMs with automatic failover and cost tracking (open source)

via r/claudeai 👤 u/dinkinflika0 📅 2025-12-24

⬆️ 23 ups ⚡ Score: 6.4

"If you're using Claude in production, you've probably hit rate limits, wanted to compare Claude vs GPT-4 for specific tasks, or needed fallback when Anthropic has downtime. **What we built:** Bifrost - an open source LLM gateway that lets you route between Claude (all models), OpenAI, Gemini, Bedr..."

🔬 RESEARCH

Distilling to Hybrid Attention Models via KL-Guided Layer Selection

via Arxiv 👤 Yanhong Li, Songlin Yang, Shawn Tan et al. 📅 2025-12-23

⚡ Score: 6.2

"Distilling pretrained softmax attention Transformers into more efficient hybrid architectures that interleave softmax and linear attention layers is a promising approach for improving the inference efficiency of LLMs without requiring expensive pretraining from scratch. A critical factor in the conv..."

🔬 RESEARCH

Fail Fast, Win Big: Rethinking the Drafting Strategy in Speculative Decoding via Diffusion LLMs

via Arxiv 👤 Rui Pan, Zhuofu Chen, Ravi Netravali 📅 2025-12-23

⚡ Score: 6.1

"Diffusion Large Language Models (dLLMs) offer fast, parallel token generation, but their standalone use is plagued by an inherent efficiency-quality tradeoff. We show that, if carefully applied, the attributes of dLLMs can actually be a strength for drafters in speculative decoding with autoregressi..."

📡 AI NEWS BUT ACTUALLY GOOD