Tsinghua University
- Beijing, China
- https://hbx-hbx.github.io/
- @hbx_hbx
Stars
Towards a Unified View of Large Language Model Post-Training
The Entropy Mechanism of Reinforcement Learning for Large Language Model Reasoning.
Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe
[NeurIPS 2025] TTRL: Test-Time Reinforcement Learning
[ICLR 2026 Blogpost Track Poster] JustRL: Scaling a 1.5B LLM with a Simple RL Recipe
The official code repository for the paper "CostBench: Evaluating Multi-Turn Cost-Optimal Planning and Adaptation in Dynamic Environments for LLM Tool-Use Agents"
The official repository for the FactualBench dataset, introduced in the paper "Exploring the Generalizability of Factual Hallucination Mitigation via Enhancing Precise Knowledge Utilization".
My learning notes for ML SYS.
MiniCPM5-1B: A SOTA 1B on-device LLM, small yet powerful.
A Survey of Reinforcement Learning for Large Reasoning Models
Scalable RL solution for advanced reasoning of language models
The repository for the paper "EscapeBench: Pushing Language Models to Think Outside the Box"
A large-scale, fine-grained, diverse preference dataset (and models).
A bibliography and survey of the papers surrounding o1
✨✨ Latest Advances on Multimodal Large Language Models
A collection of LLM papers, blogs, and projects, with a focus on OpenAI o1 🍓 and reasoning techniques.
Code for the paper "The Right Time Matters: Data Arrangement Affects Zero-Shot Generalization in Instruction Tuning"
Repo for the paper "Tell Me More! Towards Implicit User Intention Understanding of Language Model Driven Agents"
The paper list of the 86-page SCIS cover paper "The Rise and Potential of Large Language Model Based Agents: A Survey" by Zhiheng Xi et al.
Can large language models provide useful feedback on research papers? A large-scale empirical analysis.
Chrome Extensions Samples
An open-source toolkit for textual backdoor attack and defense (NeurIPS 2022 D&B, Spotlight)