Stars
A benchmark for evaluating realistic preference-following in personalized user-LLM interactions.
OpenClaw-RL: Train any agent simply by talking
PersonaMem-v2: Towards Personalized Intelligence via Learning Implicit User Personas and Agentic Memory
Reinforcement Learning via Self-Distillation (SDPO)
A MemAgent framework that can extrapolate to 3.5M tokens, along with a framework for RL training of any agent workflow.
The official implementation of the paper "Mem-α: Learning Memory Construction via Reinforcement Learning"
Training Proactive and Personalized LLM Agents
Baselines for personalized RLHF methods including GPO, DPO, and various reward modeling approaches
[NeurIPS 2025 D&B] Open-source Multi-agent Poster Generation from Papers
Official Repo for Open-Reasoner-Zero
Code and data for "Inferring Rewards from Language in Context" [ACL 2022].
The official repo for the code and data of the SMART paper
Official Implementation of "DeLLMa: Decision Making Under Uncertainty with Large Language Models"
DialOp: Decision-oriented dialogue environments for collaborative language agents
Extended Inductive Reasoning for Personalized Preference Inference from Behavioral Signals
[ICLR 2025] No Preference Left Behind: Group Distributional Preference Optimization
Benchmarking Chat Assistants on Long-Term Interactive Memory (ICLR 2025)
[ACL 2025 Demo] Repository for the demo and paper: ReasonGraph: Visualisation of Reasoning Paths
[NeurIPS 2024] Uncertainty of Thoughts: Uncertainty-Aware Planning Enhances Information Seeking in Large Language Models
[NeurIPS D&B '25] The one-stop repository for LLM unlearning
A framework and toolkits for building and evaluating collaborative agents that work together with humans.
[EMNLP 2024] Ask-before-Plan: Proactive Language Agents for Real-World Planning