Stars
[TPAMI 2025] Official code for "SMPLest-X: Ultimate Scaling for Expressive Human Pose and Shape Estimation"
Open-source code for the paper "Evaluating Memory in LLM Agents via Incremental Multi-Turn Interactions"
The official implementation of the paper "Memory Decoder: A Pretrained, Plug-and-Play Memory for Large Language Models" (NeurIPS 2025 Poster).
Rigorous evaluation of LLM-synthesized code - NeurIPS 2023 & COLM 2024
Paper Debugger is the best Overleaf companion
[NeurIPS'24] HippoRAG is a novel RAG framework inspired by human long-term memory that enables LLMs to continuously integrate knowledge across external documents. RAG + Knowledge Graphs + Personali…
Code for the paper "BAPO: Stabilizing Off-Policy Reinforcement Learning for LLMs via Balanced Policy Optimization with Adaptive Clipping" by Zhiheng Xi et al.
Code for R-Zero: Self-Evolving Reasoning LLM from Zero Data (https://www.arxiv.org/pdf/2508.05004)
A repository curated by the MLNLP community to help authors avoid common small mistakes in paper submissions. Paper Writing Tips
AlphaEdit: Null-Space Constrained Knowledge Editing for Language Models, ICLR 2025 (Outstanding Paper)
Chinese translation of "Reinforcement Learning: An Introduction" (2nd edition)
Official repository for DR Tulu: Reinforcement Learning with Evolving Rubrics for Deep Research
Agent0 Series: Self-Evolving Agents from Zero Data
A general memory system for agents, powered by deep-research
JARVIS, a system to connect LLMs with ML community. Paper: https://arxiv.org/pdf/2303.17580.pdf
The Cradle framework is a first attempt at General Computer Control (GCC). Cradle enables agents to ace any computer task through strong reasoning abilities, self-improvement, and skill curatio…
Official code for Memento: Fine-tuning LLM Agents without Fine-tuning LLMs
Benchmarking Chat Assistants on Long-Term Interactive Memory (ICLR 2025)
SECOM: On Memory Construction and Retrieval for Personalized Conversational Agents, ICLR 2025
Agent S: an open agentic framework that uses computers like a human
The official repository for "MemSim: A Bayesian Simulator for Evaluating Memory of LLM-based Personal Assistants".
Multi-Agent System Powered by LLMs for End-to-end Multimodal ML Automation
MLE-bench is a benchmark for measuring how well AI agents perform at machine learning engineering