Stars
A Python module to repair invalid JSON from LLMs
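For context on what this entry covers, a minimal usage sketch, assuming the `json-repair` package on PyPI and its `repair_json`/`loads` helpers (an illustration, not the repo's own documentation):

```python
# Minimal sketch: assumes the json-repair package from PyPI, whose
# repair_json() and loads() helpers fix common LLM output mistakes
# such as trailing commas and unclosed objects.
import json_repair
from json_repair import repair_json

broken = '{"answer": "42", "confidence": 0.9,'   # trailing comma, missing closing brace

fixed_str = repair_json(broken)        # returns a repaired JSON string
fixed_obj = json_repair.loads(broken)  # drop-in replacement for json.loads

print(fixed_str)
print(fixed_obj["confidence"])
```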
The Tool Decathlon: Benchmarking Language Agents for Diverse, Realistic, and Long-Horizon Task Execution
KernelBench: Can LLMs Write GPU Kernels? - Benchmark with Torch -> CUDA (+ more DSLs)
This repository contains the toolkit for replicating results from our technical report.
Official repo of Toucan: Synthesizing 1.5M Tool-Agentic Data from Real-World MCP Environments
A scalable asynchronous reinforcement learning implementation with in-flight weight updates.
An Efficient and User-Friendly Scaling Library for Reinforcement Learning with Large Language Models
Best practices for training DeepSeek, Mixtral, Qwen and other MoE models using Megatron Core.
The official repo of Pai-Megatron-Patch for large-scale LLM & VLM training, developed by Alibaba Cloud.
An extremely fast Python linter and code formatter, written in Rust.
Use PEFT or full-parameter training for CPT/SFT/DPO/GRPO on 500+ LLMs (Qwen3, Qwen3-MoE, Llama4, GLM4.5, InternLM3, DeepSeek-R1, ...) and 200+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, Llava, GLM4v, Ph…
Weave is a toolkit for developing AI-powered applications, built by Weights & Biases.
Post-training with Tinker
Accompanying material for the sleep-time compute paper
Supporting code for the blog post on modular manifolds.
Research code artifacts for Code World Model (CWM), including inference tools, reproducibility, and documentation.
A sophisticated multi-step reasoning pipeline powered by the Datarus-R1-14B-Preview model
Qwen3-VL is the multimodal large language model series developed by the Qwen team at Alibaba Cloud.
Meta Agents Research Environments is a comprehensive platform designed to evaluate AI agents in dynamic, realistic scenarios. Unlike static benchmarks, this platform introduces evolving environment…
[ICLR 2025] Automated Design of Agentic Systems
[NeurIPS 2025 Spotlight] Reasoning Environments for Reinforcement Learning with Verifiable Rewards
Training LLMs to reason and analyze data with notebooks