Stars
Tree Search for LLM Agent Reinforcement Learning
Multi-Agent Verification: Scaling Test-Time Compute with Multiple Verifiers
metaTextGrad: Automatically optimizing language model optimizers. Published in NeurIPS 2025.
Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends
Minimal reproduction of DeepSeek R1-Zero
MENTOR is a highly efficient visual RL algorithm that excels in both simulation and real-world complex robotic learning tasks.
Official Implementation for the paper "d1: Scaling Reasoning in Diffusion Large Language Models via Reinforcement Learning"
Official PyTorch implementation for "Large Language Diffusion Models"
[Preprint] On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification.
Official Repository for The Paper: Safety Alignment Should Be Made More Than Just a Few Tokens Deep
Testing baseline LLM performance across various models
SWE-bench: Can Language Models Resolve Real-world GitHub Issues?
An LLM trained only on data from certain time periods to reduce modern bias
This repository is the official implementation of "Look-Back: Implicit Visual Re-focusing in MLLM Reasoning".
Repo for Rho-1: Token-level Data Selection & Selective Pretraining of LLMs.
[NeurIPS 2023] Official code release for the paper: "Can Pre-Trained Text-to-Image Models Generate Visual Goals for Reinforcement Learning?"
UniWorld: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
An open-source deep research clone. An AI agent that reasons over large amounts of web data extracted with Firecrawl
A collection of notebooks/recipes showcasing use cases of open-source models with Together AI.
Production-ready platform for agentic workflow development.
The most powerful and modular diffusion model GUI, API, and backend with a graph/nodes interface.
WISE: A World Knowledge-Informed Semantic Evaluation for Text-to-Image Generation
EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL
verl: Volcano Engine Reinforcement Learning for LLMs
A fork to add multimodal model training to open-r1
Witness the aha moment of VLM with less than $3.