Stars
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
Unsloth Studio is a web UI for training and running open models like Qwen, DeepSeek, gpt-oss and Gemma locally.
Build, run, manage agentic software at scale.
Best Practices on Recommendation Systems
verl: Volcano Engine Reinforcement Learning for LLMs
Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 600+ LLMs (Qwen3.5, DeepSeek-R1, GLM-5, InternLM3, Llama4, ...) and 300+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, GLM4.5v, Llava, Phi4, ...)…
An Easy-to-use, Scalable and High-performance Agentic RL Framework based on Ray (PPO & DAPO & REINFORCE++ & TIS & vLLM & Ray & Async RL)
EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL
Witness the aha moment of VLM with less than $3.
Official implementation of GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization
🔧Tool-Star: Empowering LLM-brained Multi-Tool Reasoner via Reinforcement Learning
[CVPR 2026] Machine Mental Imagery: Empower Multimodal Reasoning with Latent Visual Tokens
[CVPR 2026] Official codes of "Monet: Reasoning in Latent Visual Space Beyond Image and Language"
Sparking "Thinking with Videos" via Reinforcement Learning
[AAAI 2026] ✨ TSPO: Temporal Sampling Policy Optimization for Long-form Video Language Understanding
[NeurIPS 2025] MINT-CoT: Enabling Interleaved Visual Tokens in Mathematical Chain-of-Thought Reasoning
[AAAI 2026] GenVidBench: A 6-Million Benchmark for AI-Generated Video Detection
This is the official repository for the paper "MathCanvas: Intrinsic Visual Chain-of-Thought for Multimodal Mathematical Reasoning"
[NeurIPS 2025] VideoRFT: Incentivizing Video Reasoning Capability in MLLMs via Reinforced Fine-Tuning
Official Repository for "FakingRecipe: Detecting Fake News on Short Video Platforms from the Perspective of Creative Process", ACM MM 2024
The official code for Fact-R1: Towards Explainable Video Misinformation Detection with Deep Reasoning
GeometryZero: Improving Geometry Solving for LLM with Group Contrastive Policy Optimization
[Information Fusion] Official Implementation of DAE (Bridging Cognition and Emotion: Empathy-Driven Multimodal Misinformation Detection)
Official Implementation of LatentGeo: Learnable Auxiliary Constructions in Latent Space for Multimodal Geometric Reasoning