Stars
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
Unsloth Studio is a web UI for training and running open models like Qwen, DeepSeek, gpt-oss and Gemma locally.
Build, run, manage agentic software at scale.
Best Practices on Recommendation Systems
verl: Volcano Engine Reinforcement Learning for LLMs
Qwen3-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
🎨 ML Visuals contains figures and templates which you can reuse and customize to improve your scientific writing.
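AISystem mainly refers to AI systems, covering full-stack low-level AI technologies such as AI chips, AI compilers, and AI inference and training frameworks.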
Elevate your AI research writing, no more tedious polishing ✨
Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 600+ LLMs (Qwen3.5, DeepSeek-R1, GLM-5, InternLM3, Llama4, ...) and 300+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, GLM4.5v, Llava, Phi4, ...)…
An Easy-to-use, Scalable and High-performance Agentic RL Framework based on Ray (PPO & DAPO & REINFORCE++ & TIS & vLLM & Ray & Async RL)
EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL
Witness the aha moment of VLM with less than $3.
Notes on the course Dive into Deep Learning by Mu Li
MiniMax-M2, a model built for Max coding & agentic workflows.
This repository provides a valuable reference for researchers in the multimodal field. Start your exploration of RL-based Reasoning MLLMs!
Links to conference/journal publications in automated fact-checking (resources for the TACL22/EMNLP23 paper).
Official implementation of GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization
🔧Tool-Star: Empowering LLM-brained Multi-Tool Reasoner via Reinforcement Learning
[CVPR 2026] Machine Mental Imagery: Empower Multimodal Reasoning with Latent Visual Tokens
[ICLR 2026] The official repository for the paper "ThinkMorph: Emergent Properties in Multimodal Interleaved Chain-of-Thought Reasoning"
[CVPR 2026] Official code for "Monet: Reasoning in Latent Visual Space Beyond Image and Language"
Sparking "Thinking with Videos" via Reinforcement Learning
[AAAI 2026] ✨ TSPO: Temporal Sampling Policy Optimization for Long-form Video Language Understanding
[NeurIPS 2025] MINT-CoT: Enabling Interleaved Visual Tokens in Mathematical Chain-of-Thought Reasoning
[AAAI 2026] GenVidBench: A 6-Million Benchmark for AI-Generated Video Detection
This is the official repository for the paper "MathCanvas: Intrinsic Visual Chain-of-Thought for Multimodal Mathematical Reasoning"