Stars
The official repo for "Where do Large Vision-Language Models Look at when Answering Questions?"
Official repository for "PixelGen: Pixel Diffusion Beats Latent Diffusion with Perceptual Loss"
An agentic skills framework & software development methodology that works.
[ICLR 2026 Oral] DiffusionNFT: Online Diffusion Reinforcement with Forward Process
Official implementation of GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization
Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning, achieving state-of-the-art performance on 38 out of 60 public benchmarks.
A survey for visual generation alignment
Claude Code is an agentic coding tool that lives in your terminal, understands your codebase, and helps you code faster by executing routine tasks, explaining complex code, and handling git workflo…
A Flexible and Powerful Parameter Server for large-scale machine learning
MixGRPO: Unlocking Flow-based GRPO Efficiency with Mixed ODE-SDE
Official Code of "Distribution Matching Distillation Meets Reinforcement Learning"
Official inference repo for FLUX.2 models
Diffusion-SDPO: Safeguarded Direct Preference Optimization for Diffusion Models
The world's simplest facial recognition API for Python and the command line
[NeurIPS 2024] Official code for PuLID: Pure and Lightning ID Customization via Contrastive Alignment
A Lightweight Face Recognition and Facial Attribute Analysis (Age, Gender, Emotion and Race) Library for Python
An Efficient and User-Friendly Scaling Library for Reinforcement Learning with Large Language Models
Hunyuan-GameCraft: High-dynamic Interactive Game Video Generation with Hybrid History Condition
Official implementation of MAGREF: Masked Guidance for Any-Reference Video Generation with Subject Disentanglement
[ICCV 2025] Official implementations for paper: VACE: All-in-One Video Creation and Editing
Identity-GRPO: Optimizing Multi-Human Identity-preserving Video Generation via Reinforcement Learning
Phantom: Subject-Consistent Video Generation via Cross-Modal Alignment
Official PyTorch Implementation of "Diffusion Transformers with Representation Autoencoders"
[ICLR 2024] SEINE: Short-to-Long Video Diffusion Model for Generative Transition and Prediction