Stars
📝 The official repository of "Rethinking Cross-Generator Image Forgery Detection through DINOv3"
[ICLR 2026] Thinking with Camera: A Unified Multimodal Model for Camera-Centric Understanding and Generation
[ICML 2026] A unified reinforcement learning toolbox for joint RL on language models and diffusion models
An agent-managed museum exhibit, built in Rust with Gajae-Code / LazyCodex — developed and maintained with no human intervention.
The official repo for "D-AR: Diffusion via Autoregressive Models"
[Preprint] ViFeEdit: A Video-Free Tuner of Your Video Diffusion Transformer
Your own personal AI assistant. Any OS. Any Platform. The lobster way. 🦞
Implementation of "Hyperspherical Latents Improve Continuous-Token Autoregressive Generation"
[ICCV 2025] Official implementation of the paper: REPA-E: Unlocking VAE for End-to-End Tuning of Latent Diffusion Transformers
Autoregressive Semantic Visual Reconstruction Helps VLMs Understand Better
verl/HybridFlow: A Flexible and Efficient RL Post-Training Framework
A PyTorch native platform for training generative AI models
Directly Aligning the Full Diffusion Trajectory with Fine-Grained Human Preference
Official implementation of LiFT: Leveraging Human Feedback for Text-to-Video Model Alignment.
[ICLR 2026] Official repo of paper "Reconstruction Alignment Improves Unified Multimodal Models". Unlocking the Massive Zero-shot Potential in Unified Multimodal Models through Self-supervised Lear…
DDPO for finetuning diffusion models, implemented in PyTorch with LoRA support
gpt-oss-120b and gpt-oss-20b are two open-weight language models by OpenAI
GPT-IMAGE-EDIT-1.5M: A Million-Scale, GPT-Generated Image Dataset
Official PyTorch Implementation of "Latent Denoising Makes Good Visual Tokenizers"
Official code for the ICCV 2025 paper "X2I: Seamless Integration of Multimodal Understanding into Diffusion Transformer via Attention Distillation"
A framework that allows you to apply a Sparse AutoEncoder to any model
SigLIP-based Aesthetic Score Predictor
Open protocol for communication between AI agents, applications, and humans.
[NeurIPS 2025] Controllable Human-centric Keyframe Interpolation with Generative Prior
Official implementation of the paper "Transfer between Modalities with MetaQueries"
DeepDubber-V1: Towards High Quality and Dialogue, Narration, Monologue Adaptive Movie Dubbing Via Multi-Modal Chain-of-Thoughts Reasoning Guidance