- Bytedance Seed
- San Jose
- https://enjoyyi.github.io/
- @Enjoy_Yi
Stars
GLM-Image: Auto-regressive for Dense-knowledge and High-fidelity Image Generation.
Official implementation of Pref-GRPO: Pairwise Preference Reward-based GRPO for Stable Text-to-Image Reinforcement Learning
SpaceTimePilot: Generative Rendering of Dynamic Scenes Across Space and Time
[CVPR 2025 Oral] Infinity ∞: Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis
Training Large Language Models to Reason in a Continuous Latent Space
TurboDiffusion: 100–200× Acceleration for Video Diffusion Models
[ICLR 2026] Official repo of paper "Reconstruction Alignment Improves Unified Multimodal Models". Unlocking the Massive Zero-shot Potential in Unified Multimodal Models through Self-supervised Lear…
Adapting Self-Supervised Representations as a Latent Space for Efficient Generation
Kandinsky 5.0: A family of diffusion models for Video & Image generation
[NeurIPS 2025] VideoREPA: Learning Physics for Video Generation through Relational Alignment with Foundation Models
Muon is an optimizer for hidden layers in neural networks
[ICLR 2026] Code for our paper "Next Visual Granularity Generation".
[NeurIPS'25 Spotlight] Boosting Generative Image Modeling via Joint Image-Feature Synthesis
[NeurIPS 2025 Oral] Infinity⭐️: Unified Spacetime AutoRegressive Modeling for Visual Generation
[ICLR'26] Easier Painting Than Thinking: Can Text-to-Image Models Set the Stage, but Not Direct the Play?
[ICLR 2026] An early exploration that introduces interleaving reasoning to the text-to-image generation field and achieves SoTA benchmark performance. It also significantly improves the quality…
MoVQGAN: a model for image encoding and reconstruction
[CVPR 2025🔥] Enhancing Video VAE by Wavelet-Driven Energy Flow for Latent Video Diffusion Model
NEO Series: Native Vision-Language Models from First Principles
Detect Anything via Next Point Prediction (Based on Qwen2.5-VL-3B)
Official PyTorch Implementation of "Diffusion Transformers with Representation Autoencoders"
[CVPR 2025 Highlight] GEN3C: 3D-Informed World-Consistent Video Generation with Precise Camera Control
[ICLR 2026 Oral] DiffusionNFT: Online Diffusion Reinforcement with Forward Process
[ICCV'25 Best Paper Finalist] ReCamMaster: Camera-Controlled Generative Rendering from A Single Video
DC-Gen: Post-Training Diffusion Acceleration with Deeply Compressed Latent Space
Fully Open Framework for Democratized Multimodal Training
[ICCV 2025] OpenVision: A Fully-Open, Cost-Effective Family of Advanced Vision Encoders for Multimodal Learning