- Google DeepMind
- Kirkland, WA
- https://lxa9867.github.io/
Stars
WorldPlay: Interactive World Modeling with Real-Time Latency and Geometric Consistency
Semantics Lead the Way: Harmonizing Semantic and Texture Modeling with Asynchronous Latent Diffusion
[Preprint 2025] Ditto: Scaling Instruction-Based Video Editing with a High-Quality Synthetic Dataset
Native Multimodal Models are World Learners
Codec for paper: LLaSA: Scaling Train-time and Inference-time Compute for LLaMA-based Speech Synthesis
Official repository for BrickGPT, the first approach for generating physically stable toy brick models from text prompts.
Official PyTorch Implementation of "Diffusion Transformers with Representation Autoencoders"
[NeurIPS'25 Spotlight] Boosting Generative Image Modeling via Joint Image-Feature Synthesis
Code for our paper "Next Visual Granularity Generation".
Industry-level video foundation model for unified Text-to-Video (T2V) and Image-to-Video (I2V) generation.
Qwen-Image is a powerful image generation foundation model capable of complex text rendering and precise image editing.
Wan: Open and Advanced Large-Scale Video Generative Models
Official PyTorch Implementation of "Latent Denoising Makes Good Visual Tokenizers"
Official codebase for "Self Forcing: Bridging Training and Inference in Autoregressive Video Diffusion" (NeurIPS 2025 Spotlight)
Code and dataset link for "DenseWorld-1M: Towards Detailed Dense Grounded Caption in the Real World"
[NeurIPS 2025] Efficient Reasoning Vision Language Models
[NeurIPS 2025] Geometry Aware Operator Transformer As An Efficient And Accurate Neural Surrogate For PDEs On Arbitrary Domains
Train transformer language models with reinforcement learning.
An official implementation of DanceGRPO: Unleashing GRPO on Visual Generation
🔥 Official impl. of "DetailFlow: 1D Coarse-to-Fine Autoregressive Image Generation via Next-Detail Prediction"
[NeurIPS 2025 D&B] Open-source Multi-agent Poster Generation from Papers
code for "Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion"
MAGI-1: Autoregressive Video Generation at Scale