- xAI
- Bellevue, WA
- https://lxa9867.github.io/
Stars
Research and Production Oriented Speaker Verification, Recognition and Diarization Toolkit
OpenVE-3M: A Large-Scale High-Quality Dataset for Instruction-Guided Video Editing
NextFlow🚀: Unified Sequential Modeling Activates Multimodal Understanding and Generation
HY-World 1.5: A Systematic Framework for Interactive World Modeling with Real-Time Latency and Geometric Consistency
Semantics Lead the Way: Harmonizing Semantic and Texture Modeling with Asynchronous Latent Diffusion
[Preprint 2025] Ditto: Scaling Instruction-Based Video Editing with a High-Quality Synthetic Dataset
Native Multimodal Models are World Learners
Codec for paper: LLaSA: Scaling Train-time and Inference-time Compute for LLaMA-based Speech Synthesis
Official repository for BrickGPT, the first approach for generating physically stable toy brick models from text prompts.
Official PyTorch Implementation of "Diffusion Transformers with Representation Autoencoders"
[NeurIPS'25 Spotlight] Boosting Generative Image Modeling via Joint Image-Feature Synthesis
[ICLR 2026] Code for our paper "Next Visual Granularity Generation".
Industry-level video foundation model for unified Text-to-Video (T2V) and Image-to-Video (I2V) generation.
[🚀 ICLR 2026 Oral] NextStep-1: SOTA Autoregressive Image Generation with Continuous Tokens. A research project developed by StepFun’s Multimodal Intelligence team.
Qwen-Image is a powerful image generation foundation model capable of complex text rendering and precise image editing.
Wan: Open and Advanced Large-Scale Video Generative Models
Official PyTorch Implementation of "Latent Denoising Makes Good Visual Tokenizers"
Official codebase for "Self Forcing: Bridging Training and Inference in Autoregressive Video Diffusion" (NeurIPS 2025 Spotlight)
Code and dataset link for "DenseWorld-1M: Towards Detailed Dense Grounded Caption in the Real World"
[NeurIPS 2025] Efficient Reasoning Vision Language Models
[NeurIPS 2025] Geometry Aware Operator Transformer As An Efficient And Accurate Neural Surrogate For PDEs On Arbitrary Domains
Train transformer language models with reinforcement learning.
An official implementation of DanceGRPO: Unleashing GRPO on Visual Generation
🔥 Official impl. of "DetailFlow: 1D Coarse-to-Fine Autoregressive Image Generation via Next-Detail Prediction"
[NeurIPS 2025 D&B] Open-source Multi-agent Poster Generation from Papers