- Alibaba
- China
- https://cxxgtxy.github.io
Stars
RISE: Reliable Improvement in Self-Evolving Vision-Language Models
[ICML 2026] The official implementation of the paper "Generation Enhances Understanding in Unified Multimodal Models via Multi-Representation Generation"
TransitLM: A Large-Scale Dataset and Benchmark for Map-Free Transit Route Generation
[SIGGRAPH 2026] MACE-Dance: Motion-Appearance Cascaded Experts for Music-Driven Dance Video Generation
This is the official repository of "LLaTiSA: Towards Difficulty-Stratified Time Series Reasoning from Visual Perception to Semantics".
[CVPR 2026] Extending One-Step Image Generation from Class Labels to Text via Discriminative Text Representation
[CVPR 2026] Elucidating the SNR-t Bias of Diffusion Probabilistic Models
[ICLR 2026] Video-STAR: Reinforcing Open-Vocabulary Action Recognition with Tools
DreamX-World: A General-Purpose Interactive World Model
Let Skills Evolve Collectively with Agentic Evolver
StarVLA: A Lego-like Codebase for Vision-Language-Action Model Developing
openvla / openvla
Forked from TRI-ML/prismatic-vlms
OpenVLA: An open-source vision-language-action model for robotic manipulation.
[ICLR 2026] AutoDrive-R2: Incentivizing Reasoning and Self-Reflection Capacity for VLA Model in Autonomous Driving
[CVPR 2026] Semantic Context Matters: Improving Conditioning for Autoregressive Models
A comprehensive benchmark specifically designed to evaluate the interactive response capabilities of world models in 4D settings.
[ICLR 2026] FASA: FREQUENCY-AWARE SPARSE ATTENTION
PyTorch re-implementation for MeanFlow
code for "Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion"
Official codebase for "Self Forcing: Bridging Training and Inference in Autoregressive Video Diffusion" (NeurIPS 2025 Spotlight)
(CVPR 2025) From Slow Bidirectional to Fast Autoregressive Video Diffusion Models
A Curated List of Awesome Video World Models with AR Diffusion: Covering Algorithms, Applications, and Infrastructure, Aimed at Serving as a Comprehensive Resource for Researchers, Practitioners, a…
[KDD 2026 Oral] MobilityBench: A Scalable Benchmark for Evaluating Route-Planning Agents in Real-World Mobility Scenarios
IntTravel: A Real-World Dataset and Generative Framework for Integrated Multi-Task Travel Recommendation
Code2World: A GUI World Model via Renderable Code Generation
Official repository for “PixelGen: Improving Pixel Diffusion with Perceptual Loss”
[ICLR 2026] Everything in Its Place: Benchmarking Spatial Intelligence of Text-to-Image Models
[ICLR 2026] Harder Is Better: Boosting Mathematical Reasoning via Difficulty-Aware GRPO and Multi-Aspect Question Reformulation