- Shanghai Jiao Tong University
- Shanghai
- @yangshuai1227
- https://YS-IMTech.github.io
Starred repositories
TurboDiffusion: 100–200× Acceleration for Video Diffusion Models
Official Implementation of "MemFlow: Flowing Adaptive Memory for Consistent and Efficient Long Video Narratives"
[Siggraph Asia 25] SS4D: Native 4D Generative Model via Structured Spacetime Latents
The paper list of "Memory in the Age of AI Agents: A Survey"
WorldPlay: Interactive World Modeling with Real-Time Latency and Geometric Consistency
V-RGBX: Video Editing with Accurate Controls over Intrinsic Properties
"E-RayZer: Self-supervised 3D Reconstruction as Spatial Visual Pre-training" official implementation.
Visionary: The World Model Carrier Built on WebGPU-Powered Gaussian Splatting Platform
[NeurIPS 2025] The official repository of "Sekai: A Video Dataset towards World Exploration"
[NeurIPS 2025] Wan-Move: Motion-controllable Video Generation via Latent Trajectory Guidance
Reward Forcing: Efficient Streaming Video Generation with Rewarded Distribution Matching Distillation
[ArXiv 25] Stable Video Infinity: Infinite-Length Video Generation with Error Recycling
ViSAudio: End-to-End Video-Driven Binaural Spatial Audio Generation
Official inference repo for FLUX.2 models
Cambrian-S: Towards Spatial Supersensing in Video
[ICML2025, NeurIPS2025 Spotlight] Sparse VideoGen 1 & 2: Accelerating Video Diffusion Transformers with Sparse Attention
ChronoEdit: Towards Temporal Reasoning for Image Editing and World Simulation
We introduce 'Thinking with Video', a new paradigm leveraging video generation for multimodal reasoning. Our VideoThinkBench shows that Sora-2 surpasses GPT5 by 10% on eyeballing puzzles and reache…
Native Multimodal Models are World Learners
Krea Realtime 14B. An open-source realtime AI video model.
Thinking with Camera: A Unified Multimodal Model for Camera-Centric Understanding and Generation
This is the official implementation for Human-Agent Collaborative Paper-to-Page Crafting for Under $0.1.
[Preprint 2025] Ditto: Scaling Instruction-Based Video Editing with a High-Quality Synthetic Dataset
🚀🚀 "Large Model": Train a 26M-parameter GPT entirely from scratch in just 2 hours! 🌏
SOTAMak1r / Infinite-Forcing
Forked from guandeh17/Self-Forcing
Infinite-Forcing: Towards Infinite-Long Video Generation