- Shanghai Jiao Tong University
- https://sotamak1r.github.io/
- @SOTAMak1r
- https://scholar.google.com/citations?user=BXL9nMgAAAAJ&hl=zh-CN
Stars
AI generates a real, editable PowerPoint from any document — native shapes & animations, speaker notes voiced as audio narration, and the option to follow your own .pptx template, not slide images …
World Model Self-Distillation project website
Ideogram 4: Open image model at the forefront of design
🎥 [Awesome] Egocentric / First-Person Video Datasets 📚 Papers, Benchmarks & Resources for Ego Vision
Our inference and training framework to run on the Cosmos Models
Implementation of Gamma-World: Generative Multi-Agent World Modeling Beyond Two Players
Simulations and identifiability proof for LeJEPA
[AAAI 2026] Turbo-VAED: Fast and Stable Transfer of Video-VAEs to Mobile Devices
Code, data and weights for the paper **What drives success in physical planning with Joint-Embedding Predictive World Models?**
Geo-Align: Video Generation Alignment via Metric Geometry Reward
PiD: Fast and High-Resolution Latent Decoding with Pixel Diffusion
Lens is a 3.8B-parameter text-to-image diffusion model that achieves quality competitive with and in several cases surpassing models like FLUX and SD3, while requiring significantly less training c…
HRM-Text is a 1B text generation model based on the HRM architecture, strengthened by task completion and latent space reasoning.
Official Implementation for RAEv2: Improved Baselines with Representation Autoencoders
Repository for training action-conditioned latent diffusion world models for robot video generation
[arXiv 2026] ReactiveGWM: Steering NPC in Reactive Game World Models
Warp-as-History: Generalizable Camera-Controlled Video Generation from One Training Video
[ICML 2026] Orienting Latent Actions for Video World Modeling
VLA-JEPA: Enhancing Vision-Language-Action Model with Latent World Model
[ICML 2026] Official Implementation of AlphaGRPO: Unlocking Self-Reflective Multimodal Generation in Unified Multimodal Models via Decompositional Verifiable Reward
A Minimal and Elegant Framework & Tutorial for Real-Time Interactive World Models
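Prompt as Code | GPT-Image2 industrial-grade prompt engine and template library: 470+ reverse-engineered cases, 20+ industrial-grade templates, distilled into Skills, continuously updated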
Heuristic Learning Blog Post
Official Repo of "D-OPSD: On-Policy Self-Distillation for Continuously Tuning Step-Distilled Diffusion Models"
[Roadmap] Visual Generation in the New Era: An Evolution from Atomic Mapping to Agentic World Modeling