- Ant Group
- Hangzhou, China
- https://zengyh1900.github.io/
- @zengyh1900
Starred repositories
- LongLive: Real-time Interactive Long Video Generation
- Cosmos-Transfer2.5, built on top of Cosmos-Predict2.5, produces high-quality world simulations conditioned on multiple spatial control inputs.
- Cosmos-Predict2.5, the latest version of the Cosmos World Foundation Models (WFMs) family, specialized for simulating and predicting the future state of the world in the form of video.
- Official implementation of "Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding"
- Krea Realtime 14B: an open-source realtime AI video model.
- xDiT: A Scalable Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism
- ViMax: Agentic Video Generation (Director, Screenwriter, Producer, and Video Generator All-in-One)
- Official implementation of "HoloCine: Holistic Generation of Cinematic Multi-Shot Long Video Narratives"
- [Preprint 2025] Ditto: Scaling Instruction-Based Video Editing with a High-Quality Synthetic Dataset
- Ring attention implementation with flash attention
- SOTAMak1r/Infinite-Forcing, forked from guandeh17/Self-Forcing. Infinite-Forcing: Towards Infinite-Long Video Generation
- A sparse attention kernel supporting mixed sparse patterns
- A Distributed Attention Towards Linear Scalability for Ultra-Long Context, Heterogeneous Data Training
- Dream to Control: Learning Behaviors by Latent Imagination
- (NeurIPS 2024 Oral 🔥) Improved Distribution Matching Distillation for Fast Image Synthesis
- Code for FastVGGT: Training-Free Acceleration of Visual Geometry Transformer
- Unlimited-length talking video generation that supports image-to-video and video-to-video generation
- Ongoing research training transformer models at scale
- Virtual Community: An Open World for Humans, Robots, and Society
- VeOmni: Scaling Any Modality Model Training with Model-Centric Distributed Recipe Zoo
- A novel Multimodal Large Language Model (MLLM) architecture, designed to structurally align visual and textual embeddings.
- Make Self-Forcing endless; add cache purging and prompt controllability.
- Diffusion Model as a Noise-Aware Latent Reward Model for Step-Level Preference Optimization
- DDPO for finetuning diffusion models, implemented in PyTorch with LoRA support
- [NeurIPS 2025] Official implementation of Flow-GRPO: Training Flow Matching Models via Online RL
- An educational resource to help anyone learn deep reinforcement learning.