Stars
The fastest repo in history to surpass 50K stars ⭐, reaching the milestone just 2 hours after publication. Better Harness Tools, not merely storing the archive of leaked Claude Code but make rea…
InteractiveVideo: User-Centric Controllable Video Generation with Synergistic Multimodal Instructions
[ICLR 2026] LongLive: Real-time Interactive Long Video Generation
HY-World 1.5: A Systematic Framework for Interactive World Modeling with Real-Time Latency and Geometric Consistency
CoTracker is a model for tracking any point (pixel) on a video.
Official Python inference and LoRA trainer package for the LTX-2 audio–video generative model.
ReViSE: Towards Reason-Informed Video Editing in Unified Models with Self-Reflective Learning
A survey of visual generation alignment
✨✨[NeurIPS 2025] VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction
Official implementation of Continuous 3D Perception Model with Persistent State
[ICLR 2026] The official code of "Routing Matters in MoE: Scaling Diffusion Transformers with Explicit Routing Guidance"
Official Code Repo for UniVA: Universal Video Agents
Accepted as [NeurIPS 2024] Spotlight Presentation Paper
(ICCV2025) Official repository of paper "ViSpeak: Visual Instruction Feedback in Streaming Videos"
The official implementation of paper "Generation Enhances Understanding in Unified Multimodal Models via Multi-Representation Generation"
Qwen-Image is a powerful image generation foundation model capable of complex text rendering and precise image editing.
Automatic Metric for Evaluating Generated Videos
UniPercept: Towards Unified Perceptual-Level Image Understanding across Aesthetics, Quality, Structure, and Texture
Official implementation of VideoDirectorGPT: Consistent Multi-scene Video Generation via LLM-Guided Planning (COLM 2024)
MultimodalC4 is a multimodal extension of C4 that interleaves millions of images with text.
[ICCV 2025 Workshop Outstanding Paper Award] VChain: Chain-of-Visual-Thought for Reasoning in Video Generation
[CVPR 2026] Video-as-Answer: Predict and Generate Next Video Event with Joint-GRPO
A high-throughput and memory-efficient inference and serving engine for LLMs
[ICLR 2026 Oral] DiffusionNFT: Online Diffusion Reinforcement with Forward Process
[NeurIPS 2025] Improving Video Generation with Human Feedback
BranchGRPO: Stable and Efficient GRPO with Structured Branching in Diffusion Models
[ICLR 2026] EditScore: Unlocking Online RL for Image Editing via High-Fidelity Reward Modeling