Bernini is a unified framework for video generation and editing that combines an MLLM-based semantic planner with a DiT-based renderer.

Python 756 57 Updated Jun 12, 2026

The code for "InstructSAM: Segment Any Instance with Any Instructions"

Python 80 6 Updated May 26, 2026

A 3B-active-parameter native unified multimodal model for image and video understanding, generation, and editing.

Python 1,199 79 Updated Jun 13, 2026

🔥 Official code repository for "Unlocking Dense Metric Depth Estimation in VLMs"

Python 128 6 Updated May 21, 2026

Official codebase for "Pyramid Forcing: Head-Aware Pyramid KV Cache Policy for High-Quality Long Video Generation"

Python 12 Updated Jun 3, 2026

Perfect Green Screen Keys

Python 13,866 857 Updated May 28, 2026

Official Implementation of CoInteract: Spatially-Structured Co-Generation for Interactive Human-Object Video Synthesis

Python 156 11 Updated May 7, 2026

Generative Refinement Networks for Visual Synthesis (Support C2I & T2I & T2V)

Python 133 3 Updated Jun 8, 2026

Official implementation of "OmniForcing: Unleashing Real-time Joint Audio-Visual Generation" [arXiv:2603.11647]. OmniForcing is the first framework to distill bidirectional audio-visual diffusion mo…

Python 156 2 Updated May 2, 2026

Helios: Real Real-Time Long Video Generation Model

Python 1,906 149 Updated Jun 10, 2026

Reinforcement Learning Framework for Visual Generation

Python 123 6 Updated Feb 13, 2026

[Tech Report] Alive: A Unified Audio-Video Generation Model

457 30 Updated Mar 31, 2026

[ICML 2026] Scaling Interactive World Models to 1000-Frame Horizons via Pose-Free Hierarchical Memory

Python 177 6 Updated May 4, 2026

Implementation of "Live Avatar: Streaming Real-time Audio-Driven Avatar Generation with Infinite Length"

Python 2,145 241 Updated May 31, 2026

[NeurIPS 2025 Oral] Infinity⭐️: Unified Spacetime AutoRegressive Modeling for Visual Generation

Python 765 28 Updated Apr 16, 2026

[AAAI 2026] Playmate2: Training-Free Multi-Character Audio-Driven Animation via Diffusion Transformer with Reward Feedback

Python 299 28 Updated Nov 21, 2025

Python 1,723 200 Updated Nov 15, 2025

Unlimited-length talking video generation that supports image-to-video and video-to-video generation

Python 6,885 1,212 Updated May 22, 2026

We present StableAvatar, the first end-to-end video diffusion transformer, which synthesizes infinite-length high-quality audio-driven avatar videos without any post-processing, conditioned on a re…

Python 1,238 110 Updated Jan 20, 2026

[AAAI 2026] EchoMimicV3: 1.3B Parameters are All You Need for Unified Multi-Modal and Multi-Task Human Animation

Python 944 111 Updated Mar 18, 2026

Industry-level video foundation model for unified Text-to-Video (T2V) and Image-to-Video (I2V) generation.

939 121 Updated Aug 27, 2025

Resources list for multimodal agentic reasoning

6 Updated Aug 11, 2025

[ICLR 2025] Hallo2: Long-Duration and High-Resolution Audio-driven Portrait Image Animation

Python 3,705 536 Updated Feb 27, 2025

[CVPR2025 Highlight] Video Generation Foundation Models: https://saiyan-world.github.io/goku/

Python 2,909 310 Updated Feb 19, 2025

Official repository for LTX-Video

Python 10,473 1,034 Updated Jan 5, 2026

Easy Data Preparation with the latest LLM-based Operators and Pipelines.

Python 4,870 548 Updated Jun 10, 2026

Official implementation of HCoG: Apply Hierarchical-Chain-of-Generation to Complex Attributes Text-to-3D Generation [CVPR 2025]

Python 59 2 Updated Jul 28, 2025

[ICCV 2025] RoboFactory: Exploring Embodied Agent Collaboration with Compositional Constraints

Python 134 16 Updated Sep 2, 2025

Official implementation of GUI-R1: A Generalist R1-Style Vision-Language Action Model For GUI Agents

Python 248 18 Updated May 5, 2025