Dense Prediction Transformers

Python 2,322 279 Updated Dec 18, 2024

PyTorch code and models for VJEPA2 self-supervised learning from video.

Python 3,457 397 Updated Mar 23, 2026

The repository provides code for running inference and finetuning with the Meta Segment Anything Model 3 (SAM 3), links for downloading the trained model checkpoints, and example notebooks that sho…

Python 8,457 1,195 Updated Mar 27, 2026

A real-time approach for mapping all human pixels of 2D RGB images to a 3D surface-based model of the body

Jupyter Notebook 7,228 1,323 Updated Jan 18, 2023

The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…

Jupyter Notebook 18,793 2,400 Updated Mar 20, 2026

[NeurIPS 2024] Depth Anything V2. A More Capable Foundation Model for Monocular Depth Estimation

Python 7,803 798 Updated Mar 24, 2026

MOVA: Towards Scalable and Synchronized Video–Audio Generation

Python 864 58 Updated Mar 14, 2026

MOSS-TTSD is a spoken dialogue generation model designed for expressive multi-speaker synthesis. It features long-context modeling, flexible speaker control, and multilingual support, while enablin…

Python 1,222 117 Updated Mar 23, 2026

[NeurIPS 2025 D&B🔥] OpenS2V-Nexus: A Detailed Benchmark and Million-Scale Dataset for Subject-to-Video Generation

Jupyter Notebook 205 7 Updated Mar 8, 2026

[CVPR 2026] UnityVideo: Unified Multi-Modal Multi-Task Learning for Enhancing World-Aware Video Generation

212 8 Updated Jan 29, 2026

InteractAvatar is a novel dual-stream DiT framework that enables talking avatars to perform Grounded Human-Object Interaction (GHOI)

Python 22 1 Updated Mar 26, 2026

[RSS 2025] Learning to Act Anywhere with Task-centric Latent Actions

Python 1,028 58 Updated Nov 19, 2025
Python 332 27 Updated Feb 9, 2026

[CVPR 2026] OmniTransfer: All-in-one Framework for Spatio-temporal Video Transfer

224 9 Updated Feb 21, 2026

Unlimited-length talking video generation that supports image-to-video and video-to-video generation

Python 5,150 854 Updated Dec 18, 2025

Towards Scalable Pre-training of Visual Tokenizers for Generation

Python 461 12 Updated Mar 9, 2026

The repository provides code for running inference with the Meta Segment Anything Audio Model (SAM-Audio), links for downloading the trained model checkpoints, and example notebooks that show how t…

Python 3,418 303 Updated Jan 5, 2026
Python 10,756 721 Updated Feb 9, 2026

Official code of Motus: A Unified Latent Action World Model

Python 908 38 Updated Jan 5, 2026

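Large-model algorithm interview questions (with answers): explanations of common questions and concepts. "Large-model interview questions", "algorithm-role interviews", "common interview questions", "large-model algorithm interviews", "large-model application fundamentals"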

Jupyter Notebook 1,796 124 Updated Mar 26, 2026

Qwen3-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.

Jupyter Notebook 18,787 1,698 Updated Jan 30, 2026

Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning, achieving state-of-the-art performance on 38 out of 60 public benchmarks.

Jupyter Notebook 1,562 65 Updated Jun 14, 2025
Python 1,673 189 Updated Nov 15, 2025

[TPAMI 2025] Official Code for "SMPLest-X: Ultimate Scaling for Expressive Human Pose and Shape Estimation"

Python 271 29 Updated Feb 12, 2026

[AAAI 2026] EchoMimicV3: 1.3B Parameters are All You Need for Unified Multi-Modal and Multi-Task Human Animation

Python 840 101 Updated Mar 18, 2026

[NeurIPS 2025] Let Them Talk: Audio-Driven Multi-Person Conversational Video Generation

Python 2,863 480 Updated Dec 18, 2025

HunyuanVideo-1.5: A leading lightweight video generation model

Python 4,760 219 Updated Feb 12, 2026

An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Speaker Extraction, etc.

Python 3,998 328 Updated Aug 14, 2025