View angzong's full-sized avatar


Dense Prediction Transformers

Python 2,324 279 Updated Dec 18, 2024

PyTorch code and models for VJEPA2 self-supervised learning from video.

Python 3,485 404 Updated Mar 23, 2026

The repository provides code for running inference and finetuning with the Meta Segment Anything Model 3 (SAM 3), links for downloading the trained model checkpoints, and example notebooks that sho…

Python 8,604 1,213 Updated Mar 30, 2026

A real-time approach for mapping all human pixels of 2D RGB images to a 3D surface-based model of the body

Jupyter Notebook 7,229 1,323 Updated Jan 18, 2023

The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…

Jupyter Notebook 18,819 2,406 Updated Mar 20, 2026

[NeurIPS 2024] Depth Anything V2. A More Capable Foundation Model for Monocular Depth Estimation

Python 7,822 798 Updated Mar 24, 2026

MOVA: Towards Scalable and Synchronized Video–Audio Generation

Python 869 60 Updated Mar 14, 2026

MOSS-TTSD is a spoken dialogue generation model designed for expressive multi-speaker synthesis. It features long-context modeling, flexible speaker control, and multilingual support, while enablin…

Python 1,227 118 Updated Mar 23, 2026

[NeurIPS 2025 D&B🔥] OpenS2V-Nexus: A Detailed Benchmark and Million-Scale Dataset for Subject-to-Video Generation

Jupyter Notebook 205 7 Updated Mar 8, 2026

[CVPR 2026] UnityVideo: Unified Multi-Modal Multi-Task Learning for Enhancing World-Aware Video Generation

212 8 Updated Jan 29, 2026

InteractAvatar is a novel dual-stream DiT framework that enables talking avatars to perform Grounded Human-Object Interaction (GHOI)

Python 22 1 Updated Mar 26, 2026

[RSS 2025] Learning to Act Anywhere with Task-centric Latent Actions

Python 1,030 58 Updated Nov 19, 2025
Python 332 27 Updated Feb 9, 2026

[CVPR 2026] OmniTransfer: All-in-one Framework for Spatio-temporal Video Transfer

225 9 Updated Feb 21, 2026

Unlimited-length talking video generation that supports image-to-video and video-to-video generation

Python 5,198 859 Updated Dec 18, 2025

Towards Scalable Pre-training of Visual Tokenizers for Generation

Python 462 12 Updated Mar 9, 2026

The repository provides code for running inference with the Meta Segment Anything Audio Model (SAM-Audio), links for downloading the trained model checkpoints, and example notebooks that show how t…

Python 3,432 304 Updated Jan 5, 2026
Python 10,797 723 Updated Feb 9, 2026

Official code of Motus: A Unified Latent Action World Model

Python 917 39 Updated Jan 5, 2026

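Interview questions for large-model algorithm roles (with answers): common questions and concept explanations. "Large-model interview questions", "algorithm role interviews", "common interview questions", "large-model algorithm interviews", "large-model application fundamentals"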

Jupyter Notebook 1,803 126 Updated Mar 26, 2026

Qwen3-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.

Jupyter Notebook 18,827 1,707 Updated Jan 30, 2026

Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning, achieving state-of-the-art performance on 38 out of 60 public benchmarks.

Jupyter Notebook 1,566 65 Updated Jun 14, 2025
Python 1,675 190 Updated Nov 15, 2025

[TPAMI 2025] Official Code for "SMPLest-X: Ultimate Scaling for Expressive Human Pose and Shape Estimation"

Python 269 29 Updated Feb 12, 2026

[AAAI 2026] EchoMimicV3: 1.3B Parameters are All You Need for Unified Multi-Modal and Multi-Task Human Animation

Python 846 102 Updated Mar 18, 2026

[NeurIPS 2025] Let Them Talk: Audio-Driven Multi-Person Conversational Video Generation

Python 2,868 480 Updated Dec 18, 2025

HunyuanVideo-1.5: A leading lightweight video generation model

Python 4,760 220 Updated Mar 29, 2026

An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Speaker Extraction, etc.

Python 4,004 328 Updated Aug 14, 2025