Wei-Baldwin-Zeng
38 stars written in Python (each entry lists language, star count, fork count, and last update)

Open-Sora: Democratizing Efficient Video Production for All

Python 27,757 2,753 Updated Apr 30, 2025

State-of-the-art 2D and 3D Face Analysis Project

Python 26,948 5,813 Updated Sep 27, 2025

MiniCPM-V 4.5: A GPT-4o Level MLLM for Single Image, Multi Image and High-FPS Video Understanding on Your Phone

Python 22,186 1,665 Updated Sep 24, 2025

text and image to video generation: CogVideoX (2024) and CogVideo (ICLR 2023)

Python 12,092 1,208 Updated Nov 4, 2025

Implementation of Nougat Neural Optical Understanding for Academic Documents

Python 9,699 618 Updated Feb 21, 2025

BoxMOT: Pluggable SOTA multi-object tracking modules for segmentation, object detection and pose estimation models

Python 7,769 1,855 Updated Oct 31, 2025

Official repository of "SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory"

Python 6,988 479 Updated Mar 18, 2025

[NeurIPS 2024] Depth Anything V2. A More Capable Foundation Model for Monocular Depth Estimation

Python 6,954 691 Updated Jan 22, 2025

Official Implementation of OCR-free Document Understanding Transformer (Donut) and Synthetic Document Generator (SynthDoG), ECCV 2022

Python 6,641 542 Updated Jul 11, 2024

Data processing for and with foundation models! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷

Python 5,475 285 Updated Nov 5, 2025

OpenVLA: An open-source vision-language-action model for robotic manipulation.

Python 4,324 516 Updated Mar 23, 2025

VILA is a family of state-of-the-art vision language models (VLMs) for diverse multimodal AI tasks across the edge, data center, and cloud.

Python 3,629 303 Updated Oct 20, 2025

SAPIEN Manipulation Skill Framework, an open source GPU parallelized robotics simulator and benchmark, led by Hillbot, Inc.

Python 2,216 379 Updated Nov 5, 2025

PyTorch implementation of MAR+DiffLoss https://arxiv.org/abs/2406.11838

Python 1,782 107 Updated Sep 27, 2024

[ICLR & NeurIPS 2025] Repository for Show-o series, One Single Transformer to Unify Multimodal Understanding and Generation.

Python 1,763 76 Updated Oct 22, 2025

RetinaFace: Deep Face Detection Library for Python

Python 1,761 180 Updated Aug 11, 2025

RDT-1B: a Diffusion Foundation Model for Bimanual Manipulation

Python 1,515 144 Updated Sep 28, 2025

SEED-Voken: A Series of Powerful Visual Tokenizers

Python 968 35 Updated Oct 22, 2025

Official Algorithm Implementation of ICML'23 Paper "VIMA: General Robot Manipulation with Multimodal Prompts"

Python 831 96 Updated Apr 18, 2024

[RSS 2025] Learning to Act Anywhere with Task-centric Latent Actions

Python 817 48 Updated Aug 21, 2025

RoboBrain 2.0: Advanced version of RoboBrain. See Better. Think Harder. Do Smarter. 🎉🎉🎉

Python 674 57 Updated Sep 30, 2025

A-MEM: Agentic Memory for LLM Agents

Python 664 79 Updated Oct 21, 2025

Official repo and evaluation implementation of VSI-Bench

Python 616 37 Updated Aug 5, 2025

Low-level locomotion policy training in Isaac Lab

Python 349 30 Updated Mar 7, 2025

Embodied Chain of Thought: A robotic policy that reasons to solve the task.

Python 322 16 Updated Apr 5, 2025

[RSS 2024 & RSS 2025] VLN-CE evaluation code of NaVid and Uni-NaVid

Python 295 20 Updated Oct 15, 2025

PyTorch implementation of paper "ARTrack" and "ARTrackV2"

Python 292 35 Updated Oct 20, 2025
Python 279 34 Updated Mar 17, 2025

[CoRL 2025] Repository relating to "TrackVLA: Embodied Visual Tracking in the Wild"

Python 265 18 Updated Oct 16, 2025

Vision-Language Navigation Benchmark in Isaac Lab

Python 260 24 Updated Aug 28, 2025