
Open-Sora: Democratizing Efficient Video Production for All

Python 27,764 2,754 Updated Apr 30, 2025

State-of-the-art 2D and 3D Face Analysis Project

Python 26,951 5,813 Updated Sep 27, 2025

MiniCPM-V 4.5: A GPT-4o Level MLLM for Single Image, Multi Image and High-FPS Video Understanding on Your Phone

Python 22,187 1,665 Updated Sep 24, 2025

text and image to video generation: CogVideoX (2024) and CogVideo (ICLR 2023)

Python 12,093 1,208 Updated Nov 4, 2025

Implementation of Nougat Neural Optical Understanding for Academic Documents

Python 9,699 618 Updated Feb 21, 2025

BoxMOT: Pluggable SOTA multi-object tracking modules for segmentation, object detection and pose estimation models

Python 7,769 1,855 Updated Oct 31, 2025

Official repository of "SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory"

Python 6,988 479 Updated Mar 18, 2025

[NeurIPS 2024] Depth Anything V2. A More Capable Foundation Model for Monocular Depth Estimation

Python 6,955 691 Updated Jan 22, 2025

Official Implementation of OCR-free Document Understanding Transformer (Donut) and Synthetic Document Generator (SynthDoG), ECCV 2022

Python 6,641 542 Updated Jul 11, 2024

Data processing for and with foundation models! 🍎 🍋 🌽 ➡️ ➡️ 🍸 🍹 🍷

Python 5,476 286 Updated Nov 6, 2025

OpenVLA: An open-source vision-language-action model for robotic manipulation.

Python 4,329 518 Updated Mar 23, 2025

A comprehensive list of papers using large language/multi-modal models for Robotics/RL, including papers, codes, and related websites

4,079 318 Updated Oct 17, 2025

VILA is a family of state-of-the-art vision language models (VLMs) for diverse multimodal AI tasks across the edge, data center, and cloud.

Python 3,629 303 Updated Oct 20, 2025

SAPIEN Manipulation Skill Framework, an open source GPU parallelized robotics simulator and benchmark, led by Hillbot, Inc.

Python 2,219 379 Updated Nov 5, 2025

PyTorch implementation of MAR+DiffLoss https://arxiv.org/abs/2406.11838

Python 1,782 107 Updated Sep 27, 2024

[ICLR & NeurIPS 2025] Repository for Show-o series, One Single Transformer to Unify Multimodal Understanding and Generation.

Python 1,763 76 Updated Oct 22, 2025

RetinaFace: Deep Face Detection Library for Python

Python 1,762 180 Updated Aug 11, 2025

This is a curated list of "Embodied AI or robot with Large Language Models" research. Watch this repository for the latest updates! 🔥

1,581 89 Updated Oct 30, 2025

RDT-1B: a Diffusion Foundation Model for Bimanual Manipulation

Python 1,515 145 Updated Sep 28, 2025

This repository provides valuable reference for researchers in the field of multimodality, please start your exploratory travel in RL-based Reasoning MLLMs!

1,250 58 Updated Oct 18, 2025

SEED-Voken: A Series of Powerful Visual Tokenizers

Python 968 35 Updated Oct 22, 2025
Jupyter Notebook 838 142 Updated Jul 10, 2024

Official Algorithm Implementation of ICML'23 Paper "VIMA: General Robot Manipulation with Multimodal Prompts"

Python 831 96 Updated Apr 18, 2024

[RSS 2025] Learning to Act Anywhere with Task-centric Latent Actions

Python 818 48 Updated Nov 6, 2025

RoboBrain 2.0: Advanced version of RoboBrain. See Better. Think Harder. Do Smarter. 🎉🎉🎉

Python 674 57 Updated Sep 30, 2025

A-MEM: Agentic Memory for LLM Agents

Python 665 80 Updated Oct 21, 2025

Official repo and evaluation implementation of VSI-Bench

Python 616 37 Updated Aug 5, 2025

Low-level locomotion policy training in Isaac Lab

Python 349 30 Updated Mar 7, 2025

Embodied Chain of Thought: A robotic policy that reasons to solve the task.

Python 322 16 Updated Apr 5, 2025