Stars
Open-Sora: Democratizing Efficient Video Production for All
State-of-the-art 2D and 3D Face Analysis Project
MiniCPM-V 4.5: A GPT-4o Level MLLM for Single Image, Multi Image and High-FPS Video Understanding on Your Phone
Text and image to video generation: CogVideoX (2024) and CogVideo (ICLR 2023)
Implementation of Nougat Neural Optical Understanding for Academic Documents
BoxMOT: Pluggable SOTA multi-object tracking modules for segmentation, object detection and pose estimation models
Official repository of "SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory"
[NeurIPS 2024] Depth Anything V2. A More Capable Foundation Model for Monocular Depth Estimation
Official Implementation of OCR-free Document Understanding Transformer (Donut) and Synthetic Document Generator (SynthDoG), ECCV 2022
Data processing for and with foundation models!
openvla / openvla
Forked from TRI-ML/prismatic-vlms. OpenVLA: An open-source vision-language-action model for robotic manipulation.
VILA is a family of state-of-the-art vision language models (VLMs) for diverse multimodal AI tasks across the edge, data center, and cloud.
SAPIEN Manipulation Skill Framework, an open-source, GPU-parallelized robotics simulator and benchmark, led by Hillbot, Inc.
PyTorch implementation of MAR + DiffLoss (https://arxiv.org/abs/2406.11838)
[ICLR & NeurIPS 2025] Repository for the Show-o series: One Single Transformer to Unify Multimodal Understanding and Generation.
RetinaFace: Deep Face Detection Library for Python
RDT-1B: a Diffusion Foundation Model for Bimanual Manipulation
SEED-Voken: A Series of Powerful Visual Tokenizers
Official Algorithm Implementation of ICML'23 Paper "VIMA: General Robot Manipulation with Multimodal Prompts"
[RSS 2025] Learning to Act Anywhere with Task-centric Latent Actions
RoboBrain 2.0: Advanced version of RoboBrain. See Better. Think Harder. Do Smarter.
Official repo and evaluation implementation of VSI-Bench
Low-level locomotion policy training in Isaac Lab
MichalZawalski / embodied-CoT
Forked from openvla/openvla. Embodied Chain of Thought: A robotic policy that reasons to solve the task.
[RSS 2024 & RSS 2025] VLN-CE evaluation code for NaVid and Uni-NaVid
PyTorch implementation of the papers "ARTrack" and "ARTrackV2"
[CoRL 2025] Repository relating to "TrackVLA: Embodied Visual Tracking in the Wild"
Vision-Language Navigation Benchmark in Isaac Lab