Stars
Open-Sora: Democratizing Efficient Video Production for All
State-of-the-art 2D and 3D Face Analysis Project
MiniCPM-V 4.5: A GPT-4o Level MLLM for Single Image, Multi Image and High-FPS Video Understanding on Your Phone
Text- and image-to-video generation: CogVideoX (2024) and CogVideo (ICLR 2023)
Implementation of Nougat: Neural Optical Understanding for Academic Documents
BoxMOT: Pluggable SOTA multi-object tracking modules for segmentation, object detection, and pose estimation models
Official repository of "SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory"
[NeurIPS 2024] Depth Anything V2. A More Capable Foundation Model for Monocular Depth Estimation
Official Implementation of OCR-free Document Understanding Transformer (Donut) and Synthetic Document Generator (SynthDoG), ECCV 2022
Data processing for and with foundation models!
VILA is a family of state-of-the-art vision language models (VLMs) for diverse multimodal AI tasks across the edge, data center, and cloud.
SAPIEN Manipulation Skill Framework, an open-source, GPU-parallelized robotics simulator and benchmark, led by Hillbot, Inc.
PyTorch implementation of MAR + DiffLoss (https://arxiv.org/abs/2406.11838)
RetinaFace: Deep Face Detection Library for Python
[ICLR & NeurIPS 2025] Repository for Show-o series, One Single Transformer to Unify Multimodal Understanding and Generation.
RDT-1B: a Diffusion Foundation Model for Bimanual Manipulation
SEED-Voken: A Series of Powerful Visual Tokenizers
Official Algorithm Implementation of ICML'23 Paper "VIMA: General Robot Manipulation with Multimodal Prompts"
[RSS 2025] Learning to Act Anywhere with Task-centric Latent Actions
RoboBrain 2.0: Advanced version of RoboBrain. See Better. Think Harder. Do Smarter.
Official repo and evaluation implementation of VSI-Bench
Low-level locomotion policy training in Isaac Lab
[RSS 2024 & RSS 2025] VLN-CE evaluation code of NaVid and Uni-NaVid
PyTorch implementation of the papers "ARTrack" and "ARTrackV2"
[CoRL 2025] Repository for "TrackVLA: Embodied Visual Tracking in the Wild"
Vision-Language Navigation Benchmark in Isaac Lab
RoboOS: A Universal Embodied Operating System for Cross-Embodied and Multi-Robot Collaboration
Embodied Reasoning Question Answer (ERQA) Benchmark