Stars
Open-Sora: Democratizing Efficient Video Production for All
State-of-the-art 2D and 3D Face Analysis Project
MiniCPM-V 4.5: A GPT-4o Level MLLM for Single-Image, Multi-Image and High-FPS Video Understanding on Your Phone
Text- and image-to-video generation: CogVideoX (2024) and CogVideo (ICLR 2023)
Implementation of Nougat: Neural Optical Understanding for Academic Documents
BoxMOT: Pluggable SOTA multi-object tracking modules for segmentation, object detection and pose estimation models
Official repository of "SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory"
[NeurIPS 2024] Depth Anything V2. A More Capable Foundation Model for Monocular Depth Estimation
Official Implementation of OCR-free Document Understanding Transformer (Donut) and Synthetic Document Generator (SynthDoG), ECCV 2022
Data processing for and with foundation models!
openvla / openvla
Forked from TRI-ML/prismatic-vlms
OpenVLA: An open-source vision-language-action model for robotic manipulation.
A comprehensive list of papers using large language/multi-modal models for Robotics/RL, with links to code and related websites
VILA is a family of state-of-the-art vision language models (VLMs) for diverse multimodal AI tasks across the edge, data center, and cloud.
SAPIEN Manipulation Skill Framework, an open source GPU parallelized robotics simulator and benchmark, led by Hillbot, Inc.
PyTorch implementation of MAR+DiffLoss https://arxiv.org/abs/2406.11838
[ICLR & NeurIPS 2025] Repository for the Show-o series: One Single Transformer to Unify Multimodal Understanding and Generation.
RetinaFace: Deep Face Detection Library for Python
This is a curated list of "Embodied AI or robot with Large Language Models" research. Watch this repository for the latest updates!
RDT-1B: a Diffusion Foundation Model for Bimanual Manipulation
This repository provides a valuable reference for researchers in the field of multimodality; start your exploration of RL-based Reasoning MLLMs here!
SEED-Voken: A Series of Powerful Visual Tokenizers
Official Algorithm Implementation of ICML'23 Paper "VIMA: General Robot Manipulation with Multimodal Prompts"
[RSS 2025] Learning to Act Anywhere with Task-centric Latent Actions
RoboBrain 2.0: Advanced version of RoboBrain. See Better. Think Harder. Do Smarter.
Official repo and evaluation implementation of VSI-Bench
Low-level locomotion policy training in Isaac Lab
MichalZawalski / embodied-CoT
Forked from openvla/openvla
Embodied Chain of Thought: A robotic policy that reasons to solve tasks.