Data processing for and with foundation models! 🍎 🍋 🌽 ➡️ ➡️ 🍸 🍹 🍷

Python 5,478 286 Updated Nov 6, 2025

[RSS 2025] Learning to Act Anywhere with Task-centric Latent Actions

Python 820 48 Updated Nov 6, 2025

SAPIEN Manipulation Skill Framework, an open source GPU parallelized robotics simulator and benchmark, led by Hillbot, Inc.

Python 2,223 380 Updated Nov 5, 2025

text and image to video generation: CogVideoX (2024) and CogVideo (ICLR 2023)

Python 12,096 1,209 Updated Nov 4, 2025

BoxMOT: Pluggable SOTA multi-object tracking modules for segmentation, object detection and pose estimation models

Python 7,772 1,855 Updated Oct 31, 2025

This is a curated list of "Embodied AI or robot with Large Language Models" research. Watch this repository for the latest updates! 🔥

1,583 89 Updated Oct 30, 2025

[TMLR 2024] repository for VLN with foundation models

205 10 Updated Oct 25, 2025

[ICLR & NeurIPS 2025] Repository for Show-o series, One Single Transformer to Unify Multimodal Understanding and Generation.

Python 1,763 76 Updated Oct 22, 2025

SEED-Voken: A Series of Powerful Visual Tokenizers

Python 969 35 Updated Oct 22, 2025

A-MEM: Agentic Memory for LLM Agents

Python 667 80 Updated Oct 21, 2025

PyTorch implementation of paper "ARTrack" and "ARTrackV2"

Python 292 35 Updated Oct 20, 2025

VILA is a family of state-of-the-art vision language models (VLMs) for diverse multimodal AI tasks across the edge, data center, and cloud.

Python 3,629 304 Updated Oct 20, 2025

This repository provides a valuable reference for researchers in the field of multimodality. Start your exploration of RL-based Reasoning MLLMs here!

1,251 58 Updated Oct 18, 2025

A comprehensive list of papers using large language/multi-modal models for Robotics/RL, including papers, codes, and related websites

4,079 318 Updated Oct 17, 2025

[CoRL 2025] Repository relating to "TrackVLA: Embodied Visual Tracking in the Wild"

Python 265 18 Updated Oct 16, 2025

[RSS 2024 & RSS 2025] VLN-CE evaluation code of NaVid and Uni-NaVid

Python 296 20 Updated Oct 15, 2025

RoboBrain 2.0: Advanced version of RoboBrain. See Better. Think Harder. Do Smarter. 🎉🎉🎉

Python 677 57 Updated Sep 30, 2025

RDT-1B: a Diffusion Foundation Model for Bimanual Manipulation

Python 1,515 145 Updated Sep 28, 2025

State-of-the-art 2D and 3D Face Analysis Project

Python 26,958 5,814 Updated Sep 27, 2025

Embodied-Reasoner: Synergizing Visual Search, Reasoning, and Action for Embodied Interactive Tasks

Python 178 15 Updated Sep 24, 2025

MiniCPM-V 4.5: A GPT-4o Level MLLM for Single Image, Multi Image and High-FPS Video Understanding on Your Phone

Python 22,190 1,665 Updated Sep 24, 2025

🤖 RoboOS: A Universal Embodied Operating System for Cross-Embodied and Multi-Robot Collaboration

Python 239 28 Updated Sep 4, 2025

Vision-Language Navigation Benchmark in Isaac Lab

Python 261 24 Updated Aug 28, 2025

RetinaFace: Deep Face Detection Library for Python

Python 1,762 180 Updated Aug 11, 2025

Official repo and evaluation implementation of VSI-Bench

Python 617 37 Updated Aug 5, 2025

Emma-X: An Embodied Multimodal Action Model with Grounded Chain of Thought and Look-ahead Spatial Reasoning

Python 75 6 Updated May 17, 2025

Video Generation, Physical Commonsense, Semantic Adherence, VideoCon-Physics

Python 155 10 Updated May 6, 2025

Open-Sora: Democratizing Efficient Video Production for All

Python 27,770 2,755 Updated Apr 30, 2025

Embodied Chain of Thought: A robotic policy that reasons to solve the task.

Python 322 16 Updated Apr 5, 2025