Wei-Baldwin-Zeng

2 followers · 1 following

autonomy_stack_go2 Public
Forked from jizhang-cmu/autonomy_stack_go2

Full Autonomy Stack for Unitree Go2

C++ Updated Mar 31, 2025
samurai Public
Forked from yangchris11/samurai

Official repository of "SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory"

Python Apache License 2.0 Updated Mar 18, 2025
VILA Public
Forked from NVlabs/VILA

VILA is a family of state-of-the-art vision language models (VLMs) for diverse multimodal AI tasks across the edge, data center, and cloud.

Python Apache License 2.0 Updated Jan 24, 2025
VLN-Survey-with-Foundation-Models Public
Forked from zhangyuejoslin/VLN-Survey-with-Foundation-Models

Updated Jan 8, 2025
RoboticsDiffusionTransformer Public
Forked from thu-ml/RoboticsDiffusionTransformer

RDT-1B: a Diffusion Foundation Model for Bimanual Manipulation

Python MIT License Updated Dec 24, 2024
VLN-CE-Isaac Public
Forked from yang-zj1026/NaVILA-Bench

Vision-Language Navigation Benchmark in Isaac Lab

Python Other Updated Dec 20, 2024
legged-loco Public
Forked from yang-zj1026/legged-loco

Low-level locomotion policy training in Isaac Lab

Python MIT License Updated Dec 15, 2024
data-juicer Public
Forked from datajuicer/data-juicer

A one-stop data processing system to make data higher-quality, juicier, and more digestible for (multimodal) LLMs! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷 Providing higher-quality, richer, and more "digestible" data for large models!

Python Apache License 2.0 Updated Nov 8, 2024
CogVideo Public
Forked from zai-org/CogVideo

text and image to video generation: CogVideoX (2024) and CogVideo (ICLR 2023)

Python Apache License 2.0 Updated Oct 28, 2024
Show-o Public
Forked from showlab/Show-o

Repository for Show-o, One Single Transformer to Unify Multimodal Understanding and Generation.

Python Apache License 2.0 Updated Oct 27, 2024
videophy Public
Forked from Hritikbansal/videophy

Video Generation, Physical Commonsense, Semantic Adherence, VideoCon-Physics

Python MIT License Updated Oct 11, 2024
VAR Public
Forked from FoundationVision/VAR

[NeurIPS 2024 Oral][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ultra-sim…

Python MIT License Updated Oct 6, 2024
mar Public
Forked from LTH14/mar

PyTorch implementation of MAR+DiffLoss https://arxiv.org/abs/2406.11838

Python MIT License Updated Sep 27, 2024
Open-MAGVIT2 Public
Forked from TencentARC/SEED-Voken

Open-MAGVIT2: Democratizing Autoregressive Visual Generation

Python Apache License 2.0 Updated Sep 27, 2024
Open-Sora Public
Forked from hpcaitech/Open-Sora

Open-Sora: Democratizing Efficient Video Production for All

Python Apache License 2.0 Updated Aug 9, 2024
TextHawk Public
Forked from yuyq96/TextHawk

Exploring Efficient Fine-Grained Perception of Multimodal Large Language Models

Python Updated Apr 16, 2024
Vary Public
Forked from Ucas-HaoranWei/Vary

Official code implementation of Vary: Scaling Up the Vision Vocabulary of Large Vision Language Models.

Python Updated Dec 12, 2023
Monkey Public
Forked from Yuliang-Liu/Monkey

Monkey (LMM): Image Resolution and Text Label Are Important Things for Large Multi-modal Models

Python MIT License Updated Dec 4, 2023
unilm Public
Forked from microsoft/unilm

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities

Python MIT License Updated Nov 29, 2023
Kosmos2.5 Public
Forked from kyegomez/Kosmos2.5

My implementation of Kosmos2.5 from the paper: "KOSMOS-2.5: A Multimodal Literate Model"

Python MIT License Updated Nov 27, 2023
nougat Public
Forked from facebookresearch/nougat

Implementation of Nougat Neural Optical Understanding for Academic Documents

Python MIT License Updated Nov 17, 2023
donut Public
Forked from clovaai/donut

Official Implementation of OCR-free Document Understanding Transformer (Donut) and Synthetic Document Generator (SynthDoG), ECCV 2022

Python MIT License Updated Nov 15, 2023
Awesome-Open-Vocabulary-Object-Detection Public
Forked from witnessai/Awesome-Open-Vocabulary-Object-Detection

A curated list of papers, datasets and resources pertaining to open vocabulary object detection.

Updated Aug 21, 2023
sam-hq Public
Forked from SysCV/sam-hq

Segment Anything in High Quality

Python Apache License 2.0 Updated Aug 16, 2023
fc-clip Public
Forked from bytedance/fc-clip

This repo contains the code for our paper Convolutions Die Hard: Open-Vocabulary Segmentation with Single Frozen Convolutional CLIP

Python Apache License 2.0 Updated Aug 15, 2023
LISA Public
Forked from dvlab-research/LISA

Project Page for "LISA: Reasoning Segmentation via Large Language Model"

Python Apache License 2.0 Updated Aug 10, 2023
ONE-PEACE Public
Forked from OFA-Sys/ONE-PEACE

A general representation model across vision, audio, language modalities. Paper: ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities

Python Apache License 2.0 Updated Aug 10, 2023
DeOP Public
Forked from CongHan0808/DeOP

Open-vocabulary Semantic Segmentation

Python Updated Aug 3, 2023
Segment-Everything-Everywhere-All-At-Once Public
Forked from UX-Decoder/Segment-Everything-Everywhere-All-At-Once

Official implementation of the paper "Segment Everything Everywhere All at Once"

Python Apache License 2.0 Updated Jul 28, 2023
Grounded-Segment-Anything Public
Forked from IDEA-Research/Grounded-Segment-Anything

Grounded-SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything

Jupyter Notebook Apache License 2.0 Updated Jul 25, 2023
