Jianwei Yang jwyang

1.9k followers · 32 following

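Stars

Each entry lists the repository, its description, then its primary language, star count, fork count, and last-update date.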

microsoft / Magma

[CVPR 2025] Magma: A Foundation Model for Multimodal AI Agents

Python 1,930 161 Updated Mar 3, 2026

huggingface / open-r1

Fully open reproduction of DeepSeek-R1

Python 26,328 2,444 Updated Apr 2, 2026

LatentActionPretraining / LAPA

[ICLR 2025] LAPA: Latent Action Pretraining from Videos

Python 544 43 Updated Jan 22, 2025

mu-cai / TemporalBench

TemporalBench: Benchmarking Fine-grained Temporal Understanding for Multimodal Video Models

Python 40 1 Updated Nov 10, 2024

henry123-boy / SpaTracker

[CVPR 2024 Highlight] Official PyTorch implementation of SpatialTracker: Tracking Any 2D Pixels in 3D Space

Python 1,046 42 Updated Aug 8, 2025

mu-cai / matryoshka-mm

Matryoshka Multimodal Models

Python 123 11 Updated Jan 22, 2025

MengLcool / DeepStack-VL

[NeurIPS-24] This is the official implementation of the paper "DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effective for LMMs".

Python 90 4 Updated Jun 17, 2024

zzxslp / SoM-LLaVA

[COLM-2024] List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs

Python 145 4 Updated Aug 23, 2024

myshell-ai / JetMoE

Reaching LLaMA2 Performance with 0.1M Dollars

Python 986 76 Updated Jul 23, 2024

jzhang38 / EasyContext

Memory optimization and training recipes to extrapolate language models' context length to 1 million tokens, with minimal hardware.

Python 761 52 Updated Sep 27, 2024

FoundationVision / GLEE

[CVPR 2024 Highlight] GLEE: General Object Foundation Model for Images and Videos at Scale

Python 1,172 76 Updated Oct 21, 2024

roboflow / maestro

Streamline the fine-tuning process for multimodal models: PaliGemma 2, Florence-2, and Qwen2.5-VL

Python 2,677 222 Updated Jun 15, 2026

UX-Decoder / DINOv

[CVPR 2024] Official implementation of the paper "Visual In-context Learning"

Python 540 26 Updated Apr 8, 2024

ishan0102 / vimGPT

Browse the web with GPT-4V and Vimium

Python 2,652 201 Updated Sep 25, 2024

roboflow / awesome-openai-vision-api-experiments

Must-have resource for anyone who wants to experiment with and build on the OpenAI vision API 🔥

Python 1,686 134 Updated Jan 14, 2025

ddupont808 / GPT-4V-Act

AI agent using GPT-4V(ision) capable of using a mouse/keyboard to interact with web UI

JavaScript 1,059 100 Updated Dec 9, 2024

microsoft / SoM

[arXiv 2023] Set-of-Mark Prompting for GPT-4V and LMMs

Python 1,544 112 Updated Aug 19, 2024

vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 83,210 18,169 Updated Jun 18, 2026

microsoft / X-Decoder

[CVPR 2023] Official Implementation of X-Decoder for generalized decoding for pixel, image and language

Python 1,346 162 Updated Oct 5, 2023

TalalWasim / Video-FocalNets

Official repository for "Video-FocalNets: Spatio-Temporal Focal Modulation for Video Action Recognition" [ICCV 2023]

Python 102 19 Updated Apr 30, 2024

UX-Decoder / Semantic-SAM

[ECCV 2024] Official implementation of the paper "Semantic-SAM: Segment and Recognize Anything at Any Granularity"

Python 2,844 147 Updated Jul 10, 2025

Zhendong-Wang / Prompt-Diffusion

Official PyTorch implementation of the paper "In-Context Learning Unlocked for Diffusion Models"

Python 414 12 Updated Mar 25, 2024

UX-Decoder / Segment-Everything-Everywhere-All-At-Once

[NeurIPS 2023] Official implementation of the paper "Segment Everything Everywhere All at Once"

Python 4,792 454 Updated Aug 19, 2024

google-research / arxiv-latex-cleaner

arXiv LaTeX Cleaner: Easily clean the LaTeX code of your paper to submit to arXiv

Python 6,906 403 Updated Mar 27, 2026

IDEA-Research / OpenSeeD

[ICCV 2023] Official implementation of the paper "A Simple Framework for Open-Vocabulary Segmentation and Detection"

Python 759 47 Updated Jan 22, 2024

[ICLR'23 Spotlight🔥] The first successful BERT/MAE-style pretraining on any convolutional network; PyTorch implementation of "Designing BERT for Convolutional Networks: Sparse and Hierarchical Masked Modeling"

Python 1,369 84 Updated Jan 23, 2024

zjc062 / mind-vis

Code base for MinD-Vis

Python 795 106 Updated May 24, 2023
