[CVPR 2025] Magma: A Foundation Model for Multimodal AI Agents

Python 1,930 161 Updated Mar 3, 2026

Fully open reproduction of DeepSeek-R1

Python 26,328 2,444 Updated Apr 2, 2026

[ICLR 2025] LAPA: Latent Action Pretraining from Videos

Python 544 43 Updated Jan 22, 2025

TemporalBench: Benchmarking Fine-grained Temporal Understanding for Multimodal Video Models

Python 40 1 Updated Nov 10, 2024

[CVPR 2024 Highlight] Official PyTorch implementation of SpatialTracker: Tracking Any 2D Pixels in 3D Space

Python 1,046 42 Updated Aug 8, 2025

Matryoshka Multimodal Models

Python 123 11 Updated Jan 22, 2025

[NeurIPS-24] This is the official implementation of the paper "DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effective for LMMs".

Python 90 4 Updated Jun 17, 2024

[COLM-2024] List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs

Python 145 4 Updated Aug 23, 2024

Reaching LLaMA2 Performance with 0.1M Dollars

Python 986 76 Updated Jul 23, 2024

Memory optimization and training recipes to extrapolate language models' context length to 1 million tokens, with minimal hardware.

Python 761 52 Updated Sep 27, 2024

[CVPR 2024 Highlight] GLEE: General Object Foundation Model for Images and Videos at Scale

Python 1,172 76 Updated Oct 21, 2024
4 Updated Sep 30, 2024
Python 648 35 Updated Feb 15, 2024
Python 410 17 Updated Jul 29, 2024

Streamline the fine-tuning process for multimodal models: PaliGemma 2, Florence-2, and Qwen2.5-VL

Python 2,677 222 Updated Jun 15, 2026

[CVPR 2024] Official implementation of the paper "Visual In-context Learning"

Python 540 26 Updated Apr 8, 2024

Browse the web with GPT-4V and Vimium

Python 2,652 201 Updated Sep 25, 2024

Must-have resource for anyone who wants to experiment with and build on the OpenAI vision API 🔥

Python 1,686 134 Updated Jan 14, 2025

AI agent using GPT-4V(ision) capable of using a mouse/keyboard to interact with web UI

JavaScript 1,059 100 Updated Dec 9, 2024

[arXiv 2023] Set-of-Mark Prompting for GPT-4V and LMMs

Python 1,544 112 Updated Aug 19, 2024

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 83,210 18,169 Updated Jun 18, 2026

[CVPR 2023] Official Implementation of X-Decoder for generalized decoding for pixel, image and language

Python 1,346 162 Updated Oct 5, 2023

Official repository for "Video-FocalNets: Spatio-Temporal Focal Modulation for Video Action Recognition" [ICCV 2023]

Python 102 19 Updated Apr 30, 2024

[ECCV 2024] Official implementation of the paper "Semantic-SAM: Segment and Recognize Anything at Any Granularity"

Python 2,844 147 Updated Jul 10, 2025

Official PyTorch implementation of the paper "In-Context Learning Unlocked for Diffusion Models"

Python 414 12 Updated Mar 25, 2024

[NeurIPS 2023] Official implementation of the paper "Segment Everything Everywhere All at Once"

Python 4,792 454 Updated Aug 19, 2024

arXiv LaTeX Cleaner: Easily clean the LaTeX code of your paper to submit to arXiv

Python 6,906 403 Updated Mar 27, 2026

[ICCV 2023] Official implementation of the paper "A Simple Framework for Open-Vocabulary Segmentation and Detection"

Python 759 47 Updated Jan 22, 2024

[ICLR'23 Spotlight🔥] The first successful BERT/MAE-style pretraining on any convolutional network; PyTorch implementation of "Designing BERT for Convolutional Networks: Sparse and Hierarchical Masked Modeling"

Python 1,369 84 Updated Jan 23, 2024

Code base for MinD-Vis

Python 795 106 Updated May 24, 2023