A Gym for Agentic LLMs

Python 285 12 Updated Oct 9, 2025

Quantile Advantage Estimation for Entropy-Safe Reasoning

Python 15 Updated Sep 28, 2025

Official repo for paper "Sparse Representation and Construction for High-Resolution 3D Shapes Modeling".

1,223 64 Updated Jun 16, 2025

NVIDIA Isaac GR00T N1.5 - A Foundation Model for Generalist Robots.

Jupyter Notebook 5,012 774 Updated Oct 9, 2025

Long Context Transfer from Language to Vision

Python 394 20 Updated Mar 18, 2025
Python 4,294 408 Updated Sep 14, 2025

NeurIPS 2024 Paper: A Unified Pixel-level Vision LLM for Understanding, Generating, Segmenting, Editing

Python 569 35 Updated Oct 20, 2024
Python 153 26 Updated Oct 31, 2024

Open-Sora: Democratizing Efficient Video Production for All

Python 27,359 2,695 Updated Apr 30, 2025

MiniCPM-V 4.5: A GPT-4o Level MLLM for Single Image, Multi Image and High-FPS Video Understanding on Your Phone

Python 22,049 1,649 Updated Sep 24, 2025

[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal chat model approaching GPT-4o performance

Python 9,298 722 Updated Sep 22, 2025

The code of the paper "NExT-Chat: An LMM for Chat, Detection and Segmentation".

Python 252 10 Updated Feb 5, 2024

LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models (ECCV 2024)

Python 838 52 Updated Jul 29, 2024

Code and models for ICML 2024 paper, NExT-GPT: Any-to-Any Multimodal Large Language Model

Python 3,567 358 Updated May 13, 2025

Progressive Spatio-Temporal Prototype Matching for Text-Video Retrieval --ICCV2023 Oral

Python 91 Updated Nov 2, 2023

[ICLR'24 spotlight] Chinese and English Multimodal Large Model Series (Chat and Paint) | A Chinese-English bilingual multimodal large model series based on the CPM foundation models

Python 1,067 89 Updated Jun 13, 2024
Python 797 47 Updated Jul 8, 2024

Official repo for VideoComposer: Compositional Video Synthesis with Motion Controllability

Python 944 83 Updated Nov 11, 2023

Chatbot Arena meets multi-modality! Multi-Modality Arena allows you to benchmark vision-language models side-by-side while providing images as inputs. Supports MiniGPT-4, LLaMA-Adapter V2, LLaVA, B…

Python 539 39 Updated Apr 21, 2024

Make-A-Protagonist: Generic Video Editing with An Ensemble of Experts

Python 323 36 Updated Aug 1, 2023

Codes for VPGTrans: Transfer Visual Prompt Generator across LLMs. VL-LLaMA, VL-Vicuna.

Python 270 24 Updated Oct 13, 2023

An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.

Python 39,138 4,756 Updated Jun 2, 2025

Running large language models on a single GPU for throughput-oriented scenarios.

Python 9,362 584 Updated Oct 28, 2024

CLIP (Contrastive Language-Image Pretraining), Predict the most relevant text snippet given an image

Jupyter Notebook 30,978 3,784 Updated Jul 23, 2024

[NeurIPS 2022] Zero-Shot Video Question Answering via Frozen Bidirectional Language Models

Python 156 20 Updated Dec 9, 2024

This is the code of ECCV 2022 (Oral) paper "Fine-Grained Scene Graph Generation with Data Transfer".

Jupyter Notebook 103 10 Updated Jan 24, 2023

Official repository for the A-OKVQA dataset

Python 99 13 Updated May 8, 2024

Code repository for "It's About Time: Analog clock Reading in the Wild"

Python 79 12 Updated Jun 15, 2024

NExT-QA: Next Phase of Question-Answering to Explaining Temporal Actions (CVPR'21)

Python 31 1 Updated Jul 18, 2023

Visual Relation Grounding in Videos (ECCV'20, Spotlight)

Python 57 7 Updated Dec 8, 2022