- National University of Singapore
- https://waxnkw.github.io/
Stars
Quantile Advantage Estimation for Entropy-Safe Reasoning
Official repo for paper "Sparse Representation and Construction for High-Resolution 3D Shapes Modeling".
NVIDIA Isaac GR00T N1.5 - A Foundation Model for Generalist Robots.
Long Context Transfer from Language to Vision
NeurIPS 2024 Paper: A Unified Pixel-level Vision LLM for Understanding, Generating, Segmenting, Editing
Open-Sora: Democratizing Efficient Video Production for All
MiniCPM-V 4.5: A GPT-4o Level MLLM for Single Image, Multi Image and High-FPS Video Understanding on Your Phone
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal chat model approaching GPT-4o performance.
The code of the paper "NExT-Chat: An LMM for Chat, Detection and Segmentation".
LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models (ECCV 2024)
Code and models for ICML 2024 paper, NExT-GPT: Any-to-Any Multimodal Large Language Model
Progressive Spatio-Temporal Prototype Matching for Text-Video Retrieval (ICCV 2023 Oral)
[ICLR'24 spotlight] Chinese and English Multimodal Large Model Series (Chat and Paint) | A Chinese-English bilingual multimodal large model series based on the CPM foundation model
Official repo for VideoComposer: Compositional Video Synthesis with Motion Controllability
Chatbot Arena meets multi-modality! Multi-Modality Arena allows you to benchmark vision-language models side-by-side while providing images as inputs. Supports MiniGPT-4, LLaMA-Adapter V2, LLaVA, B…
Make-A-Protagonist: Generic Video Editing with An Ensemble of Experts
Code for VPGTrans: Transfer Visual Prompt Generator across LLMs. VL-LLaMA, VL-Vicuna.
An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
Running large language models on a single GPU for throughput-oriented scenarios.
CLIP (Contrastive Language-Image Pretraining), Predict the most relevant text snippet given an image
[NeurIPS 2022] Zero-Shot Video Question Answering via Frozen Bidirectional Language Models
This is the code of ECCV 2022 (Oral) paper "Fine-Grained Scene Graph Generation with Data Transfer".
Code repository for "It's About Time: Analog clock Reading in the Wild"
NExT-QA: Next Phase of Question-Answering to Explaining Temporal Actions (CVPR'21)
Visual Relation Grounding in Videos (ECCV'20, Spotlight)