Stars
Effortless data labeling with AI support from Segment Anything and other awesome models.
An agent-managed museum exhibit, built in Rust with Gajae-Code / LazyCodex — developed and maintained with no human intervention.
An automated black-box testing framework based on image recognition
A Flexible Framework for Experiencing Heterogeneous LLM Inference/Fine-tune Optimizations
Code for CVPR25 paper "VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos"
Official codebase for "Self Forcing: Bridging Training and Inference in Autoregressive Video Diffusion" (NeurIPS 2025 Spotlight)
Simulation platform for general-purpose robotics & embodied AI learning.
(Accepted by IJCV) Liquid: Language Models are Scalable and Unified Multi-modal Generators
Prompts for GPT-4V & DALL-E3 that make full use of their multi-modal abilities. GPT-4V prompts, DALL-E3 prompts.
An open-source project dedicated to tracking and segmenting any objects in videos, either automatically or interactively. The primary algorithms utilized include the Segment Anything Model (SAM) fo…
🔥🔥🔥 [IEEE TCSVT] Latest Papers, Codes and Datasets on Vid-LLMs.
DIAMOND (DIffusion As a Model Of eNvironment Dreams) is a reinforcement learning agent trained in a diffusion world model. NeurIPS 2024 Spotlight.
[CVPR 2024] Official implementation of the paper "Visual In-context Learning"
The official repository of "Video assistant towards large language model makes everything easy"
Fast and memory-efficient exact attention
[NeurIPS 2022] Zero-Shot Video Question Answering via Frozen Bidirectional Language Models
Official Repository of paper VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding
[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted fo…
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
[EMNLP 2024 🔥] Video-LLaVA: Learning United Visual Representation by Alignment Before Projection
An Extensible Continual Learning Framework Focused on Language Models (LMs)
OpenEQA: Embodied Question Answering in the Era of Foundation Models
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
PyTorch code and models for V-JEPA self-supervised learning from video.
Official Repository for our ICCV2021 paper: Continual Learning on Noisy Data Streams via Self-Purified Replay