Official Repository of paper VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding

Python 293 20 Updated Aug 5, 2025

[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted fo…

Python 1,490 127 Updated Aug 5, 2025

This repository is intended to host tools and demos for ActivityNet

Jupyter Notebook 967 328 Updated Mar 21, 2024

[ICLR 2026] FastVGGT: Fast Visual Geometry Transformer

Python 677 39 Updated Jan 28, 2026

This repo contains the code for "VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks" [ICLR 2025]

Python 572 50 Updated Feb 11, 2026

TALL: Temporal Activity Localization via Language Query

Python 217 49 Updated Mar 15, 2018

Temporally-aligned Audio CaptiOnS for Language-Audio Pretraining

Python 14 1 Updated Oct 12, 2025

PyTorch code and models for VJEPA2 self-supervised learning from video.

Python 3,003 338 Updated Aug 28, 2025

[ICCV 2023] UniVTG: Towards Unified Video-Language Temporal Grounding

Python 374 34 Updated May 8, 2024

VILA is a family of state-of-the-art vision language models (VLMs) for diverse multimodal AI tasks across the edge, data center, and cloud.

Python 3,757 313 Updated Nov 28, 2025
Jupyter Notebook 242 14 Updated Jun 4, 2025

NExT-QA: Next Phase of Question-Answering to Explaining Temporal Actions (CVPR'21)

Python 184 16 Updated Aug 2, 2025

[ACL 2024 Findings] "TempCompass: Do Video LLMs Really Understand Videos?", Yuanxin Liu, Shicheng Li, Yi Liu, Yuxiang Wang, Shuhuai Ren, Lei Li, Sishuo Chen, Xu Sun, Lu Hou

Python 128 4 Updated Apr 4, 2025

A minimal, educational HEVC (H.265) encoder written in Python.

Python 40 1 Updated Feb 10, 2026

Your own personal AI assistant. Any OS. Any Platform. The lobster way. 🦞

TypeScript 201,473 36,068 Updated Feb 16, 2026

The official PyTorch implementation of our paper "Is Space-Time Attention All You Need for Video Understanding?"

Python 1,824 245 Updated Apr 9, 2024

A Large-scale Video Action Dataset

Python 400 11 Updated Jan 16, 2026
Python 13 2 Updated Nov 11, 2025

🔥🔥🔥 [IEEE TCSVT] Latest Papers, Codes and Datasets on Vid-LLMs.

3,077 139 Updated Dec 20, 2025

Codec-Aligned Sparsity as a Foundational Principle for Multimodal Intelligence

Python 225 5 Updated Feb 13, 2026

Implement a ChatGPT-like LLM in PyTorch from scratch, step by step

Jupyter Notebook 85,386 12,920 Updated Feb 9, 2026

Qwen3-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.

Jupyter Notebook 18,303 1,596 Updated Jan 30, 2026

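LLM knowledge sharing that everyone can understand; a must-read before LLM interviews in the spring and fall campus recruitment seasons, so you can talk confidently with your interviewer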

Jupyter Notebook 5,504 531 Updated Feb 5, 2026

Foundations of LLMs: understand the basics of large language models in one article

6,755 569 Updated Dec 18, 2025

DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models

Python 1,893 301 Updated Jan 16, 2024

Let's work through CodeTop together

40 3 Updated Oct 13, 2025

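"Code Thoughts" (代码随想录) LeetCode problem-solving guide: a recommended order for 200 classic problems, 600,000 characters of detailed illustrated explanations, video breakdowns of difficult points, more than 50 mind maps, and versions in C++, Java, Python, Go, JavaScript, and other languages, so studying algorithms is no longer confusing! 🔥🔥 Take a look; you will wish you had found it sooner! 🚀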

Shell 60,338 12,338 Updated Jan 27, 2026

Xiaomi Miloco

Python 2,336 159 Updated Feb 14, 2026

The code for the paper: "Same Task, More Tokens: the Impact of Input Length on the Reasoning Performance of Large Language Models"

Jupyter Notebook 56 4 Updated Oct 24, 2025