Stars
[NeurIPS '24 D&B] Official Dataloader and Evaluation Scripts for LongVideoBench.
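Claude plays the role of a professional stock analyst, fetching real market data through Python scripts and combining technical analysis with news to generate a decision dashboard for the user.
This repository contains code and metadata for the How2 dataset.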
This is the official repo for the paper "LongCat-Flash-Omni Technical Report"
Generate high-definition short videos with one click using AI large language models.
Automate the process of making money online.
video-SALMONN 2 is a powerful audio-visual large language model (LLM) that generates high-quality audio-visual video captions, which is developed by the Department of Electronic Engineering at Tsin…
[NeurIPS'25 Spotlight] Official implementation of "JavisGPT: A Unified Multi-modal LLM for Sounding-Video Comprehension and Generation"
[ICLR 2026] Data Pipeline, Models, and Benchmark for Omni-Captioner.
[ICML 2026] Scripting Multi-Scene Videos with Time-Aware and Structural Audio-Visual Captions
A curated list of papers, tools, and resources on Multi-Token Prediction (MTP) and related techniques in Large Language Models (LLMs), Speech-Language Models (SLMs), and more.
Awesome Unified Multimodal Models
Benchmarking Audio-Visual Social Interactivity in Omni Models
EgoDex: Learning Dexterous Manipulation from Large-Scale Egocentric Video
StreamingVLM: Real-Time Understanding for Infinite Video Streams
The Open-Source Multimodal AI Agent Stack: Connecting Cutting-Edge AI Models and Agent Infra
[CVPR 2026 Highlight] Official release of EgoAVU: Egocentric Audio-Visual Understanding
The Easy Communications (EasyCom) dataset is a world-first dataset designed to help mitigate the *cocktail party effect* from an augmented-reality (AR)-motivated, multi-sensor egocentric world view.
Official implementation of the paper "VideoLLM Knows When to Speak: Enhancing Time-Sensitive Video Comprehension with Video-Text Duet Interaction Format"
[ICLR 2026] MMDuet2: Enhancing Proactive Interaction of Video MLLMs with Multi-Turn Reinforcement Learning
[CVPR 2025] EgoLife: Towards Egocentric Life Assistant
Your own personal AI assistant. Any OS. Any Platform. The lobster way. 🦞
An End-to-End Infrastructure for Training and Evaluating Various LLM Agents
World's First Full-Chinese Ray-Ban Meta Smart Glasses AI Assistant
An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System