
[NeurIPS'24 D&B] Official Dataloader and Evaluation Scripts for LongVideoBench.

Python 125 8 Updated Jul 27, 2024

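Claude plays the role of a professional stock analyst, fetching real market data through Python scripts and combining technical analysis with news coverage to generate a decision dashboard for the user.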

Python 38 3 Updated Mar 4, 2026

This repository contains the code and metadata of the How2 dataset.

Python 193 20 Updated Dec 30, 2024

This is the official repo for the paper "LongCat-Flash-Omni Technical Report"

Python 485 31 Updated May 9, 2026
Python 47 4 Updated Feb 9, 2025

Generate high-definition short videos with one click using large AI models.

Python 57,316 8,281 Updated May 12, 2026

Automate the process of making money online.

Python 30,485 3,255 Updated May 15, 2026

video-SALMONN 2 is a powerful audio-visual large language model (LLM) that generates high-quality audio-visual video captions, which is developed by the Department of Electronic Engineering at Tsin…

Python 189 24 Updated Feb 23, 2026

[NeurIPS'25 Spotlight] Official implementation of "JavisGPT: A Unified Multi-modal LLM for Sounding-Video Comprehension and Generation"

Python 73 7 Updated Feb 26, 2026

[ICLR 2026] Data Pipeline, Models, and Benchmark for Omni-Captioner.

Python 134 Updated Apr 7, 2026

[ICML 2026] Scripting Multi-Scene Videos with Time-Aware and Structural Audio-Visual Captions

Python 38 Updated May 7, 2026

A curated list of papers, tools, and resources on Multi-Token Prediction (MTP) and related techniques in Large Language Models (LLMs), Speech-Language Models (SLMs), and more.

107 7 Updated May 16, 2026

Awesome Unified Multimodal Models

1,249 39 Updated Mar 24, 2026

Benchmarking Audio-Visual Social Interactivity in Omni Models

Python 47 1 Updated May 7, 2026

EgoDex: Learning Dexterous Manipulation from Large-Scale Egocentric Video

Python 258 8 Updated Aug 20, 2025

StreamingVLM: Real-Time Understanding for Infinite Video Streams

Python 981 62 Updated Oct 15, 2025
Python 3 Updated Mar 11, 2026

The Open-Source Multimodal AI Agent Stack: Connecting Cutting-Edge AI Models and Agent Infra

TypeScript 34,366 3,441 Updated May 15, 2026

[CVPR 2026 Highlight] Official release of EgoAVU: Egocentric Audio-Visual Understanding

Python 30 4 Updated Apr 26, 2026

The Easy Communications (EasyCom) dataset is a world-first dataset designed to help mitigate the *cocktail party effect* from an augmented-reality (AR)-motivated, multi-sensor egocentric world view.

140 10 Updated Dec 4, 2023

Official implementation of the paper "VideoLLM Knows When to Speak: Enhancing Time-Sensitive Video Comprehension with Video-Text Duet Interaction Format"

Python 44 Updated Feb 5, 2025

[ICLR 2026] MMDuet2: Enhancing Proactive Interaction of Video MLLMs with Multi-Turn Reinforcement Learning

Python 35 2 Updated Jan 14, 2026
Python 34 2 Updated Nov 5, 2025

[CVPR 2025] EgoLife: Towards Egocentric Life Assistant

Python 426 19 Updated Mar 19, 2025

Your own personal AI assistant. Any OS. Any Platform. The lobster way. 🦞

TypeScript 372,630 77,242 Updated May 17, 2026

Advancing Open-source World Models

Python 3,758 329 Updated May 10, 2026

An End-to-End Infrastructure for Training and Evaluating Various LLM Agents

Python 794 68 Updated Feb 9, 2026

World's first fully Chinese-language AI assistant for Ray-Ban Meta smart glasses

Swift 448 73 Updated Mar 29, 2026

An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System

Python 20,566 2,541 Updated Mar 16, 2026