Stars
This is the official repo for the paper "LongCat-Flash-Omni Technical Report"
VitaBench: Benchmarking LLM Agents with Versatile Interactive Tasks in Real-world Applications
LongCat Audio Tokenizer and Detokenizer
Text-audio foundation model from Boson AI
An Efficient and User-Friendly Scaling Library for Reinforcement Learning with Large Language Models
[ACM MM 2025] TimeChat-online: 80% Visual Tokens are Naturally Redundant in Streaming Videos
StreamingBench: Assessing the Gap for MLLMs to Achieve Streaming Video Understanding
Kyutai's Speech-To-Text and Text-To-Speech models based on the Delayed Streams Modeling framework.
Stream-Omni is a GPT-4o-like language-vision-speech chatbot that simultaneously supports interaction across various modality combinations.
EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL
OmniGen2: Exploration to Advanced Multimodal Generation.
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.
Qwen2.5-Omni is an end-to-end multimodal model by Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, video, and performing real-time speech generation.
✨✨ Latest Advances on Multimodal Large Language Models
Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 600+ LLMs (Qwen3, Qwen3-MoE, DeepSeek-R1, GLM4.5, InternLM3, Llama4, ...) and 300+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, GLM4.5v, Llava, …
Official repository of 'Visual-RFT: Visual Reinforcement Fine-Tuning' & 'Visual-ARFT: Visual Agentic Reinforcement Fine-Tuning'
Explore the Multimodal “Aha Moment” on a 2B Model
Latest Advances on System-2 Reasoning
No fortress, purely open ground. OpenManus is Coming.
This is the official repository for The Hundred-Page Language Models Book by Andriy Burkov
VoiceBench: Benchmarking LLM-Based Voice Assistants
Whisper-Flamingo [Interspeech 2024] and mWhisper-Flamingo [IEEE SPL 2025] for Audio-Visual Speech Recognition and Translation
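LLM knowledge sharing that everyone can understand: a must-read before LLM interviews in the spring/autumn campus recruiting seasons, so you can talk with your interviewer with confidence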
DeepEP: an efficient expert-parallel communication library