Alex-Songs


This is the official repo for the paper "LongCat-Flash-Omni Technical Report"

Python · 447 stars · 25 forks · Updated Dec 15, 2025

VitaBench: Benchmarking LLM Agents with Versatile Interactive Tasks in Real-world Applications

Python · 71 stars · 8 forks · Updated Dec 3, 2025

LongCat Audio Tokenizer and Detokenizer

Python · 264 stars · 18 forks · Updated Dec 15, 2025

Text-audio foundation model from Boson AI

Python · 7,769 stars · 578 forks · Updated Sep 15, 2025

An Efficient and User-Friendly Scaling Library for Reinforcement Learning with Large Language Models

Python · 2,523 stars · 185 forks · Updated Dec 25, 2025

[ACM MM 2025] TimeChat-online: 80% Visual Tokens are Naturally Redundant in Streaming Videos

Python · 101 stars · 6 forks · Updated Dec 12, 2025

StreamingBench: Assessing the Gap for MLLMs to Achieve Streaming Video Understanding

Python · 140 stars · 6 forks · Updated May 16, 2025

Kyutai's Speech-To-Text and Text-To-Speech models based on the Delayed Streams Modeling framework.

Python · 2,668 stars · 272 forks · Updated Nov 26, 2025

Stream-Omni is a GPT-4o-like language-vision-speech chatbot that simultaneously supports interaction across various modality combinations.

Python · 374 stars · 44 forks · Updated Jun 17, 2025

Audio Large Language Models

Python · 832 stars · 42 forks · Updated Jul 5, 2025

EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL

Python · 4,314 stars · 332 forks · Updated Dec 24, 2025

OmniGen2: Exploration to Advanced Multimodal Generation.

Jupyter Notebook · 3,975 stars · 12 forks · Updated Dec 2, 2025

s1: Simple test-time scaling

Python · 6,620 stars · 764 forks · Updated Jun 25, 2025

small audio language model for reasoning

Python · 83 stars · 4 forks · Updated Dec 4, 2025

🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.

Python · 154,216 stars · 31,533 forks · Updated Dec 24, 2025

Qwen2.5-Omni is an end-to-end multimodal model by Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, video, and performing real-time speech generation.

Jupyter Notebook · 3,854 stars · 304 forks · Updated Jun 12, 2025

✨✨Latest Advances on Multimodal Large Language Models

17,054 stars · 1,098 forks · Updated Dec 23, 2025

Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 600+ LLMs (Qwen3, Qwen3-MoE, DeepSeek-R1, GLM4.5, InternLM3, Llama4, ...) and 300+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, GLM4.5v, Llava, …

Python · 11,835 stars · 1,084 forks · Updated Dec 25, 2025

Official repository of 'Visual-RFT: Visual Reinforcement Fine-Tuning' & 'Visual-ARFT: Visual Agentic Reinforcement Fine-Tuning'

Jupyter Notebook · 2,288 stars · 103 forks · Updated Oct 29, 2025

Explore the Multimodal “Aha Moment” on 2B Model

Python · 620 stars · 23 forks · Updated Mar 18, 2025

Latest Advances on System-2 Reasoning

Python · 1,298 stars · 73 forks · Updated Jun 8, 2025

No fortress, purely open ground. OpenManus is Coming.

Python · 51,443 stars · 8,977 forks · Updated Nov 17, 2025

This is the official repository for The Hundred-Page Language Models Book by Andriy Burkov

Jupyter Notebook · 2,058 stars · 341 forks · Updated Dec 15, 2025

VoiceBench: Benchmarking LLM-Based Voice Assistants

Python · 312 stars · 19 forks · Updated Dec 11, 2025

Whisper-Flamingo [Interspeech 2024] and mWhisper-Flamingo [IEEE SPL 2025] for Audio-Visual Speech Recognition and Translation

Jupyter Notebook · 198 stars · 14 forks · Updated Jul 29, 2025

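LLM knowledge sharing that anyone can understand: a must-read before spring/autumn LLM job interviews, so you can talk with your interviewer confidently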

Jupyter Notebook · 4,981 stars · 489 forks · Updated Oct 13, 2025

DeepEP: an efficient expert-parallel communication library

Cuda · 8,829 stars · 1,036 forks · Updated Dec 24, 2025