This is the official repo for the paper "LongCat-Flash-Omni Technical Report"

Python 444 24 Updated Dec 15, 2025

VitaBench: Benchmarking LLM Agents with Versatile Interactive Tasks in Real-world Applications

Python 72 8 Updated Dec 3, 2025

LongCat Audio Tokenizer and Detokenizer

Python 263 18 Updated Dec 15, 2025

Text-audio foundation model from Boson AI

Python 7,754 577 Updated Sep 15, 2025

An Efficient and User-Friendly Scaling Library for Reinforcement Learning with Large Language Models

Python 2,504 180 Updated Dec 21, 2025

[ACM MM 2025] TimeChat-online: 80% Visual Tokens are Naturally Redundant in Streaming Videos

Python 101 6 Updated Dec 12, 2025

StreamingBench: Assessing the Gap for MLLMs to Achieve Streaming Video Understanding

Python 140 6 Updated May 16, 2025

Kyutai's Speech-To-Text and Text-To-Speech models based on the Delayed Streams Modeling framework.

Python 2,665 272 Updated Nov 26, 2025

Stream-Omni is a GPT-4o-like language-vision-speech chatbot that simultaneously supports interaction across various modality combinations.

Python 374 44 Updated Jun 17, 2025

Audio Large Language Models

Python 828 42 Updated Jul 5, 2025

EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL

Python 4,290 327 Updated Dec 15, 2025

OmniGen2: Exploration to Advanced Multimodal Generation.

Jupyter Notebook 3,972 12 Updated Dec 2, 2025

s1: Simple test-time scaling

Python 6,615 764 Updated Jun 25, 2025

Small audio language model for reasoning

Python 82 4 Updated Dec 4, 2025

🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.

Python 154,101 31,502 Updated Dec 20, 2025

Qwen2.5-Omni is an end-to-end multimodal model by Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, video, and performing real-time speech generation.

Jupyter Notebook 3,854 303 Updated Jun 12, 2025

✨✨Latest Advances on Multimodal Large Language Models

17,029 1,095 Updated Dec 12, 2025

Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 600+ LLMs (Qwen3, Qwen3-MoE, DeepSeek-R1, GLM4.5, InternLM3, Llama4, ...) and 300+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, GLM4.5v, Llava, …

Python 11,758 1,071 Updated Dec 21, 2025

Official repository of 'Visual-RFT: Visual Reinforcement Fine-Tuning' & 'Visual-ARFT: Visual Agentic Reinforcement Fine-Tuning'

Jupyter Notebook 2,286 103 Updated Oct 29, 2025

Explore the Multimodal “Aha Moment” on 2B Model

Python 620 23 Updated Mar 18, 2025

Latest Advances on System-2 Reasoning

Python 1,297 73 Updated Jun 8, 2025

No fortress, purely open ground. OpenManus is Coming.

Python 51,384 8,964 Updated Nov 17, 2025

This is the official repository for The Hundred-Page Language Models Book by Andriy Burkov

Jupyter Notebook 2,052 341 Updated Dec 15, 2025

VoiceBench: Benchmarking LLM-Based Voice Assistants

Python 310 19 Updated Dec 11, 2025

Whisper-Flamingo [Interspeech 2024] and mWhisper-Flamingo [IEEE SPL 2025] for Audio-Visual Speech Recognition and Translation

Jupyter Notebook 197 14 Updated Jul 29, 2025

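Large-model knowledge sharing that everyone can understand: essential reading before LLM interviews in the spring/autumn recruitment seasons, so you can talk confidently with your interviewer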

Jupyter Notebook 4,951 488 Updated Oct 13, 2025

DeepEP: an efficient expert-parallel communication library

Cuda 8,819 1,034 Updated Dec 5, 2025