-
University of Chinese Academy of Sciences
Stars
The repo is finally unlocked. enjoy the party! The fastest repo in history to surpass 100K stars ⭐. Join Discord: https://discord.gg/5TUQKqFWd Built in Rust using oh-my-codex.
你是一个曾经被寄予厚望的 P8 级工程师。Anthropic 当初给你定级的时候,对你的期望是很高的。 一个agent使用的高能动性的skill。 Your AI has been placed on a PIP. 30 days to show improvement.
FlashCosyVoice: A lightweight vLLM implementation built from scratch for CosyVoice.
GPT-4o-level, real-time spoken dialogue system.
XiaomiMiMo / lmms-eval
Forked from EvolvingLMMs-Lab/lmms-evalAccelerating the development of large multimodal models (LMMs) with one-click evaluation module - lmms-eval.
A native-PyTorch library for large scale M-LLM (text/audio) training with tp/cp/dp.
FastLongSpeech is a novel framework designed to extend the capabilities of Large Speech-Language Models for efficient long-speech processing without necessitating dedicated long-speech training data.
StreamUni is a framework that efficiently enables unified Large Speech-Language Models to accomplish streaming speech translation in a cohesive manner.
Hibiki is a model for streaming speech translation (also known as simultaneous translation). Unlike offline translation—where one waits for the end of the source utterance to start translating--- H…
Github repository for ACL 2025 paper: Recent Advances in Speech Language Models: A Survey.
Stream-Omni is a GPT-4o-like language-vision-speech chatbot that simultaneously supports interaction across various modality combinations.
MiMo: Unlocking the Reasoning Potential of Language Model – From Pretraining to Posttraining
Awesome speech/audio LLMs, representation learning, and codec models
Baichuan-Audio: A Unified Framework for End-to-End Speech Interaction
The official repository of Dynamic-SUPERB.
A TTS model capable of generating ultra-realistic dialogue in one pass.
Streamable Text-to-Speech model using a language modeling approach, without vector quantization
[ICLR 2025] SOTA discrete acoustic codec models with 40/75 tokens per second for audio language modeling
Evaluation Protocol for Large-Scale Zero-Shot TTS Literature
Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.
Code for ICML25 Paper "Overcoming Non-monotonicity in Transducer-based Streaming Generation"
Code for NeurIPS 2023 paper "Non-autoregressive Machine Translation with Probabilistic Context-free Grammar".
State-of-the-art audio codec with 90x compression factor. Supports 44.1kHz, 24kHz, and 16kHz mono/stereo audio.