Stars
[ICLR 2024] Efficient Streaming Language Models with Attention Sinks
Reference-aware automatic speech evaluation toolkit
[WIP] Resources for AI engineers. Also contains supporting materials for the book AI Engineering (Chip Huyen, 2025)
Finetune Sesame AI's conversational speech model on new languages and voices. Blog post: https://blog.speechmatics.com/sesame-finetune
Streaming and Fine-tuning for Chatterbox TTS
SoTA open-source TTS
SoTA open-source TTS
Zero-shot voice conversion & singing voice conversion, with real-time support
A simple, high-quality voice conversion tool focused on ease of use and performance.
VoxCPM: Tokenizer-Free TTS for Context-Aware Speech Generation and True-to-Life Voice Cloning
VoXtream is a Full-Stream Zero-shot TTS model with Extremely Low Latency
Hibiki is a model for streaming speech translation (also known as simultaneous translation). Unlike offline translation, where one waits for the end of the source utterance to start translating, H…
[ICML 2025] Fourier Position Embedding: Enhancing Attention’s Periodic Extension for Length Generalization
Build a Wake Word Detection model for Voice Assistant using PyTorch
OpenReview configuration for EMNLP 2025 demo papers
LLMVoX: Autoregressive Streaming Text-to-Speech Model for Any LLM
Recurrent neural network for audio noise reduction
DiFlow-TTS delivers low-latency zero-shot TTS via discrete flow matching and factorized speech tokens. A compact, open framework for fast voice synthesis.🐙
Artificial Neural Engine Machine Learning Library
Simultaneous speech-to-text model
Implementation of Acoustic BPE (Shen et al., 2024), extended for RVQ-based Neural Audio Codecs
An extremely fast Python linter and code formatter, written in Rust.
A PyTorch library for implementing flow matching algorithms, featuring continuous and discrete flow matching implementations. It includes practical examples for both text and image modalities.
Educational implementation of the Discrete Flow Matching paper
Discrete Flow Matching implemented in PyTorch