Stars
PyTorch implementation of MAR+DiffLoss https://arxiv.org/abs/2406.11838
First base model for full-duplex conversational audio
Vocal Remover using Deep Neural Networks
Underthesea - Vietnamese NLP Toolkit
State-of-the-art audio codec with 90x compression factor. Supports 44.1kHz, 24kHz, and 16kHz mono/stereo audio.
Open-source industrial-grade ASR models supporting Mandarin, Chinese dialects and English, achieving a new SOTA on public Mandarin ASR benchmarks, while also offering outstanding singing lyrics rec…
[NeurIPS 2025] An official implementation of Flow-GRPO: Training Flow Matching Models via Online RL
Implementation of Hinton's forward-forward (FF) algorithm - an alternative to back-propagation
[ICLR'25 Oral] Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think
PyTorch implementation of Diffusion Models (https://arxiv.org/pdf/2006.11239.pdf)
Implementation of Natural Speech 2, Zero-shot Speech and Singing Synthesizer, in PyTorch
CREPE: A Convolutional REpresentation for Pitch Estimation -- pre-trained model (ICASSP 2018)
[ICLR 2025] SOTA discrete acoustic codec models with 40/75 tokens per second for audio language modeling
A web UI for various audio-related neural networks
An open-source project (formerly named Listen, Attend and Spell - PyTorch Implementation) for end-to-end ASR, implemented with PyTorch, the well-known deep learning toolkit.
Step-Audio 2 is an end-to-end multi-modal large language model designed for industry-strength audio understanding and speech conversation.
Long-form streaming TTS system for multi-speaker dialogue generation
Unofficial PyTorch implementation of Google AI's VoiceFilter system
Audio processing using a PyTorch 1D convolutional network
Tools for handling multimodal data in machine learning projects.
[ACL 2024] Official PyTorch code for extracting features and training downstream models with emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation
Unofficial PyTorch implementation of the paper "Mean Flows for One-step Generative Modeling" by Geng et al.
A Framework for Speech, Language, Audio, Music Processing with Large Language Model