Stars
An efficient pure-PyTorch implementation of Kolmogorov-Arnold Network (KAN).
LLM training code for Databricks foundation models
Kimi-Audio, an open-source audio foundation model excelling in audio understanding, generation, and conversation
Speech To Speech: an effort for an open-sourced and modular GPT-4o
Official TensorFlow implementation for CVPR2020 paper “Learning to Cartoonize Using White-box Cartoon Representations”
State-of-the-art deep learning based audio codec supporting both mono 24 kHz audio and stereo 48 kHz audio.
A PyTorch library for implementing flow matching algorithms, featuring continuous and discrete flow matching implementations. It includes practical examples for both text and image modalities.
Vector (and Scalar) Quantization, in PyTorch
An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Speaker Extraction, etc.
Open-source multimodal large language model that can hear and talk while thinking. It features real-time end-to-end conversation with speech input and streaming audio output.
A Python package to analyze and compare voices with deep learning
LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.
An unofficial PyTorch implementation of the audio LM VALL-E
Show, Attend, and Tell | a PyTorch Tutorial to Image Captioning
Implementation of AudioLM, a SOTA Language Modeling Approach to Audio Generation out of Google Research, in PyTorch
This library provides common speech features for ASR including MFCCs and filterbank energies.
label-smooth, amsoftmax, partial-fc, focal-loss, triplet-loss, lovasz-softmax. May be useful.
HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis
PyTorch implementation of VALL-E (Zero-Shot Text-To-Speech). Reproduced demo: https://lifeiteng.github.io/valle/index.html
A Python library for audio data augmentation. Useful for making audio ML models work well in the real world, not just in the lab.
TorchCFM: a Conditional Flow Matching library
Audio generation using diffusion models, in PyTorch.
Transcription, forced alignment, and audio indexing with OpenAI's Whisper
VoxCPM: Tokenizer-Free TTS for Context-Aware Speech Generation and True-to-Life Voice Cloning
AI-powered speech denoising and enhancement