Stars
The official code repository for SongBloom: Coherent Song Generation via Interleaved Autoregressive Sketching and Diffusion Refinement
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
Grapheme to phoneme conversion with deep learning.
Unified automatic quality assessment for speech, music, and sound.
A lightweight library for Fréchet Audio Distance calculation.
JAM: A Tiny Flow-based Song Generator with Fine-grained Controllability and Aesthetic Alignment
Student version of Assignment 1 for Stanford CS336 - Language Modeling From Scratch
Official repository of the paper "MuQ: Self-Supervised Music Representation Learning with Mel Residual Vector Quantization".
A powerful coding agent toolkit providing semantic retrieval and editing capabilities (MCP server & other integrations)
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing, etc.
Open-source industrial-grade ASR models supporting Mandarin, Chinese dialects and English, achieving a new SOTA on public Mandarin ASR benchmarks, while also offering outstanding singing lyrics rec…
Fully featured mobile and web client for Codex and Claude Code, with realtime voice and encryption
MuChoMusic is a benchmark for evaluating music understanding in multimodal audio-language models.
A library for audio and music analysis and feature extraction.
Official implementation of the paper "Acoustic Music Understanding Model with Large-Scale Self-supervised Training".
Deezer source separation library including pretrained models.
A project to help researchers reproduce research papers using LLMs, addressing the problem of "Coming Soon" repos with no actual code.
A course on LLM inference serving on Apple Silicon for systems engineers: build a tiny vLLM + Qwen.
Pyroomacoustics is a package for audio signal processing for indoor applications. It was developed as a fast prototyping platform for beamforming algorithms in indoor scenarios.