Lists (30)
Acoustic Echo Cancellation
audio effect
Audio Synthesis
BabyCry Det
corpus
database
Hearing Aid
HIFI-DSP
KWS
LLM
machine translation
mic array
Music & Song AGI
Music Source Separation
neural audio codec
NLP
NN algorithm
Pronunciation Assessment
SED
Sound Source Localization
Sound Source Separation
Speaker Diarization
Speaker ID
Speech Enhancement
Speech Recognition
Spoken Language ID
Spoken Language Understanding
Text-to-Speech
TinyML
voice agent
Starred repositories
End-to-end text-to-audio scene generation model
Parameter-efficient text-to-audio generation for edge and low-memory deployment.
Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe
Industrial audio online policy distillation (OPD) training stack for ASR and TTS, distilling compact audio models from stronger teacher models.
alphacep / GigaAM
Forked from salute-developers/GigaAM
Foundational Model for Speech Recognition Tasks
TugaPhone is a Python library that phonemizes arbitrary Portuguese text across major Lusophone dialects (pt-PT, pt-BR, pt-AO, pt-MZ, pt-TL). It uses a curated phonetic lexicon plus a rule-based fal…
A Python library for phonetic fuzzy searching and segment-to-segment distance computation. It allows you to find words that "sound like" a query by analyzing International Phonetic Alphabet (IPA) f…
A browser-based tool for aligning audio with text transcriptions and IPA (International Phonetic Alphabet) at word, grapheme, and sentence level. Runs entirely client-side, no server, no dependenci…
Official implementation of the Interspeech 2026 paper: UR-BERT: Scaling Text Encoders for Massively Multilingual TTS Through Universal Romanization and Speech Token Prediction
Extract a target speaker’s clean, non-overlapped speech from multi-speaker audio and export word-safe LJSpeech-style TTS datasets.
Official implementation of "USAD: Universal Speech and Audio Representation via Distillation"
Zonos2 is a leading open-weight text-to-speech MoE.
Koel Labs innovates open-source speech research, inclusive speech technologies, and real-time pronunciation feedback for language learners! This repo contains the ML training, evaluation, and data …
Codebase for the Interspeech 2026 Paper: MambAdapter: Lightweight Mamba-Based Adapters for Parameter-Efficient Transfer Learning in Speech and Audio
Inference server for MioTTS, a lightweight and fast LLM-based TTS model.
🚀 Fastest Anything-to-Audio Gen for conditioned sound and music creation.
Native full-duplex speech dialogue inference for BayLing-Duplex.