Eric Bezzam ebezzam

Audio ML @huggingface

108 followers · 1 following

Stars

pzelasko / kaldialign

Python wrappers for Kaldi's Levenshtein distance and alignment code.

CMake 69 12 Updated Jun 12, 2026

Blinorot / utmos-pytorch

Unofficial fairseq-free PyTorch implementation of UTMOS (v1, 2022), matching the original system.

Python 33 1 Updated Jun 6, 2026

huggingface / kernels-community

Kernel sources for https://huggingface.co/kernels-community

Python 127 53 Updated Jun 12, 2026

k2-fsa / OmniVoice

High-Quality Voice Cloning TTS for 600+ Languages

Python 7,443 1,162 Updated Jun 11, 2026

MoonshotAI / Kimi-Audio

Kimi-Audio, an open-source audio foundation model excelling in audio understanding, generation, and conversation

Python 4,650 361 Updated Jun 21, 2025

google / visqol

Perceptual Quality Estimator for speech and audio

C++ 896 141 Updated May 17, 2025

gradium-ai / gradbot

Open source framework to vibecode and prototype voice agents with Gradium APIs

Rust 95 20 Updated Jun 9, 2026

harvard-edge / cs249r_book

Machine Learning Systems

Python 24,894 2,992 Updated Jun 14, 2026

keith2018 / TinyTorch

A lightweight deep learning training framework implemented from scratch in C++, featuring a PyTorch-style API.

C++ 183 27 Updated Jun 10, 2026

Qwen3-ASR is an open-source series of ASR models developed by the Qwen team at Alibaba Cloud, supporting stable multilingual speech/music/song recognition, language detection and timestamp prediction.

Python 2,906 292 Updated Jan 30, 2026

huggingface / hf-mount

Mount Hugging Face Buckets and repos as local filesystems. No download, no copy, no waiting.

Rust 744 53 Updated Jun 14, 2026

MatthewCYM / VoiceBench

[TACL'26] VoiceBench: Benchmarking LLM-Based Voice Assistants

Python 369 25 Updated Jun 11, 2026

jishengpeng / WavTokenizer

[ICLR 2025] SOTA discrete acoustic codec models with 40/75 tokens per second for audio language modeling

Python 1,297 111 Updated Mar 2, 2025

boson-ai / EmergentTTS-Eval-public

[NeurIPS '25] Benchmark for evaluating TTS models on complex prosodic, expressiveness, and linguistic challenges.

Python 220 16 Updated Dec 9, 2025

Blinorot / ALARM

[Interspeech 2026] Official Implementation of "ALARM: Audio–Language Alignment for Reasoning Models"

Python 13 1 Updated Jun 10, 2026

Deep-unlearning / smol-audio

Practical, Colab-friendly notebooks for fine-tuning and running audio AI models

Jupyter Notebook 418 29 Updated May 19, 2026

andimarafioti / nano-parakeet

Pure-PyTorch Parakeet TDT inference

Python 47 8 Updated Mar 10, 2026

HumeAI / tada

Open Source Speech Language Model

Jupyter Notebook 994 107 Updated May 11, 2026

scienceetonnante / eleusis-llm-benchmark

Benchmarking Large Language Models using the Eleusis card game

Python 14 3 Updated Feb 16, 2026

andimarafioti / faster-qwen3-tts

Real-time text-to-speech with Qwen3-TTS

Python 1,118 167 Updated Jun 10, 2026

ace-step / ACE-Step-1.5

The most powerful local music generation model that outperforms almost all commercial alternatives, supporting Mac, AMD, Intel, and CUDA devices.

Python 11,078 1,344 Updated May 27, 2026

nvidia-riva / python-clients

Riva Python client API and CLI utils

Python 132 49 Updated Jun 3, 2026

ekwek1 / soprano

Soprano: Instant, Ultra-Realistic Text-to-Speech

Python 1,235 105 Updated Jan 15, 2026

LCAV / pyroomacoustics

Pyroomacoustics is a package for audio signal processing for indoor applications. It was developed as a fast prototyping platform for beamforming algorithms in indoor scenarios.

Python 1,881 483 Updated May 17, 2026

resemble-ai / chatterbox

SoTA open-source TTS

Python 25,064 3,322 Updated Jun 10, 2026

NVIDIA-NeMo / NeMo

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

Python 17,380 3,434 Updated Jun 14, 2026