Hugging Face
- France
- https://ebezzam.github.io/
- @ericbezzam
- in/eric-bezzam
Stars
Python wrappers for Kaldi's Levenshtein distance and alignment code.
Unofficial fairseq-free PyTorch implementation of UTMOS (v1, 2022), matching the original system.
Kernel sources for https://huggingface.co/kernels-community
High-Quality Voice Cloning TTS for 600+ Languages
Kimi-Audio, an open-source audio foundation model excelling in audio understanding, generation, and conversation
Open source framework to vibecode and prototype voice agents with Gradium APIs
A lightweight deep learning training framework implemented from scratch in C++, featuring a PyTorch-style API.
Qwen3-ASR is an open-source series of ASR models developed by the Qwen team at Alibaba Cloud, supporting stable multilingual speech/music/song recognition, language detection and timestamp prediction.
Mount Hugging Face Buckets and repos as local filesystems. No download, no copy, no waiting.
[TACL'26] VoiceBench: Benchmarking LLM-Based Voice Assistants
[ICLR 2025] SOTA discrete acoustic codec models with 40/75 tokens per second for audio language modeling
[NeurIPS'25] Benchmark for evaluating TTS models on complex prosodic, expressiveness, and linguistic challenges.
[Interspeech 2026] Official Implementation of "ALARM: Audio–Language Alignment for Reasoning Models"
Practical, Colab-friendly notebooks for fine-tuning and running audio AI models
Benchmarking Large Language Models using the Eleusis card game
Real-time text-to-speech with Qwen3-TTS
The most powerful local music generation model that outperforms almost all commercial alternatives, supporting Mac, AMD, Intel, and CUDA devices.
Soprano: Instant, Ultra-Realistic Text-to-Speech
Pyroomacoustics is a package for audio signal processing for indoor applications. It was developed as a fast prototyping platform for beamforming algorithms in indoor scenarios.
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
The Hugging Face Course on Transformers for Audio
Omnilingual ASR: Open-Source Multilingual Speech Recognition for 1600+ Languages
SoulX-Podcast is an inference codebase by the Soul AI team for generating high-fidelity podcasts from text.