View ebezzam's full-sized avatar

Highlights

  • Pro

Organizations

@LCAV


Python wrappers for Kaldi's Levenshtein distance and alignment code.

CMake 69 12 Updated Jun 12, 2026

Unofficial fairseq-free PyTorch implementation of UTMOS (v1, 2022), matching the original system.

Python 33 1 Updated Jun 6, 2026

Kernel sources for https://huggingface.co/kernels-community

Python 127 53 Updated Jun 12, 2026

High-Quality Voice Cloning TTS for 600+ Languages

Python 7,443 1,162 Updated Jun 11, 2026

Kimi-Audio, an open-source audio foundation model excelling in audio understanding, generation, and conversation

Python 4,650 361 Updated Jun 21, 2025

Perceptual Quality Estimator for speech and audio

C++ 896 141 Updated May 17, 2025

Open source framework to vibecode and prototype voice agents with Gradium APIs

Rust 95 20 Updated Jun 9, 2026

Machine Learning Systems

Python 24,894 2,992 Updated Jun 14, 2026

A lightweight deep learning training framework implemented from scratch in C++, featuring a PyTorch-style API.

C++ 183 27 Updated Jun 10, 2026

Qwen3-ASR is an open-source series of ASR models developed by the Qwen team at Alibaba Cloud, supporting stable multilingual speech/music/song recognition, language detection and timestamp prediction.

Python 2,906 292 Updated Jan 30, 2026

Mount Hugging Face Buckets and repos as local filesystems. No download, no copy, no waiting.

Rust 744 53 Updated Jun 14, 2026

[TACL'26] VoiceBench: Benchmarking LLM-Based Voice Assistants

Python 369 25 Updated Jun 11, 2026

[ICLR 2025] SOTA discrete acoustic codec models with 40/75 tokens per second for audio language modeling

Python 1,297 111 Updated Mar 2, 2025

[NeurIPS'25] Benchmark for evaluating TTS models on complex prosodic, expressiveness, and linguistic challenges.

Python 220 16 Updated Dec 9, 2025

[Interspeech 2026] Official Implementation of "ALARM: Audio–Language Alignment for Reasoning Models"

Python 13 1 Updated Jun 10, 2026

Practical, Colab-friendly notebooks for fine-tuning and running audio AI models

Jupyter Notebook 418 29 Updated May 19, 2026

Pure-PyTorch Parakeet TDT inference

Python 47 8 Updated Mar 10, 2026

Open Source Speech Language Model

Jupyter Notebook 994 107 Updated May 11, 2026

Benchmarking Large Language Models using the Eleusis card game

Python 14 3 Updated Feb 16, 2026

Real-time text-to-speech with Qwen3-TTS

Python 1,118 167 Updated Jun 10, 2026

The most powerful local music generation model that outperforms almost all commercial alternatives, supporting Mac, AMD, Intel, and CUDA devices.

Python 11,078 1,344 Updated May 27, 2026

Riva Python client API and CLI utils

Python 132 49 Updated Jun 3, 2026

Soprano: Instant, Ultra-Realistic Text-to-Speech

Python 1,235 105 Updated Jan 15, 2026

Pyroomacoustics is a package for audio signal processing for indoor applications. It was developed as a fast prototyping platform for beamforming algorithms in indoor scenarios.

Python 1,881 483 Updated May 17, 2026

SoTA open-source TTS

Python 25,064 3,322 Updated Jun 10, 2026

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

Python 17,380 3,434 Updated Jun 14, 2026

The Hugging Face Course on Transformers for Audio

MDX 500 152 Updated May 26, 2026

Omnilingual ASR: Open-Source Multilingual Speech Recognition for 1600+ Languages

Python 2,838 254 Updated Dec 30, 2025
Python 18 1 Updated Nov 19, 2025

SoulX-Podcast is an inference codebase by the Soul AI team for generating high-fidelity podcasts from text.

Python 3,443 440 Updated Dec 11, 2025