Starred repositories

End-to-end text-to-audio scene generation model

42 1 Updated Jun 16, 2026

Parameter-efficient text-to-audio generation for edge and low-memory deployment.

13 Updated May 29, 2026
Jupyter Notebook 1 Updated Mar 5, 2026

Sdpcodec (Interspeech 2026) source code

Python 3 1 Updated Jun 18, 2026

Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe

Python 684 43 Updated May 30, 2026

Industrial audio online policy distillation (OPD) training stack for ASR and TTS, distilling compact audio models from stronger teacher models.

Python 197 14 Updated Jun 5, 2026

Foundational Model for Speech Recognition Tasks

Python 1 Updated Jun 17, 2026

TugaPhone is a Python library that phonemizes arbitrary Portuguese text across major Lusophone dialects (pt-PT, pt-BR, pt-AO, pt-MZ, pt-TL). It uses a curated phonetic lexicon plus a rule-based fal…

Python 3 Updated Jun 13, 2026

A Python library for phonetic fuzzy searching and segment-to-segment distance computation. It allows you to find words that "sound like" a query by analyzing International Phonetic Alphabet (IPA) f…

Python 8 2 Updated May 30, 2026

A browser-based tool for aligning audio with text transcriptions and IPA (International Phonetic Alphabet) at word, grapheme, and sentence level. Runs entirely client-side, no server, no dependenci…

JavaScript 7 Updated Jun 13, 2026
Python 2 Updated Jun 16, 2026

Official implementation of the Interspeech 2026 paper: UR-BERT: Scaling Text Encoders for Massively Multilingual TTS Through Universal Romanization and Speech Token Prediction

Python 4 1 Updated Jun 17, 2026
Python 6 1 Updated Jun 16, 2026
Python 1 Updated Jan 8, 2026
Python 6 Updated Jun 7, 2026

Extract a target speaker’s clean, non-overlapped speech from multi-speaker audio and export word-safe LJSpeech-style TTS datasets.

Python 18 2 Updated Jun 14, 2026
Python 4 1 Updated May 20, 2025
Python 27 1 Updated Feb 27, 2026

Official implementation of "USAD: Universal Speech and Audio Representation via Distillation"

Python 10 1 Updated Jun 7, 2026

Zonos2 is a leading open-weight text-to-speech MoE.

Python 214 24 Updated Jun 16, 2026
Python 3 Updated Jun 14, 2026

Koel Labs innovates open-source speech research, inclusive speech technologies, and real-time pronunciation feedback for language learners! This repo contains the ML training, evaluation, and data …

Jupyter Notebook 21 6 Updated Jun 17, 2026

Codebase for the Interspeech 2026 Paper: MambAdapter: Lightweight Mamba-Based Adapters for Parameter-Efficient Transfer Learning in Speech and Audio

Python 3 Updated Jun 10, 2026

Inference server for MioTTS, a lightweight and fast LLM-based TTS model.

Python 144 21 Updated Feb 14, 2026
Python 5 Updated Jun 15, 2026

🚀 Fastest Anything-to-Audio Gen for conditioned sound and music creation.

Python 61 15 Updated Jun 16, 2026

Native full-duplex speech dialogue inference for BayLing-Duplex.

Python 14 2 Updated Jun 17, 2026
Python 1 Updated Jun 5, 2026