
Fun-Audio-Chat is a Large Audio Language Model built for natural, low-latency voice interactions.

Python 343 22 Updated Dec 25, 2025

An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Speaker Extraction, etc.

Python 3,760 304 Updated Aug 14, 2025

Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec.

Python 9,204 835 Updated Nov 20, 2025

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.

Python 32,048 6,639 Updated Sep 30, 2025

AI Audio Datasets (AI-ADS) 🎵, including Speech, Music, and Sound Effects, which can provide training data for Generative AI, AIGC, AI model training, intelligent audio tool development, and audio a…

878 85 Updated Jul 8, 2025

StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models

Python 6,099 644 Updated Aug 10, 2024

Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"

Python 13,828 2,038 Updated Dec 21, 2025

[ICASSP 2024] 🍵 Matcha-TTS: A fast TTS architecture with conditional flow matching

Jupyter Notebook 1,208 176 Updated Dec 8, 2025

A PyTorch library for implementing flow matching algorithms, featuring continuous and discrete flow matching implementations. It includes practical examples for both text and image modalities.

Python 3,900 282 Updated Sep 25, 2025

PyTorch implementation of VALL-E (Zero-Shot Text-To-Speech). Reproduced demo: https://lifeiteng.github.io/valle/index.html

Python 2,194 334 Updated Sep 10, 2025

🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production

Python 43,991 5,864 Updated Aug 16, 2024

State-of-the-art deep learning based audio codec supporting both mono 24 kHz audio and stereo 48 kHz audio.

Python 3,865 344 Updated Jan 4, 2024

Predicts the level of noise and reverberation in your audio files

Jupyter Notebook 173 33 Updated Jun 17, 2025

A toolkit for speaker diarization.

Jupyter Notebook 350 39 Updated Dec 9, 2025

VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech

Python 7,780 1,384 Updated Dec 6, 2023

The official PyTorch implementation of the Interspeech 2024 paper "Reshape Dimensions Network for Speaker Recognition"

Python 184 16 Updated Sep 24, 2025

An evolving, large-scale and multi-domain ASR corpus for low-resource languages with automated crawling, transcription and refinement

Python 179 11 Updated Sep 1, 2025

The simplest, fastest repository for training/finetuning medium-sized GPTs.

Python 51,408 8,617 Updated Nov 12, 2025

Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.

Python 18,290 2,038 Updated Dec 23, 2025

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

Python 24,213 2,689 Updated Aug 12, 2024

✨✨Latest Advances on Multimodal Large Language Models

17,058 1,098 Updated Dec 25, 2025

A Framework for Speech, Language, Audio, Music Processing with Large Language Model

Python 941 101 Updated Oct 24, 2025

Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)

Python 64,483 7,820 Updated Dec 24, 2025

A PyTorch implementation of Conv-TasNet: Surpassing Ideal Time-Frequency Magnitude Masking for Speech Separation

Python 525 79 Updated May 26, 2023

Dual-path RNN: efficient long sequence modeling for time-domain single-channel speech separation, implemented in PyTorch

Python 460 68 Updated Feb 14, 2023

This is the audio sample repository for speech separation model "MossFormer2".

Python 161 11 Updated Nov 28, 2024

Multi-scale time-domain speaker extraction

Python 70 19 Updated Jun 7, 2021

Open-source, accurate, and easy-to-use video speech recognition and clipping tool with integrated LLM-based AI clipping.

Python 5,241 625 Updated Jul 11, 2025

The AVA dataset densely annotates 80 atomic visual actions in 351k movie clips with actions localized in space and time, resulting in 1.65M action labels with multiple labels per human occurring fr…

339 29 Updated Feb 9, 2022