Fun-Audio-Chat is a Large Audio Language Model built for natural, low-latency voice interactions.
An AI-powered speech processing toolkit with open-source SOTA pretrained models, supporting speech enhancement, separation, target speaker extraction, and more.
Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec.
Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
AI Audio Datasets (AI-ADS) 🎵, including Speech, Music, and Sound Effects, which can provide training data for Generative AI, AIGC, AI model training, intelligent audio tool development, and audio a…
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
[ICASSP 2024] 🍵 Matcha-TTS: A fast TTS architecture with conditional flow matching
A PyTorch library for implementing flow matching algorithms, featuring continuous and discrete flow matching implementations. It includes practical examples for both text and image modalities.
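To make the continuous flow-matching objective concrete, here is a toy NumPy sketch of how one training pair is built (this is the standard linear-interpolation formulation, not this library's actual API; names like `flow_matching_pair` are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def flow_matching_pair(x1, rng):
    """Build one conditional flow-matching training pair.

    Sample noise x0 ~ N(0, I) and a time t ~ U(0, 1); the probe point is
    the linear interpolation x_t = (1 - t) * x0 + t * x1, and the
    regression target is that path's constant velocity, x1 - x0.
    """
    x0 = rng.standard_normal(x1.shape)
    t = rng.uniform(size=(x1.shape[0], 1))
    xt = (1.0 - t) * x0 + t * x1
    v_target = x1 - x0
    return xt, t, v_target

# A network v_theta(x_t, t) is then trained with the MSE loss
# mean((v_theta(xt, t) - v_target) ** 2); at sampling time, integrating
# dx/dt = v_theta(x, t) from t = 0 to t = 1 maps noise to data.
x1 = rng.standard_normal((4, 2))   # a toy "data" batch
xt, t, v = flow_matching_pair(x1, rng)
print(xt.shape, v.shape)           # (4, 2) (4, 2)
```

A useful sanity check on the construction: walking from `xt` along the target velocity for the remaining time `1 - t` should land exactly on the data point `x1`.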
PyTorch implementation of VALL-E (zero-shot text-to-speech); reproduced demo: https://lifeiteng.github.io/valle/index.html
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
State-of-the-art deep learning based audio codec supporting both mono 24 kHz audio and stereo 48 kHz audio.
Predicts the level of noise and reverberation in your audio files.
A toolkit for speaker diarization.
VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech
The official PyTorch implementation of the Interspeech 2024 paper "Reshape Dimensions Network for Speaker Recognition".
An evolving, large-scale and multi-domain ASR corpus for low-resource languages with automated crawling, transcription and refinement
The simplest, fastest repository for training/finetuning medium-sized GPTs.
Multilingual large voice generation model, providing full-stack inference, training, and deployment capabilities.
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
✨✨Latest Advances on Multimodal Large Language Models
A Framework for Speech, Language, Audio, Music Processing with Large Language Model
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
A PyTorch implementation of "Conv-TasNet: Surpassing Ideal Time-Frequency Magnitude Masking for Speech Separation".
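For context on the baseline named in the title: the ideal ratio mask is an oracle soft mask computed from the ground-truth source and interference magnitudes and applied to the mixture spectrogram. A toy NumPy sketch of that textbook formulation (shapes and the additive-magnitude assumption are illustrative, not this repo's code):

```python
import numpy as np

def ideal_ratio_mask(speech_mag, noise_mag, eps=1e-8):
    """Oracle soft mask: the fraction of each time-frequency bin's
    magnitude that belongs to the target source."""
    return speech_mag / (speech_mag + noise_mag + eps)

# Toy magnitude spectrograms (time frames x frequency bins).
rng = np.random.default_rng(0)
speech = np.abs(rng.standard_normal((5, 8)))
noise = np.abs(rng.standard_normal((5, 8)))

# In practice the complex STFTs add and magnitudes only approximately do;
# assuming additive magnitudes here keeps the oracle reconstruction exact.
mixture = speech + noise
estimate = ideal_ratio_mask(speech, noise) * mixture
print(np.allclose(estimate, speech, atol=1e-6))   # True
```

Conv-TasNet's claim is that a learned mask on a learned time-domain basis can beat even this oracle time-frequency mask.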
A PyTorch implementation of dual-path RNN: efficient long-sequence modeling for time-domain single-channel speech separation.
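The dual-path trick is to fold a long feature sequence into overlapping chunks so that both the intra-chunk and inter-chunk RNN passes operate on short sequences. A minimal NumPy sketch of that segmentation step (chunk size and hop below are illustrative, not the paper's hyperparameters):

```python
import numpy as np

def segment(x, chunk, hop):
    """Fold a (T, F) sequence into (num_chunks, chunk, F) overlapping
    chunks, zero-padding the tail so every chunk is full length.

    The intra-chunk RNN then runs along axis 1 (within each chunk) and
    the inter-chunk RNN along axis 0 (across chunks), so neither pass
    ever sees a sequence longer than max(chunk, num_chunks).
    """
    T, F = x.shape
    n = max(0, -(-(T - chunk) // hop)) + 1      # ceil division, >= 1 chunk
    pad = (n - 1) * hop + chunk - T
    xp = np.pad(x, ((0, pad), (0, 0)))
    return np.stack([xp[i * hop : i * hop + chunk] for i in range(n)])

x = np.arange(20, dtype=float).reshape(10, 2)   # T=10 frames, F=2 features
chunks = segment(x, chunk=4, hop=2)
print(chunks.shape)                             # (4, 4, 2)
```

With a 50% hop the sequence length seen by each RNN grows roughly as the square root of `T` instead of linearly, which is what makes very long time-domain inputs tractable.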
This is the audio sample repository for speech separation model "MossFormer2".
Multi-scale time-domain speaker extraction.
An open-source, accurate, and easy-to-use video speech recognition and clipping tool with LLM-based AI clipping integrated.
The AVA dataset densely annotates 80 atomic visual actions in 351k movie clips with actions localized in space and time, resulting in 1.65M action labels with multiple labels per human occurring fr…