Industrial-grade speech recognition toolkit: 170x realtime, 50+ languages, speaker diarization, emotion detection, streaming, and OpenAI-compatible API.

Python 18,030 1,849 Updated Jun 15, 2026

huggingface / distil-whisper

Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate.

Python 4,083 353 Updated Jan 8, 2025

speechbrain / speechbrain

A PyTorch-based Speech Toolkit

Python 11,618 1,698 Updated Jun 15, 2026

NVIDIA-NeMo / NeMo

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

Python 17,384 3,436 Updated Jun 15, 2026

lllyasviel / sd-forge-layerdiffuse

[WIP] Layer Diffusion for WebUI (via Forge)

Python 4,110 352 Updated Aug 30, 2024

Mikubill / sd-webui-controlnet

WebUI extension for ControlNet

Python 17,855 2,015 Updated Aug 12, 2024

zuruoke / watermark-removal

A machine learning image inpainting task that instinctively removes watermarks from images, with results indistinguishable from the ground truth image

Python 4,602 536 Updated Jun 5, 2026

yerfor / GeneFacePlusPlus

GeneFace++: Generalized and Stable Real-Time 3D Talking Face Generation; Official Code

Python 1,809 255 Updated Oct 18, 2024

TMElyralab / MuseTalk

MuseTalk: Real-Time High Quality Lip Synchronization with Latent Space Inpainting

Python 5,987 863 Updated Sep 26, 2025

facebookresearch / demucs

Code for the paper Hybrid Spectrogram and Waveform Source Separation

Python 10,214 1,514 Updated Apr 24, 2024

nomadkaraoke / python-audio-separator

Easy to use stem (e.g. instrumental/vocals) separation from CLI or as a python package, using a variety of amazing pre-trained models (primarily from UVR)

Python 1,246 188 Updated May 18, 2026

Rudrabha / Wav2Lip

This repository contains the codes of "A Lip Sync Expert Is All You Need for Speech to Lip Generation In the Wild", published at ACM Multimedia 2020. For HD commercial model, please try out Sync Labs

Python 13,042 2,829 Updated Jun 22, 2025

microsoft / unilm

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities

Python 22,147 2,697 Updated Jan 23, 2026

facebookresearch / svoice

We provide a PyTorch implementation of the paper "Voice Separation with an Unknown Number of Multiple Speakers", in which we present a new method for separating a mixed audio sequence, in which multi…

Python 1,319 187 Updated Nov 16, 2023

Anjok07 / ultimatevocalremovergui

GUI for a Vocal Remover that uses Deep Neural Networks.

Python 25,069 1,874 Updated Mar 13, 2025

coqui-ai / TTS

🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production

Python 45,564 6,115 Updated Aug 16, 2024

RVC-Boss / GPT-SoVITS

1 min voice data can also be used to train a good TTS model! (few shot voice cloning)

Python 58,708 6,421 Updated Apr 30, 2026

voicepaw / so-vits-svc-fork

so-vits-svc fork with realtime support, improved interface and more features.

Python 9,309 1,224 Updated Jun 15, 2026

openai / whisper

Robust Speech Recognition via Large-Scale Weak Supervision

Python 102,780 12,539 Updated Apr 15, 2026

Const-me / Whisper

High-performance GPGPU inference of OpenAI's Whisper automatic speech recognition (ASR) model

C++ 10,482 954 Updated May 24, 2026

ShiFei fifilyu

Stars

Hmbown / CodeWhale

espnet / espnet

myshell-ai / OpenVoice

TencentGameMate / chinese_speech_pretrain

SYSTRAN / faster-whisper

suno-ai / bark

RVC-Project / Retrieval-based-Voice-Conversion-WebUI

HeyPuter / puter

pyannote / pyannote-audio

kaldi-asr / kaldi

modelscope / FunASR