View fifilyu's full-sized avatar


Open-source, community-driven agent harness

Rust 38,386 3,304 Updated Jun 15, 2026

End-to-End Speech Processing Toolkit

Python 9,858 2,408 Updated Jun 15, 2026

Instant voice cloning by MIT and MyShell. Audio foundation model.

Python 36,719 4,099 Updated Apr 19, 2025

Chinese speech pretrained models

Shell 1,204 89 Updated Aug 23, 2024

Faster Whisper transcription with CTranslate2

Python 23,647 1,936 Updated Nov 19, 2025

🔊 Text-Prompted Generative Audio Model

Jupyter Notebook 39,160 4,682 Updated Aug 19, 2024

Easily train a good VC model with voice data <= 10 mins!

Python 36,016 5,081 Updated Nov 24, 2024

🌐 The Internet Computer! Free, Open-Source, and Self-Hostable.

TypeScript 42,303 3,889 Updated Jun 15, 2026

Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding

Jupyter Notebook 10,126 1,070 Updated Jun 12, 2026

kaldi-asr/kaldi is the official location of the Kaldi project.

Shell 15,413 5,357 Updated Sep 22, 2025

Industrial-grade speech recognition toolkit: 170x realtime, 50+ languages, speaker diarization, emotion detection, streaming, and OpenAI-compatible API.

Python 18,030 1,849 Updated Jun 15, 2026

Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate.

Python 4,083 353 Updated Jan 8, 2025

A PyTorch-based Speech Toolkit

Python 11,618 1,698 Updated Jun 15, 2026

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

Python 17,384 3,436 Updated Jun 15, 2026

[WIP] Layer Diffusion for WebUI (via Forge)

Python 4,110 352 Updated Aug 30, 2024

WebUI extension for ControlNet

Python 17,855 2,015 Updated Aug 12, 2024

A machine learning image inpainting task that instinctively removes watermarks from images, producing results indistinguishable from the ground-truth image

Python 4,602 536 Updated Jun 5, 2026

GeneFace++: Generalized and Stable Real-Time 3D Talking Face Generation; Official Code

Python 1,809 255 Updated Oct 18, 2024

MuseTalk: Real-Time High Quality Lip Synchronization with Latent Space Inpainting

Python 5,987 863 Updated Sep 26, 2025

Code for the paper Hybrid Spectrogram and Waveform Source Separation

Python 10,214 1,514 Updated Apr 24, 2024

Easy to use stem (e.g. instrumental/vocals) separation from CLI or as a python package, using a variety of amazing pre-trained models (primarily from UVR)

Python 1,246 188 Updated May 18, 2026

This repository contains the code for "A Lip Sync Expert Is All You Need for Speech to Lip Generation In the Wild", published at ACM Multimedia 2020. For the HD commercial model, please try out Sync Labs

Python 13,042 2,829 Updated Jun 22, 2025

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities

Python 22,147 2,697 Updated Jan 23, 2026

We provide a PyTorch implementation of the paper Voice Separation with an Unknown Number of Multiple Speakers, in which we present a new method for separating a mixed audio sequence, in which multi…

Python 1,319 187 Updated Nov 16, 2023

GUI for a Vocal Remover that uses Deep Neural Networks.

Python 25,069 1,874 Updated Mar 13, 2025

🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production

Python 45,564 6,115 Updated Aug 16, 2024

1 min voice data can also be used to train a good TTS model! (few shot voice cloning)

Python 58,708 6,421 Updated Apr 30, 2026

so-vits-svc fork with realtime support, improved interface and more features.

Python 9,309 1,224 Updated Jun 15, 2026

Robust Speech Recognition via Large-Scale Weak Supervision

Python 102,780 12,539 Updated Apr 15, 2026

High-performance GPGPU inference of OpenAI's Whisper automatic speech recognition (ASR) model

C++ 10,482 954 Updated May 24, 2026