VibeVoice: Expressive, longform conversational speech synthesis. (Community fork)

Python 1,069 403 Updated Jan 23, 2026

Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"

Python 14,337 2,115 Updated Apr 4, 2026

An open source implementation of Microsoft's VALL-E X zero-shot TTS model. A demo is available at https://plachtaa.github.io/vallex/

Python 7,956 779 Updated Feb 11, 2024

KAN-TTS is a speech-synthesis training framework. Please try the demos we have posted at https://modelscope.cn/models?page=1&tasks=text-to-speech

Python 524 88 Updated Dec 28, 2023

PyTorch implementation of VALL-E (Zero-Shot Text-To-Speech). Reproduced demo: https://lifeiteng.github.io/valle/index.html

Python 2,206 333 Updated Sep 10, 2025
Python 8,292 2,357 Updated Apr 14, 2026

Implementation of the paper "Spoken Language Recognition using X-vectors" in Pytorch

Python 108 26 Updated Jul 20, 2020

A Repository for Single- and Multi-modal Speaker Verification, Speaker Recognition and Speaker Diarization

Python 2,889 261 Updated Dec 8, 2025

Silero VAD: pre-trained enterprise-grade Voice Activity Detector

Python 8,808 764 Updated Mar 26, 2026

Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. Supports 100+ languages.

Python 75,675 10,240 Updated Apr 14, 2026
Python 1,459 185 Updated Feb 11, 2024

A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.

Python 15,686 1,647 Updated Mar 17, 2026

SoX, Swiss Army knife of sound processing

C 60 33 Updated Nov 20, 2017

A PyTorch implementation of the Transformer model in "Attention is All You Need".

Python 9,684 2,096 Updated Apr 16, 2024

Allosaurus is a pretrained universal phone recognizer for more than 2000 languages

Python 726 100 Updated Apr 26, 2024

Python client for Triton's Kaldi backend

C++ 2 Updated Dec 27, 2022

State-of-the-Art Deep Learning scripts organized by models - easy to train and deploy with reproducible accuracy and performance on enterprise-grade infrastructure.

Jupyter Notebook 14,780 3,410 Updated Aug 12, 2024

Triton backend that enables pre-process, post-processing and other logic to be implemented in Python.

C++ 673 193 Updated Apr 15, 2026

An easy to use PyTorch to TensorRT converter

Python 4,864 700 Updated Aug 17, 2024

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.

C++ 12,900 2,345 Updated Apr 13, 2026

PyTorch/TorchScript/FX compiler for NVIDIA GPUs using TensorRT

Python 2,964 393 Updated Apr 15, 2026

HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis

Python 2,341 554 Updated Jul 27, 2024

Fre-GAN: Adversarial Frequency-consistent Audio Synthesis

Python 112 32 Updated Aug 26, 2021

The Official Implementation of “Content-Dependent Fine-Grained Speaker Embedding for Zero-Shot Speaker Adaptation in Text-to-Speech Synthesis”

Python 88 12 Updated Dec 20, 2022

Kaldi ASR wrapper scripts

Python 2 1 Updated Jul 17, 2017

An implementation of Microsoft's "FastSpeech 2: Fast and High-Quality End-to-End Text to Speech"

Python 2,166 614 Updated Oct 27, 2023

Command line utility for forced alignment using Kaldi

Python 1,796 287 Updated Mar 31, 2026

Includes Basis-MelGAN, MelGAN, HifiGAN and Multiband-HifiGAN; NHV may be added in the future.

Python 157 19 Updated Jul 2, 2021

☺️ One Shot Voice Cloning based on Unet-TTS

Jupyter Notebook 245 43 Updated Mar 22, 2022

Library for Textless Spoken Language Processing

Python 557 57 Updated Aug 29, 2023