Open-Source Frontier Voice AI

Python 18,731 2,070 Updated Dec 17, 2025

Seamlessly download and de-drm comics and manga from Kindle in highest possible quality

Python 91 25 Updated Feb 3, 2024
Python 74 5 Updated Jun 25, 2025

LAVIS - A One-stop Library for Language-Vision Intelligence

Jupyter Notebook 11,073 1,088 Updated Nov 18, 2024

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

Python 24,193 2,684 Updated Aug 12, 2024

Open-source multimodal large language model that can hear and talk while thinking, featuring real-time end-to-end speech input and streaming audio output for conversation.

Python 3,499 302 Updated Nov 5, 2024

GLM-4 series: Open Multilingual Multimodal Chat LMs

Python 6,968 601 Updated Jul 4, 2025

(NeurIPS 2024 Oral 🔥) Improved Distribution Matching Distillation for Fast Image Synthesis

Python 1,131 54 Updated Mar 5, 2025

The official repo of Qwen-Audio (通义千问-Audio) chat & pretrained large audio language model proposed by Alibaba Cloud.

Python 1,849 139 Updated Jul 5, 2024

Foundational model for human-like, expressive TTS

Python 4,197 694 Updated Jul 30, 2024

Course to get into Large Language Models (LLMs) with roadmaps and Colab notebooks.

70,799 8,102 Updated Jun 4, 2025

Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audi…

Python 9,554 774 Updated May 27, 2025

An unofficial implementation of "UniCATS: A Unified Context-Aware Text-to-Speech Framework with Contextual VQ-Diffusion and Vocoding".

Python 26 1 Updated Nov 4, 2023

PyTorch implementation of BigVSAN

Python 203 18 Updated Dec 9, 2025

FunCodec is a research-oriented toolkit for audio quantization and downstream applications such as text-to-speech synthesis and music generation.

Python 438 34 Updated Jan 25, 2024

State-of-the-art audio codec with 90x compression factor. Supports 44.1kHz, 24kHz, and 16kHz mono/stereo audio.

Python 1,662 165 Updated Dec 5, 2025

An open-source implementation of Microsoft's VALL-E X zero-shot TTS model. A demo is available at https://plachtaa.github.io/vallex/

Python 7,969 784 Updated Feb 11, 2024

PyTorch implementation of VALL-E (zero-shot text-to-speech). Reproduced demo: https://lifeiteng.github.io/valle/index.html

Python 2,194 334 Updated Sep 10, 2025

Implementation of Voicebox, a new SOTA text-to-speech network from Meta AI, in PyTorch

Python 668 53 Updated Oct 1, 2024

A differentiable version of SPTK

Python 191 19 Updated Dec 11, 2025

Foundational Models for State-of-the-Art Speech and Text Translation

Jupyter Notebook 11,727 1,167 Updated Nov 14, 2024
Python 1,309 384 Updated Nov 28, 2025

Implementation of Meta-Voicebox: the first generative AI model for speech to generalize across tasks with state-of-the-art performance.

588 32 Updated Jun 19, 2023

An ODE-based generative neural vocoder using Rectified Flow

Python 58 6 Updated Apr 29, 2023

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

Python 41,049 4,670 Updated Dec 19, 2025

Making large AI models cheaper, faster and more accessible

Python 41,298 4,546 Updated Dec 8, 2025

Keep track of big models in the audio domain, including speech, singing, music, etc.

503 29 Updated Sep 26, 2024

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

Python 16,333 3,242 Updated Dec 20, 2025

Official PyTorch implementation of BigVGAN (ICLR 2023)

Python 1,157 143 Updated Sep 5, 2024